
RE: [Condor-users] Job runs from cmd line, but SIGQUITS after 40 mins under Condor



I read in the archives that Condor 6.6.x has a 2 GB memory limit that is due to be fixed in 6.7.x. Can anyone confirm or deny this? Is there a workaround? It would explain the symptoms I describe below.

 

BTW - I loaded 6.7.10 on a submit-only box and an execution box, but it didn't help. Does the Master box need 6.7.10 as well?

 

Thanks,

Jim

 


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Cox, James A (TRAC)
Sent: Tuesday, August 23, 2005 1:44 PM
To: 'condor-users@xxxxxxxxxxx'
Subject: [Condor-users] Job runs from cmd line, but SIGQUITS after 40 mins under Condor

 

The subject tells it all: I can run the job from the command line and it will go to completion (about 100 hrs), but when I submit it under Condor, it starts and runs for 40 mins (while it is mostly reading in data). Condor then gets a "SIGQUIT" and thinks it's done.

 

I suspect the job is running out of memory under Condor, whereas it works from the command line because all the memory is available there. I've tried reconfiguring the VMs, the RAM available, etc. We even pumped up one box to 10 GB of RAM, so that each VM had 5 GB! No luck.
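
One way to test the memory suspicion would be to run a small diagnostic both from the command line and as a Condor job, then compare the per-process limits each environment reports. The sketch below is only an illustration, assuming a Python 3 interpreter is available on the execute box; the function names are hypothetical, and it shows the limits the job inherits, not anything about how Condor itself enforces memory.

```python
import resource

# Limits most likely to matter for a memory-hungry job.
# The labels are just for display; the constants come from the stdlib.
LIMITS = {
    "address space (RLIMIT_AS)": resource.RLIMIT_AS,
    "data segment (RLIMIT_DATA)": resource.RLIMIT_DATA,
    "stack (RLIMIT_STACK)": resource.RLIMIT_STACK,
    "core file (RLIMIT_CORE)": resource.RLIMIT_CORE,
}


def fmt(value):
    """Render a limit in MB, or 'unlimited' when no cap is set."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**20:.0f} MB"


def report_limits():
    """Return {label: (soft, hard)} for the limits listed above."""
    report = {}
    for label, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[label] = (fmt(soft), fmt(hard))
    return report


if __name__ == "__main__":
    for label, (soft, hard) in report_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

If the two runs report different values, the difference points at the environment the job is started in rather than at the amount of RAM in the box.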

 

The boxes are dual-processor, 64-bit AMDs with 4 GB of RAM, running RHEL 4 and Condor 6.6.10. The job, however, was compiled on a 32-bit box, since the compiler is currently only available in 32-bit.
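
Since the executable is 32-bit, it is worth noting the ceiling that pointer width alone imposes, independent of Condor or of how much RAM is installed. The arithmetic below is a minimal sketch; the function name is hypothetical, and the usable share of the address space is normally smaller than this theoretical maximum and depends on the kernel.

```python
def address_space_ceiling_gib(pointer_bits):
    """Theoretical maximum addressable memory, in GiB, for a given pointer width."""
    return 2**pointer_bits / 2**30


if __name__ == "__main__":
    ceiling = address_space_ceiling_gib(32)
    per_vm_gib = 5  # RAM per VM on the upgraded box described above
    print(f"32-bit ceiling: {ceiling:.0f} GiB")
    print(f"RAM per VM exceeds what one 32-bit process can address: {per_vm_gib > ceiling}")
```

So a 32-bit process could not use all 5 GB of a VM even from the command line; by itself, though, this does not explain why the job behaves differently under Condor.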

 

Is there some reason Condor won't let the executable use the entire advertised space? Submitting 60 jobs from the command line every few days isn't fun, and it keeps us from efficiently using the farm.

 

Any ideas where else to troubleshoot?

 

Thanks,

Jim

 

James A. Cox
TRADOC Analysis Center
Security & Information Technology Division
White Sands Missile Range, New Mexico 88002
office: 505-678-1822 cell: 505-430-4626