
[Condor-users] ImageSize Problems (+ documentation typo)



Dear Condor users, 

I'm currently running into difficulties with jobs running on our local OSX 10.5 Condor 7.6.0 grid.

I found the FAQ "Why does my Linux job have an enormous ImageSize and refuse to run anymore?" http://www.cs.wisc.edu/condor/manual/v7.6/7_3Running_Condor.html#SECTION008310000000000000000
which appeared to address my issue, but unfortunately my jobs are still being held with

SYSTEM_PERIODIC_HOLD expression '(JobStatus == 2) && (ImageSize > 3048000)' evaluated to TRUE

The documentation suggests setting a ClassAd as below:

Requirements            = (Memory > 250)

However, the job is still routinely held by SYSTEM_PERIODIC_HOLD on the first
evaluation cycle (after 5 minutes).

condor_q -l tells me that

ImageSize 	= 4750000
ImageSize_RAW 	= 4561220

so I understand why the job is being held, but not why setting Requirements would help or correct ImageSize. 

I've spent some time monitoring my job, and though there are two processes (the shell script and then the actual program that it launches), the resident memory never exceeds 210 megabytes.  The virtual memory is about 1.2 GB.

It appears the hold must be due to the (ImageSize > 3048000) clause (hold jobs which are using more than about 3 GB); however, I am fairly certain that my job is not using that much, and I can't seem to find any way around this.
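
To make the mismatch concrete, here is a rough Python sketch of the arithmetic. It is only an illustration, not anything Condor runs: the function and variable names are my own, and I am assuming that ImageSize and the hold threshold are both in kilobytes and that JobStatus == 2 means "running".

```python
# Values copied from the hold reason and from condor_q -l above.
HOLD_THRESHOLD_KB = 3048000   # from the SYSTEM_PERIODIC_HOLD expression
IMAGE_SIZE_KB = 4750000       # ImageSize reported by condor_q -l
JOB_STATUS_RUNNING = 2        # assumption: JobStatus == 2 means running

# Approximate figures observed with top.
RESIDENT_MB = 210
VIRTUAL_MB = 1.2 * 1024       # about 1.2 GB of virtual memory


def kb_to_gb(kb):
    """Convert kilobytes to gigabytes (1 GB = 1024 * 1024 KB)."""
    return kb / (1024 * 1024)


def would_hold(job_status, image_size_kb, threshold_kb=HOLD_THRESHOLD_KB):
    """Mirror the hold expression: (JobStatus == 2) && (ImageSize > threshold)."""
    return job_status == JOB_STATUS_RUNNING and image_size_kb > threshold_kb


print(f"hold threshold:      {kb_to_gb(HOLD_THRESHOLD_KB):.2f} GB")
print(f"reported ImageSize:  {kb_to_gb(IMAGE_SIZE_KB):.2f} GB")
print("held with reported ImageSize:   ", would_hold(2, IMAGE_SIZE_KB))
print("held if ImageSize == top virtual:", would_hold(2, int(VIRTUAL_MB * 1024)))
print("held if ImageSize == top resident:", would_hold(2, RESIDENT_MB * 1024))
```

Under those assumptions the threshold works out to about 2.91 GB and the reported ImageSize to about 4.53 GB, so the expression fires, whereas neither the virtual nor the resident figure I see in top would trigger it.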

Can anyone suggest a way to get these jobs to dodge SYSTEM_PERIODIC_HOLD when their actual resident memory (observed through top) is closer to 210 megabytes? I'm not keen on adjusting the HOLD expression, as I don't want other errant jobs thrashing swap.

Thanks in advance for any suggestions on how to debug this further or any possible fixes.

Regards,

Dan

PS (Typo: where the documentation reads "You will need to change 1024 to a reasonably good estimate of the actual image size of your program, in kilobytes" I think that should be megabytes - as earlier we are told that Memory is measured in megabytes).

Dan O'Donovan Ph.D
SBGrid Consortium
Harvard Medical School