
Re: [Condor-users] job eviction and image size




matthew hope wrote:


The problem is not normally that the job is evicted due to its image
size* but that it goes beyond the limit for the machine and then is
preempted for another reason.

At this point the ImageSize has been updated to report the real usage,
but when compared against the startd's reported memory it no longer
fits, so the job will not match that machine again.

Normally this is not an issue if the startd is correctly reporting the
right amount of memory, but there is a bug which causes it to report
too little for SMP machines, on Windows at least. I believe this is
fixed in the dev release but can't remember if it is in the 6.6 series.

The only easy ways to work round this are:

1) lie in the startd (artificially report more memory than exists)
2) lie in the job ClassAd
(when a job cannot run, use condor_qedit to set its ImageSize low again.)
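
As a rough sketch of option 2, assuming a stuck job with the hypothetical id 1234.0 and a hypothetical target value of 100000 (ImageSize is recorded in kilobytes), the edit might look like this:

```
# Inspect the ImageSize currently recorded in the job ClassAd
# (1234.0 is a made-up cluster.proc id).
condor_q -long 1234.0 | grep -i ImageSize

# Lower ImageSize so the job can match a machine again.
# 100000 is an illustrative value in kilobytes, not a recommendation.
condor_qedit 1234.0 ImageSize 100000
```

The value may be updated again once the job runs and its real usage is measured, so the edit may need repeating after a later preemption.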


Yet another way is to explicitly specify your own memory requirements in your submit file. This prevents the auto-generated ImageSize requirements, so you don't have to worry about changes in ImageSize causing rescheduling problems if the job is preempted. Example:


requirements = Memory >= 249
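
To pick a sensible threshold, it may help to check what the startds in the pool actually advertise. A minimal sketch, assuming condor_status in your release supports the -format option:

```
# Print each machine name with the Memory (in megabytes) its startd reports.
condor_status -format "%s " Name -format "%d\n" Memory
```

If an SMP machine reports less than it really has, a requirement such as the one above will only match it when the reported figure, not the real one, is large enough.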

--Dan