
[Condor-users] Evicted job won't restart on Windows XP due to wrong memory calculation



I have a dedicated Condor 6.8.4 pool with a Windows 2003 Server as central manager and a number of Windows XP machines as execute nodes. One of my jobs was evicted (for some reason) and now won't restart; the reported reason is:

"No resources matched request's constraints:
Check the Requirements expression below:

Requirements = [...] && (Arch == "INTEL") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)"

This seems to be due to the automatically inserted job requirement of "((Memory * 1024) >= ImageSize)".
However, I'm not sure I understand this - my job has an

ImageSize = 530000

while all machines' classads say

Memory = 511

Apparently the machine memory is reported in megabytes (rather than kilobytes, as stated in section 7.3 of the 6.8.4 manual), while the job's image size seems to be reported in bytes; at least I can't see how my job could ever have an image size of 530 MB.
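
To make the mismatch concrete, here is a minimal sketch in Python of the arithmetic behind the clause ((Memory * 1024) >= ImageSize), using only the two values quoted above. The function name memory_clause is made up for illustration and is not part of Condor; the numbers are treated as plain integers, with no assumption about their units.

```python
# Values copied from the ClassAds quoted in this post.
memory = 511          # machine ClassAd attribute Memory
image_size = 530000   # job ClassAd attribute ImageSize


def memory_clause(memory, image_size):
    """Evaluate the clause ((Memory * 1024) >= ImageSize) on plain integers."""
    return memory * 1024 >= image_size


scaled = memory * 1024
print(scaled)                             # 523264
print(memory_clause(memory, image_size))  # False
print(image_size - scaled)                # 6736
```

So the clause fails by 6736, a little over 1% of the scaled memory value, whatever the units turn out to be.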

And by the way, why does an evicted vanilla job on Windows have an image size > 0 anyway?
Since there is no checkpointing on Windows, the job would start from scratch once it is rescheduled, wouldn't it?
And the final question: how do I get Condor to restart my job? I need the job to be restarted rather than removed and resubmitted, because we have built a little GUI that monitors the pool using the cluster IDs, and removing and resubmitting the job would leave the GUI lost...

Thanks for any help or clarification,


Thorsten


