
[Condor-users] Vanilla - jobs disappear without completing.

Dear all,
Do you know if there is a file size limit for Condor runs? If so, is there a line that can be added to submit files to increase it? (Something to do with "ImageSize"?) Or perhaps I've missed the mark completely?
Our pool is set up not to allow preemption, as we are in the vanilla universe without the ability to compile with Condor-specific libraries.
Recently I've been seeing a few occurrences of a problem whereby some running jobs seem to be kicked off their current processor and then either disappear from the queue, stay permanently in state "H", or are still reported as running although all but one or two files have been deleted from the /execute/dir[xxxx] directory.
The jobs haven't successfully completed, output isn't copied back to its original location and there doesn't appear to be any log output to give me a clue.
The only thing the failures seem to have in common at the moment is that the jobs had all been running for more than 4 or 5 days, and all were taking up close to, or more than, 2GB of space in the execute directory: ./execute/dir[xxxx].
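A minimal sketch of how the per-job directory sizes could be checked on an execute machine, using only the Python standard library. The function names, the "dir" name prefix and the 2GB default threshold are assumptions for illustration; nothing here is a Condor setting or part of Condor itself.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of the regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Files can vanish while a job is still running; skip them.
                pass
    return total


def large_job_dirs(execute_dir, threshold_bytes=2 * 1024**3):
    """Return (name, size) for each 'dir*' subdirectory at or above the threshold.

    execute_dir is a hypothetical path to the execute directory; the
    2GB default mirrors the size observed in the failures, not a known limit.
    """
    hits = []
    for entry in sorted(os.listdir(execute_dir)):
        full = os.path.join(execute_dir, entry)
        if entry.startswith("dir") and os.path.isdir(full):
            size = dir_size_bytes(full)
            if size >= threshold_bytes:
                hits.append((entry, size))
    return hits
```

Calling large_job_dirs() on the execute directory at regular intervals (from cron, say) and appending the result to a file would at least record how large each job directory was shortly before a job vanished.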
Does anyone have any ideas? Or any advice on how to increase logging so that I can catch whatever is happening?
Many thanks to everyone for reading,
Rob Stevenson - Systems Administrator
IS Services

HR Wallingford Ltd
Howbery Park, Wallingford, Oxfordshire OX10 8BA, United Kingdom
e: r.stevenson@xxxxxxxxxxxxxxxxxxx
t: +44 (0) 1491 822472 (direct), +44 (0) 1491 835381 (switchboard)
f: +44 (0) 1491 825483 (direct), +44 (0) 1491 832233 (general)
