We are in the process of upgrading to HTCondor 8.0 and are getting
intermittent errors where Condor can't open output files, which causes the
jobs to be held. The scenario:

1. Submit a group of jobs from a single submit file.
2. Some jobs run and others are immediately held.
3. The held jobs report errors like:

007 (602.000.000) 07/10 07:59:46 Shadow exception!
Error from slot1@xxxxxxxxxxxxxxxxxx:
Failed to open ‘/home/user/cond_result/newfast-MR2/output-243.txt’
as standard output: No such file or directory (errno 2)
Code 7 Subcode 2
I've already checked the relevant permissions, but they seem an unlikely cause, since most of the jobs in the batch run fine. Could this be a configuration issue? Some kind of race condition?

Thanks,
RP