We are in the process of upgrading to HTCondor 8.0 and are getting
intermittent errors where Condor can't open output files, causing the
jobs to be held.
1. Submit a group of jobs from a single submit file.
2. Some jobs run and others are immediately held.
3. The held jobs report errors like:
007 (602.000.000) 07/10 07:59:46 Shadow exception!
Error from slot1@xxxxxxxxxxxxxxxxxx: Failed to open ‘/home/user/cond_result/newfast-MR2/output-243.txt’ as standard output: No such file or directory (errno 2)
Code 7 Subcode 2
I've already checked the relevant permissions, which seem an unlikely cause since most of the jobs in the batch run fine.
Perhaps a config issue?
Some type of race?
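
One thing that may be worth ruling out: errno 2 (ENOENT) when opening a file for writing generally means some directory in the path does not exist where the open happens, rather than a permissions problem. Below is a minimal sketch of a check for that. The helper name missing_output_dirs and the output-%d.txt pattern are assumptions modelled on the error message above, not taken from the actual submit file. Because the error comes from the slot, the check would presumably need to be run on the execute node as well as on the submit host.

```python
import os


def missing_output_dirs(paths):
    """Return the sorted parent directories of the given paths that do not exist."""
    parents = {os.path.dirname(os.path.abspath(p)) for p in paths}
    return sorted(d for d in parents if not os.path.isdir(d))


if __name__ == "__main__":
    # Hypothetical path pattern based on the held job's error message;
    # replace it with the pattern actually used in the submit file.
    pattern = "/home/user/cond_result/newfast-MR2/output-%d.txt"
    outputs = [pattern % i for i in range(300)]
    for d in missing_output_dirs(outputs):
        print("missing directory:", d)
```

If this prints nothing on the submit host but reports a missing directory on some execute nodes, that would point toward the file system not being visible on those nodes rather than toward permissions.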