
[HTCondor-users] Some jobs held on HTCondor 8.0



We are in the process of upgrading to HTCondor 8.0 and are getting intermittent errors where Condor can't open output files, which causes the jobs to be held.

The scenario:

1. Submit a group of jobs from a single submit file.
2. Some jobs run and others are immediately held.
3. The held jobs report errors like:

007 (602.000.000) 07/10 07:59:46 Shadow exception!
Error from slot1@xxxxxxxxxxxxxxxxxx: Failed to open ‘/home/user/cond_result/newfast-MR2/output-243.txt’ as standard output: No such file or directory (errno 2)
Code 7 Subcode 2
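
Since errno 2 (ENOENT) when opening a file for writing usually points at a missing directory in the path rather than at permissions, one thing that can be checked is which path component is absent. Below is a rough Python sketch of such a check. The function names are made up for illustration, the regular expression only assumes the message wording shown above, and the result is only meaningful when run on a machine that sees the same filesystem as the slot reporting the error.

```python
import os
import re


def path_from_hold_message(message):
    """Hypothetical helper: extract the path from a message of the form
    "Failed to open '<path>' as standard output: ...".
    Accepts straight or curly quotes around the path."""
    match = re.search(r"Failed to open [‘'](.+?)[’'] as standard", message)
    return match.group(1) if match else None


def first_missing_component(path):
    """Walk the path from the root and return the first prefix that does
    not exist, or None if every component (including the file) exists."""
    parts = os.path.abspath(path).split(os.sep)
    current = os.sep
    for part in parts[1:]:
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return current
    return None


if __name__ == "__main__":
    msg = ("Failed to open ‘/home/user/cond_result/newfast-MR2/output-243.txt’ "
           "as standard output: No such file or directory (errno 2)")
    path = path_from_hold_message(msg)
    print("path:", path)
    print("first missing component:", first_missing_component(path))
```

If the first missing component is a directory such as cond_result or newfast-MR2, the directory is not visible at that moment on that machine. If only the output file itself is reported missing, the directories are present there and the cause lies elsewhere.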

I've already checked the relevant permissions; they seem an unlikely cause, since most of the jobs in the batch run fine.

Perhaps a config issue?
Some type of race condition?

Thanks
RP