
[Condor-users] Shadow processes not ending



I've noticed that, after a job terminates, its condor_shadow process often hangs around even though the job it was shadowing finished hours earlier. My ShadowLog contains many entries like the following:

12/7 04:36:12 (173.16) (5050): Job 173.16 terminated: exited with status 0
12/7 04:36:12 (173.16) (5050): FileLock::obtain(1) failed - errno 37 (No locks available)
12/7 04:36:12 (173.16) (5050): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100
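To get a rough idea of how often this happens and for which jobs, here is a minimal sketch (my own helper, not part of Condor) that tallies FileLock::obtain failures per job id from a ShadowLog. It assumes every entry sits on its own line in exactly the format shown above. The regex and the function name count_lock_failures are hypothetical.

```python
import re
from collections import Counter

# Matches ShadowLog lines of the form (assumed from the excerpt above):
# 12/7 04:36:12 (173.16) (5050): FileLock::obtain(1) failed - errno 37 (No locks available)
LOCK_FAIL = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d\d:\d\d:\d\d) "
    r"\((?P<job>\d+\.\d+)\) \((?P<pid>\d+)\): "
    r"FileLock::obtain\((?P<mode>\d+)\) failed - errno (?P<errno>\d+) \((?P<msg>[^)]*)\)"
)


def count_lock_failures(lines):
    """Return a Counter mapping job id (e.g. '173.16') to its number of FileLock failures."""
    counts = Counter()
    for line in lines:
        match = LOCK_FAIL.match(line)
        if match:
            counts[match.group("job")] += 1
    return counts
```

Usage would be something like `with open("ShadowLog") as f: print(count_lock_failures(f))`, where the path is a placeholder for wherever your ShadowLog actually lives.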

Similar FileLock errors often appear immediately after the job log shows that a job has begun executing. They also seem to coincide with many other job log events: for example, when the image size is updated or the job is evicted, a FileLock error often appears in the ShadowLog at the same time. Any idea what's going on?

Adam