
[Condor-users] Why was my job evicted?



Hello All,

I was running a job that was evicted. In the past I have always been
able to work out why from the logs, but not this time.

Execute Node:

StartLog:

4/20 15:07:41 Got SIGTERM. Performing graceful shutdown.
4/20 15:07:41 shutdown graceful
4/20 15:07:41 Changing activity: Busy -> Retiring
4/20 15:07:41 State change: claim retirement ended/expired
4/20 15:07:41 Changing state and activity: Claimed/Retiring -> Preempting/Vacating
4/20 15:07:49 Got KILL_FRGN_JOB while in Preempting state, ignoring.
4/20 15:07:49 Got RELEASE_CLAIM while in Preempting state, ignoring.
4/20 15:07:49 Starter pid 1508 exited with status 0

StarterLog:
4/20 15:07:41 Got SIGTERM. Performing graceful shutdown.
4/20 15:07:41 ShutdownGraceful all jobs.
4/20 15:07:41 Process exited, pid=1300, status=-1073741510
4/20 15:07:49 Last process exited, now Starter is exiting

MasterLog
4/20 15:07:41 Got SIGTERM. Performing graceful shutdown.
4/20 15:07:41 ShutdownGraceful all jobs.
4/20 15:07:41 Process exited, pid=1300, status=-1073741510
4/20 15:07:49 Last process exited, now Starter is exiting
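
A note on the job's exit status above: assuming the execute node runs
Windows (the post does not say so explicitly), a large negative status
is usually an NTSTATUS code printed as a signed 32-bit integer. The
sketch below reinterprets it as unsigned hex; the helper name
to_ntstatus is made up for illustration. The result, 0xC000013A, is
STATUS_CONTROL_C_EXIT, which fits a job that was interrupted during the
graceful shutdown rather than one that crashed on its own. It does not
say who sent the SIGTERM in the first place.

```python
def to_ntstatus(signed_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex string.

    Hypothetical helper for reading log values; not part of Condor.
    """
    # Masking with 0xFFFFFFFF maps the negative value onto its
    # two's-complement unsigned 32-bit equivalent.
    return f"0x{signed_status & 0xFFFFFFFF:08X}"


# Value taken from the "Process exited, pid=1300" line in the log above.
print(to_ntstatus(-1073741510))  # prints 0xC000013A
```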


So far it looks like the shutdown request (the SIGTERM) came from
outside. On the Schedd/Shadow end of things, this is what I see:

ShadowLog

4/20 15:07:44 (17040.0) (1556): Job 17040.0 is being evicted from ha2003x86Exec
4/20 15:07:44 (17040.0) (1556): Job 17040.0 is being evicted from ha2003x86Exec
4/20 15:07:44 (17040.0) (1556): **** condor_shadow (condor_SHADOW) pid 1556 EXITING WITH STATUS 107
4/20 15:07:57 Initializing a VANILLA shadow for job 17040.0
4/20 15:07:57 (17040.0) (3988): init_user_ids: failed because user switching is disabled

ScheddLog

4/20 15:07:44 (pid:2572) Shadow pid 1556 for job 17040.0 exited with status 107
4/20 15:07:44 (pid:2572) Match record (ha2003x86Exec <10.127.250.34:1045> for adam_smola@hydra, 17040.0) deleted
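
To see which daemon reacted first, it can help to interleave the
excerpts above into a single timeline. The following is a rough sketch,
not a Condor tool: merge_logs is a hypothetical helper, the year is a
placeholder because the log lines carry none, and it assumes the clocks
on the submit and execute machines are roughly in sync.

```python
from datetime import datetime


def merge_logs(logs, year=2000):
    """Merge {log_name: [lines]} into one list sorted by timestamp.

    Lines are expected to start with "M/D HH:MM:SS"; anything else
    (for example a wrapped continuation line) is skipped.
    """
    merged = []
    for name, lines in logs.items():
        for line in lines:
            parts = line.split(" ", 2)
            if len(parts) < 3:
                continue
            stamp = f"{year}/{parts[0]} {parts[1]}"
            try:
                when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
            except ValueError:
                continue
            merged.append((when, name, parts[2]))
    # sorted() is stable, so lines with equal timestamps keep input order.
    return sorted(merged, key=lambda entry: entry[0])


sample = {
    "StartLog": [
        "4/20 15:07:41 Got SIGTERM. Performing graceful shutdown.",
        "4/20 15:07:49 Starter pid 1508 exited with status 0",
    ],
    "ShadowLog": [
        "4/20 15:07:44 (17040.0) (1556): Job 17040.0 is being evicted from ha2003x86Exec",
    ],
}

for when, name, message in merge_logs(sample):
    print(when.strftime("%H:%M:%S"), name, message)
```

With the lines from this post, the SIGTERM on the execute node (15:07:41)
comes three seconds before the shadow reports the eviction (15:07:44).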



Any idea as to what happened?


-Adam