
Re: [Condor-users] condor_shadow (condor_SHADOW) EXITING WITH STATUS 107



Notable messages in the StarterLog seem to be "condor_schedd gone?", "starter exited", and "can't find resource with ClaimId". I can confirm that condor_schedd is running and has been for some time:

condor 20700 0.0 0.0 6136 1620 ? Ss Feb10 2:04 condor_schedd -f

The StartLog for the respective vm shows nothing more than:
6/10 12:25:20 Got SIGQUIT.  Performing fast shutdown

StarterLog:
----------------------------------------------------------------
6/10 12:25:20 vm2: State change: claim lease expired (condor_schedd gone?)
6/10 12:25:20 vm2: Changing state and activity: Claimed/Busy -> Preempting/Killing
6/10 12:25:20 vm1: State change: claim lease expired (condor_schedd gone?)
6/10 12:25:20 vm1: Changing state and activity: Claimed/Busy -> Preempting/Killing
6/10 12:25:20 Starter pid 17854 exited with status 0
6/10 12:25:20 vm2: State change: starter exited
6/10 12:25:20 vm2: State change: No preempting claim, returning to owner
6/10 12:25:20 vm2: Changing state and activity: Preempting/Killing -> Owner/Idle
6/10 12:25:20 vm2: State change: IS_OWNER is false
6/10 12:25:20 vm2: Changing state: Owner -> Unclaimed
6/10 12:25:20 Starter pid 17898 exited with status 0
6/10 12:25:20 vm1: State change: starter exited
6/10 12:25:20 vm1: State change: No preempting claim, returning to owner
6/10 12:25:20 vm1: Changing state and activity: Preempting/Killing -> Owner/Idle
6/10 12:25:20 vm1: State change: IS_OWNER is false
6/10 12:25:20 vm1: Changing state: Owner -> Unclaimed
6/10 12:25:21 DaemonCore: Command received via TCP from host <192.168.1.12:41455>
6/10 12:25:21 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
6/10 12:25:21 Error: can't find resource with ClaimId (<192.168.1.25:32773>#1108063081#1613)
6/10 12:25:21 DaemonCore: Command received via TCP from host <192.168.1.12:41456>
6/10 12:25:21 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
6/10 12:25:21 Error: can't find resource with ClaimId (<192.168.1.25:32773>#1108063081#1611)
6/10 12:25:22 DaemonCore: Command received via UDP from host <192.168.1.12:49453>
6/10 12:25:22 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
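
For anyone who wants to follow the per-vm state changes in a longer log without reading every line, here is a minimal Python sketch that pulls out only the "Changing state" lines and groups them by vm. The regular expression is an assumption based solely on the line format in the excerpt above, and the function name `transitions` is made up for illustration; it is not part of Condor.

```python
import re

# Assumed line format, taken from the excerpt above, e.g.:
#   6/10 12:25:20 vm2: Changing state and activity: Claimed/Busy -> Preempting/Killing
#   6/10 12:25:20 vm2: Changing state: Owner -> Unclaimed
TRANSITION = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d+:\d+:\d+) (?P<slot>vm\d+): "
    r"Changing state(?: and activity)?: (?P<old>\S+) -> (?P<new>\S+)$"
)


def transitions(lines):
    """Return {vm: [(timestamp, old_state, new_state), ...]} in log order."""
    result = {}
    for line in lines:
        m = TRANSITION.match(line.strip())
        if m:
            stamp = m.group("date") + " " + m.group("time")
            result.setdefault(m.group("slot"), []).append(
                (stamp, m.group("old"), m.group("new"))
            )
    return result


if __name__ == "__main__":
    sample = [
        "6/10 12:25:20 vm2: State change: claim lease expired (condor_schedd gone?)",
        "6/10 12:25:20 vm2: Changing state and activity: Claimed/Busy -> Preempting/Killing",
        "6/10 12:25:20 vm2: Changing state and activity: Preempting/Killing -> Owner/Idle",
        "6/10 12:25:20 vm2: Changing state: Owner -> Unclaimed",
    ]
    for slot, steps in transitions(sample).items():
        for stamp, old, new in steps:
            print(slot, stamp, old, "->", new)
```

Lines that do not describe a transition (the "State change:" reasons, DaemonCore messages, and so on) are skipped, so the output is only the sequence of states each vm passed through.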


-Jacob

Erik Paulson wrote:
On Fri, Jun 10, 2005 at 12:57:03PM -0400, Jacob Joseph wrote:

Can anyone provide any insight into this error? Most of the jobs in the cluster are being evicted more than once. I can't find any documentation for condor_shadow. Note that KILL and PREEMPT are False, so jobs should never stop once started.



Check the StartLog and the StarterLog on 192.168.1.25.

-Erik


./lop1/log/ShadowLog:6/10 12:13:22 (26347.0) (25897): Request to run on <192.168.1.25:32773> was ACCEPTED
./lop1/log/ShadowLog:6/10 12:25:20 (26347.0) (25897): Job 26347.0 is being evicted
./lop1/log/ShadowLog:6/10 12:25:22 (26347.0) (25897): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
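
To see how long the job ran before it was evicted, the two ShadowLog timestamps above can be subtracted. The following is a small Python sketch; the helper name `elapsed_seconds` is hypothetical, the year 2005 is an assumption taken from the date in the quoted message header (the log lines themselves carry no year), and the regular expression assumes the grep-style "file:timestamp" prefix shown above.

```python
import re
from datetime import datetime

# Finds the first "M/D HH:MM:SS" stamp anywhere in a line, so a grep
# prefix such as "./lop1/log/ShadowLog:" does not get in the way.
STAMP = re.compile(r"(\d+/\d+ \d+:\d+:\d+)")


def elapsed_seconds(first_line, second_line, year=2005):
    """Seconds between the timestamps of two log lines (year is assumed)."""
    def parse(line):
        stamp = STAMP.search(line).group(1)
        return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
    return (parse(second_line) - parse(first_line)).total_seconds()


if __name__ == "__main__":
    accepted = ("./lop1/log/ShadowLog:6/10 12:13:22 (26347.0) (25897): "
                "Request to run on <192.168.1.25:32773> was ACCEPTED")
    evicted = ("./lop1/log/ShadowLog:6/10 12:25:20 (26347.0) (25897): "
               "Job 26347.0 is being evicted")
    print(elapsed_seconds(accepted, evicted))
```

Running the same calculation over several evictions would show whether the jobs are consistently lost after a similar interval.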


-Jacob
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
