
Re: [Condor-users] Condor jobs get matched, then released immediately



Hi Todd,

This is what the ShadowLog shows. I'm not sure what the
"OnExitRemove expression" means. Is that an attribute that needs to be
specified in my application or within Condor? See below:

6/25 14:39:28 DaemonCore: Command Socket at <X.X.X.125:3873>
6/25 14:39:28 Initializing a VANILLA shadow for job 11.3
6/25 14:39:28 (11.3) (3556): Request to run on <X.X.0.123:3545> was ACCEPTED
6/25 14:39:28 (11.0) (3180): Job 11.0 is being put back in the job queue: The job attribute OnExitRemove expression '(ExitBySignal == FALSE) && (ExitCode <= 0)' evaluated to FALSE
6/25 14:39:28 (11.0) (3180): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
6/25 14:39:29 (11.1) (2600): Job 11.1 is being put back in the job queue: The job attribute OnExitRemove expression '(ExitBySignal == FALSE) && (ExitCode <= 0)' evaluated to FALSE
6/25 14:39:29 (11.1) (2600): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
6/25 14:39:32 (11.2) (352): Job 11.2 is being put back in the job queue: The job attribute OnExitRemove expression '(ExitBySignal == FALSE) && (ExitCode <= 0)' evaluated to FALSE
6/25 14:39:32 (11.2) (352): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
6/25 14:39:34 (11.3) (3556): Job 11.3 is being put back in the job queue: The job attribute OnExitRemove expression '(ExitBySignal == FALSE) && (ExitCode <= 0)' evaluated to FALSE
6/25 14:39:34 (11.3) (3556): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
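
For reference, the expression quoted in the log above can be mimicked in
plain Python to see which exit outcomes make it FALSE. This is only a rough
sketch: the function name and the sample values are hypothetical, it does
not use Condor's own ClassAd evaluator, and it does not model ClassAd
handling of undefined attributes.

```python
def on_exit_remove(exit_by_signal: bool, exit_code: int) -> bool:
    """Mimic '(ExitBySignal == FALSE) && (ExitCode <= 0)' from the log."""
    return (exit_by_signal is False) and (exit_code <= 0)


# Hypothetical exit outcomes for illustration; none are taken from the thread.
samples = [(False, 0), (False, 1), (True, 0)]

for by_signal, code in samples:
    remove = on_exit_remove(by_signal, code)
    action = "leave the queue" if remove else "be put back in the queue"
    print(f"ExitBySignal={by_signal}, ExitCode={code}: job would {action}")
```

Under this reading, a job that exits normally with a positive exit code
makes the expression FALSE, which matches the "put back in the job queue"
message in the log.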



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Monday, June 25, 2007 3:50 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Condor jobs get matched,then released
immediately


Take a look in the ShadowLog on the submit machine or in the StarterLog 
on the execute machine --- perhaps grep -i for "error".
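
Where grep is not available on the machine, a few lines of Python can do
roughly the same case-insensitive search. This is a sketch under
assumptions: the function name is made up, and the second sample line is a
hypothetical placeholder, not real Condor output.

```python
import re
from typing import Iterable, List


def grep_error(lines: Iterable[str], pattern: str = "error") -> List[str]:
    """Return the lines that match `pattern`, ignoring case (like grep -i)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if rx.search(line)]


# One line copied from this thread plus one made-up placeholder line.
sample = [
    "6/25 14:39:28 Initializing a VANILLA shadow for job 11.3\n",
    "placeholder line with the word ERROR in it (hypothetical)\n",
]

for line in grep_error(sample):
    print(line)
```

To scan a real log, pass an open file object instead of the sample list.
The log location depends on the installation; running
"condor_config_val LOG" should report the directory.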

One guess is that something the job needs immediately at startup is
missing, such as the specified initial working directory or stdin file.
Condor (in v6.8.x) will automatically try to restart the job, just in
case the missing files or directories are on a file server that is
temporarily down.  In v6.9.x, several errors of this sort will result in
the job being retried a couple of times and then placed on hold (with a
hold reason).

Hope this helps,
Todd




Ngwa Godlove wrote:
> 
> 
> Hi,
> 
>  
> 
> I'm new to Condor and recently installed 6.8.5 on a new pool with 4
> nodes, 1 pool manager, and 1 submitter. Every time I submit a job,
> condor_status shows all nodes as claimed, and then they all immediately
> switch back to unclaimed. condor_reschedule does the same thing, with
> the nodes going from claimed to unclaimed.
> 
> I'm tempted to think the origin of my problems is my Condor
> configuration. Below is part of the StartLog from one of my nodes. Can
> anyone tell what is wrong from this log? Any ideas are greatly
> appreciated.
> 
>  
> 
>  
> 
>  
> 
> 6/25 14:39:20 DaemonCore: Command received via TCP from host <X.X.X.125:3844>
> 6/25 14:39:20 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
> 6/25 14:39:20 vm1: Request accepted.
> 6/25 14:39:20 vm1: Remote owner is BBBBBBB
> 6/25 14:39:20 vm1: State change: claiming protocol successful
> 6/25 14:39:20 vm1: Changing state: Unclaimed -> Claimed
> 6/25 14:39:28 DaemonCore: Command received via TCP from host <X.X.X.125:3876>
> 6/25 14:39:28 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
> 6/25 14:39:28 vm1: Got activate_claim request from shadow (<10.0.0.125:3876>)
> 6/25 14:39:28 vm1: Remote job ID is 11.3
> 6/25 14:39:28 vm1: Got universe "VANILLA" (5) from request classad
> 6/25 14:39:28 vm1: State change: claim-activation protocol successful
> 6/25 14:39:28 vm1: Changing activity: Idle -> Busy
> 6/25 14:39:34 DaemonCore: Command received via TCP from host <X.X.X.125:3901>
> 6/25 14:39:34 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
> 6/25 14:39:34 vm1: Called deactivate_claim_forcibly()
> 6/25 14:39:34 DaemonCore: Command received via UDP from host <X.X.X.125:3904>
> 6/25 14:39:34 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
> 6/25 14:39:34 vm1: State change: received RELEASE_CLAIM command
> 6/25 14:39:34 vm1: Changing state and activity: Claimed/Busy -> Preempting/Vacating
> 6/25 14:39:34 DaemonCore: Command received via UDP from host <X.X.X.125:3905>
> 6/25 14:39:34 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
> 6/25 14:39:34 vm1: Got RELEASE_CLAIM while in Preempting state, ignoring.
> 6/25 14:39:34 DaemonCore: Command received via UDP from host <X.X.X.123:3738>
> 6/25 14:39:34 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
> 6/25 14:39:34 Starter pid 2508 exited with status 0
> 6/25 14:39:34 vm1: State change: starter exited
> 6/25 14:39:34 vm1: State change: No preempting claim, returning to owner
> 6/25 14:39:34 vm1: Changing state and activity: Preempting/Vacating -> Owner/Idle
> 6/25 14:39:34 vm1: State change: IS_OWNER is false
> 6/25 14:39:34 vm1: Changing state: Owner -> Unclaimed
> 
>  
> 
> ** Godlove Ntumngia **
> 
> ** Axis GeoSpatial LLC **
> 
>  
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/

