Re: [Condor-users] jobs vacating reason



OK, maybe this is the issue... on the same slot I get messages like this around the time the job was vacated:


12/09/10 13:06:04 attempt to connect to <127.0.0.1:39905> failed: Connection refused (connect errno = 111).
12/09/10 13:06:04 slot4: Failed to connect to schedd <127.0.0.1:39905>
12/09/10 13:06:09 slot4: State change: claim lease expired (condor_schedd gone?)
12/09/10 13:06:09 slot4: Changing state and activity: Claimed/Busy -> Preempting/Killing
12/09/10 13:06:09 slot4: Got KILL_FRGN_JOB while in Preempting state, ignoring.
12/09/10 13:06:09 Starter pid 11281 exited with status 0
12/09/10 13:06:09 slot4: State change: starter exited
12/09/10 13:06:09 slot4: State change: No preempting claim, returning to owner

The schedd isn't even running on that machine; it has MASTER and STARTD only (as it should). The job was started from elsewhere (I can verify the machine it started from).
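One detail in the excerpt above is easy to check mechanically: the startd is trying to reach the schedd at 127.0.0.1, a loopback address, even though no schedd runs on this machine. The sketch below only flags that pattern; it does not establish the cause. The function name `schedd_addresses` and the regular expression are illustrative assumptions, not part of Condor.

```python
import ipaddress
import re

# Matches the "<ip:port>" address form seen in the log lines above (IPv4 only).
ADDR_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")

def schedd_addresses(lines):
    """Yield (ip, port, is_loopback) for each 'Failed to connect to schedd' line."""
    for line in lines:
        if "Failed to connect to schedd" not in line:
            continue
        match = ADDR_RE.search(line)
        if match:
            ip = ipaddress.ip_address(match.group(1))
            yield str(ip), int(match.group(2)), ip.is_loopback

# Two lines copied from the excerpts in this message.
sample = [
    "12/09/10 13:06:04 slot4: Failed to connect to schedd <127.0.0.1:39905>",
    "12/09/10 12:46:09 slot4: Got activate_claim request from shadow (<192.168.16.123:42331>)",
]

for ip, port, loopback in schedd_addresses(sample):
    print(ip, port, "loopback" if loopback else "not loopback")
```

Only the schedd line is reported; the shadow line is skipped because it does not contain the "Failed to connect to schedd" text.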

12/09/10 12:46:09 slot4: match_info called
12/09/10 12:46:09 slot4: Got activate_claim request from shadow (<192.168.16.123:42331>)
12/09/10 12:46:09 slot4: Remote job ID is 481.0
12/09/10 12:46:10 slot4: Got universe "VANILLA" (5) from request classad
12/09/10 12:46:10 slot4: State change: claim-activation protocol successful
12/09/10 12:46:10 slot4: Changing activity: Idle -> Busy
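The timestamps in the two excerpts can be compared directly. This is a minimal sketch, assuming every line starts with the 17-character `MM/DD/YY HH:MM:SS` prefix shown above; the helper name `stamp` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the log excerpts: MM/DD/YY HH:MM:SS
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"

def stamp(line):
    """Parse the leading 17-character timestamp of a log line."""
    return datetime.strptime(line[:17], STAMP_FORMAT)

# Lines copied from the excerpts in this message.
activated = "12/09/10 12:46:09 slot4: Got activate_claim request from shadow (<192.168.16.123:42331>)"
expired = "12/09/10 13:06:09 slot4: State change: claim lease expired (condor_schedd gone?)"

elapsed = stamp(expired) - stamp(activated)
print(elapsed.total_seconds() / 60)  # minutes between claim activation and lease expiry
```

For these two lines the difference is 1200 seconds, which lines up with the 20-minute restart interval mentioned in the quoted message.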

On Thu, Dec 9, 2010 at 2:59 PM, Erik Aronesty <erik@xxxxxxx> wrote:
OK I tried everything you said... my jobs are still restarting every 20 minutes for no reason I can think of.

- Erik


On Thu, Dec 9, 2010 at 3:16 PM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
You should have a look at the StartLog and see what happens around the state changes you posted earlier.
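One possible way to do that without scrolling through the whole file is to pull out each state change for a slot together with a few neighbouring lines. The sketch below assumes the line layout seen in the excerpts in this thread; `context_around` and its parameters are hypothetical names, and the location of the StartLog depends on the local configuration.

```python
def context_around(lines, slot, marker="State change:", before=2, after=2):
    """Return the lines for `slot` containing `marker`, plus surrounding context.

    Overlapping context windows are merged, so no line is returned twice.
    """
    lines = [line.rstrip("\n") for line in lines]
    hits = [
        index for index, line in enumerate(lines)
        if marker in line and f" {slot}:" in line
    ]
    keep = set()
    for index in hits:
        start = max(0, index - before)
        stop = min(len(lines), index + after + 1)
        keep.update(range(start, stop))
    return [lines[index] for index in sorted(keep)]

# Lines copied from the excerpt earlier in this thread.
sample = [
    "12/09/10 13:06:04 attempt to connect to <127.0.0.1:39905> failed: Connection refused (connect errno = 111).",
    "12/09/10 13:06:04 slot4: Failed to connect to schedd <127.0.0.1:39905>",
    "12/09/10 13:06:09 slot4: State change: claim lease expired (condor_schedd gone?)",
    "12/09/10 13:06:09 slot4: Changing state and activity: Claimed/Busy -> Preempting/Killing",
]

for line in context_around(sample, "slot4", before=2, after=1):
    print(line)
```

To run it against a real log, pass an open file object instead of `sample`; a file iterates line by line, which is all the function needs.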

Best,


matt