Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Condor jobs get matched, then released immediately

Date: Mon, 25 Jun 2007 14:50:11 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [Condor-users] Condor jobs get matched, then released immediately

Take a look in the ShadowLog on the submit machine or in the StarterLog on the execute machine, perhaps with grep -i for "error".
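For anyone who prefers a script to grep, the same case-insensitive search can be sketched in Python. This is only an illustration: the function name find_error_lines is made up here, and the log file paths are whatever your own installation uses (pass them on the command line).

```python
import re
import sys


def find_error_lines(log_path, pattern="error"):
    """Return (line_number, text) pairs for lines matching pattern, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    # errors="replace" keeps the scan going if the log contains odd bytes
    with open(log_path, errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if regex.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python find_errors.py <path to ShadowLog> <path to StarterLog> ...
    for path in sys.argv[1:]:
        for number, line in find_error_lines(path):
            print(f"{path}:{number}: {line}")
```

Lines reported near the timestamp of the failed job are the ones worth reading first.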

One guess is that something the job needs immediately at startup is missing, such as the specified initial working directory or stdin file. Condor (in v6.8.x) will automatically try to restart the job, just in case the missing files or directories are on a file server that is temporarily down. In v6.9.x, several errors of this sort will result in the job being retried a couple of times and then placed on hold (with a hold reason).
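A quick way to test that guess is to check whether the paths named in the submit description file actually exist. The sketch below is a rough illustration, not a Condor tool: it assumes a simple submit file made of plain `key = value` lines (no macros), looks only at the initialdir and input commands, and checks only what is visible from the machine where it runs. The helper names are made up for this example.

```python
import os


def parse_submit_file(text):
    """Collect simple `key = value` pairs from a submit description (keys lowercased)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines such as "queue" that carry no "="
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def missing_startup_paths(settings, submit_dir="."):
    """Return descriptions of startup paths that do not exist on this machine."""
    problems = []
    base = submit_dir
    initialdir = settings.get("initialdir")
    if initialdir:
        # os.path.join returns initialdir unchanged when it is absolute
        base = os.path.join(submit_dir, initialdir)
        if not os.path.isdir(base):
            problems.append(f"initialdir not found: {base}")
    stdin_file = settings.get("input")
    if stdin_file:
        path = os.path.join(base, stdin_file)
        if not os.path.isfile(path):
            problems.append(f"input file not found: {path}")
    return problems
```

For example, `missing_startup_paths(parse_submit_file(open("job.sub").read()))` (with job.sub standing in for your own submit file) returns an empty list when both paths are present.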


Hope this helps,
Todd




Ngwa Godlove wrote:

Hi,
I’m new to Condor and recently installed 6.8.5 on a new pool with 4 nodes, 1 pool manager and 1 submitter. Every time I submit a job, condor_status shows all nodes as claimed, and then they all immediately switch back to unclaimed. condor_reschedule does the same thing, with the nodes going from claimed to unclaimed.
I’m tempted to think the origin of my problems is my Condor configuration. Below is part of the StartLog from one of my nodes. Can anyone tell what is wrong from this log? Any ideas are greatly appreciated.
6/25 14:39:20 DaemonCore: Command received via TCP from host <X.X.X.125:3844>
6/25 14:39:20 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
6/25 14:39:20 vm1: Request accepted.
6/25 14:39:20 vm1: Remote owner is BBBBBBB
6/25 14:39:20 vm1: State change: claiming protocol successful
6/25 14:39:20 vm1: Changing state: Unclaimed -> Claimed
6/25 14:39:28 DaemonCore: Command received via TCP from host <X.X.X.125:3876>
6/25 14:39:28 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
6/25 14:39:28 vm1: Got activate_claim request from shadow (<10.0.0.125:3876>)
6/25 14:39:28 vm1: Remote job ID is 11.3
6/25 14:39:28 vm1: Got universe "VANILLA" (5) from request classad
6/25 14:39:28 vm1: State change: claim-activation protocol successful
6/25 14:39:28 vm1: Changing activity: Idle -> Busy
6/25 14:39:34 DaemonCore: Command received via TCP from host <X.X.X.125:3901>
6/25 14:39:34 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
6/25 14:39:34 vm1: Called deactivate_claim_forcibly()
6/25 14:39:34 DaemonCore: Command received via UDP from host <X.X.X.125:3904>
6/25 14:39:34 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
6/25 14:39:34 vm1: State change: received RELEASE_CLAIM command
6/25 14:39:34 vm1: Changing state and activity: Claimed/Busy -> Preempting/Vacating
6/25 14:39:34 DaemonCore: Command received via UDP from host <X.X.X.125:3905>
6/25 14:39:34 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
6/25 14:39:34 vm1: Got RELEASE_CLAIM while in Preempting state, ignoring.
6/25 14:39:34 DaemonCore: Command received via UDP from host <X.X.X.123:3738>
6/25 14:39:34 DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
6/25 14:39:34 Starter pid 2508 exited with status 0
6/25 14:39:34 vm1: State change: starter exited
6/25 14:39:34 vm1: State change: No preempting claim, returning to owner
6/25 14:39:34 vm1: Changing state and activity: Preempting/Vacating -> Owner/Idle
6/25 14:39:34 vm1: State change: IS_OWNER is false
6/25 14:39:34 vm1: Changing state: Owner -> Unclaimed

** Godlove Ntumngia **

** Axis GeoSpatial LLC **
