
Re: [Condor-users] machine stops executing jobs



In your StartLog...

> 01/21 11:50:38 slot1: Got activate_claim request from shadow
> (<128.112.146.182:41584>)
> 01/21 11:50:38 slot1: Job Requirements check failed!
> 01/21 11:50:38 slot1: Called deactivate_claim_forcibly()

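"Job Requirements check failed" means that when the shadow tried to activate the claim, the job's Requirements expression was not satisfied by the machine, even though the negotiator had already matched the two. In other words, the job decided it did not want to run on the machine once it got there. You can add D_JOB (and D_MACHINE) to your STARTD_DEBUG to see what the job and machine ads, including Requirements, looked like when the job was rejected.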
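
A minimal sketch of that change, assuming the startd for slot1 reads the same local config file that your ShadowLog reports (/home/condor/hosts/machine/etc/config.local); adjust the location if your startd is configured elsewhere:

```
# Local config for the machine that owns slot1 (file location is an assumption).
# If STARTD_DEBUG is already set, append these flags to the existing value
# instead of replacing it.
STARTD_DEBUG = D_JOB D_MACHINE
```

Then run condor_reconfig on that machine so the startd picks up the setting, resubmit the test job, and look in the StartLog around the next "Job Requirements check failed!" line for the job and machine ads.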

Best,


matt


On 01/21/2010 09:32 AM, Francisco Pereira wrote:
> Hi,
> 
> I have encountered a strange problem that I am unable to diagnose,
> even after consulting the manuals, the logs and googling for the
> relevant error messages. The situation is as follows:
> - I have a submit file for a test job that specifies, as the only
> requirements, that a certain slot/machine is used (happens to be a
> slot in the master)
> - For some time after the master is started, it will run the test job fine
> - After some time (and after other machines have started and been added
> to the pool), any job submitted to that slot/machine will enter a state
> where its status in the queue changes to R for a fraction of a second
> and then goes back to I
> 
> I've looked at all 6 log files for the machine for two minutes after
> submission, and the relevant excerpts of those (as well as condor_q -l
> output) follow below. The only things that seem amiss are:
> 
> SchedLog:
> 
> 01/21 11:50:38 (pid:28409) Completed REQUEST_CLAIM to startd
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 (pid:28409) Starting add_shadow_birthdate(39.0)
> 01/21 11:50:38 (pid:28409) Started shadow for job 39.0 on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, (shadow pid = 17679)
> 01/21 11:50:38 (pid:28409) Shadow pid 17679 for job 39.0 exited with status 108
> 01/21 11:50:38 (pid:28409) Completed RELEASE_CLAIM to startd at
> <128.112.146.182:33269>
> 
> ShadowLog
> 
> 01/21 11:50:38 DaemonCore: Command Socket at <128.112.146.182:49301>
> 01/21 11:50:38 Initializing a VANILLA shadow for job 39.0
> 01/21 11:50:38 (39.0) (17679): Request to run on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
> 01/21 11:50:38 (39.0) (17679): Job 39.0 is being evicted from
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:50:38 (39.0) (17679): logEvictEvent with unknown reason (108), aborting
> 01/21 11:50:38 (39.0) (17679): **** condor_shadow (condor_SHADOW) pid
> 17679 EXITING WITH STATUS 108
> 
> I did find a message in the mailing list archive and a reference to
> the error code in the manual
> 
> http://www.cs.wisc.edu/condor/manual/v7.3/11_Appendix_B.html
> "108 	JOB_NOT_STARTED 	can not connect to the condor_startd or request refused"
> 
> but it *could* connect to the start daemon this time (as it could before)
> 
> StartLog
> 
> 01/21 11:50:38 slot1: match_info called
> 01/21 11:50:38 slot1: Received match <128.112.146.182:33269>#1263850672#6829#...
> 01/21 11:50:38 slot1: State change: match notification protocol successful
> 01/21 11:50:38 slot1: Changing state: Unclaimed -> Matched
> 01/21 11:50:38 slot1: Request accepted.
> 01/21 11:50:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 slot1: State change: claiming protocol successful
> 01/21 11:50:38 slot1: Changing state: Matched -> Claimed
> 01/21 11:50:38 slot1: Got activate_claim request from shadow
> (<128.112.146.182:41584>)
> 01/21 11:50:38 slot1: Job Requirements check failed!
> 01/21 11:50:38 slot1: Called deactivate_claim_forcibly()
> 01/21 11:50:38 slot1: State change: received RELEASE_CLAIM command
> 01/21 11:50:38 slot1: Changing state and activity: Claimed/Idle ->
> Preempting/Vacating
> 01/21 11:50:38 slot1: State change: No preempting claim, returning to owner
> 01/21 11:50:38 slot1: Changing state and activity: Preempting/Vacating
> -> Owner/Idle
> 01/21 11:50:38 slot1: State change: IS_OWNER is false
> 01/21 11:50:38 slot1: Changing state: Owner -> Unclaimed
> 
> so I don't know what the issue could be here...
> 
> Thank you for any light you might be able to shed on this, or any
> further diagnostic questions you can suggest.
> 
> Francisco
> 
> 
> 
> condor_q
> 
> -- Submitter: machine.csbmb.princeton.edu : <128.112.146.182:60895> :
> machine.csbmb.princeton.edu
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
>   39.0   fpereira        1/21 11:50   0+00:00:01 I  0   0.0  wrapper.pl $$(OpSy
> 
> 1 jobs; 1 idle, 0 running, 0 held
> 
> 
> MasterLog
> 
> 
> MatchLog
> 
> 01/21 11:50:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
> <128.112.146.182:60895> preempting none <128.112.146.182:33269>
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:51:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
> <128.112.146.182:60895> preempting none <128.112.146.182:33269>
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:52:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
> <128.112.146.182:60895> preempting none <128.112.146.182:33269>
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 
> NegotiatorLog
> 
> 01/21 11:50:18 ---------- Started Negotiation Cycle ----------
> 01/21 11:50:18 Phase 1:  Obtaining ads from collector ...
> 01/21 11:50:18   Getting all public ads ...
> 01/21 11:50:18   Sorting 32 ads ...
> 01/21 11:50:18   Getting startd private ads ...
> 01/21 11:50:18 Got ads: 32 public and 15 private
> 01/21 11:50:18 Public ads include 1 submitter, 15 startd
> 01/21 11:50:18 Phase 2:  Performing accounting ...
> 01/21 11:50:18 Phase 3:  Sorting submitter ads by priority ...
> 01/21 11:50:18 Phase 4.1:  Negotiating with schedds ...
> 01/21 11:50:18 ---------- Finished Negotiation Cycle ----------
> 01/21 11:50:38 ---------- Started Negotiation Cycle ----------
> 01/21 11:50:38 Phase 1:  Obtaining ads from collector ...
> 01/21 11:50:38   Getting all public ads ...
> 01/21 11:50:38   Sorting 32 ads ...
> 01/21 11:50:38   Getting startd private ads ...
> 01/21 11:50:38 Got ads: 32 public and 15 private
> 01/21 11:50:38 Public ads include 1 submitter, 15 startd
> 01/21 11:50:38 Phase 2:  Performing accounting ...
> 01/21 11:50:38 Phase 3:  Sorting submitter ads by priority ...
> 01/21 11:50:38 Phase 4.1:  Negotiating with schedds ...
> 01/21 11:50:38   Negotiating with fpereira@xxxxxxxxxxxxx at
> <128.112.146.182:60895>
> 01/21 11:50:38 0 seconds so far
> 01/21 11:50:38     Request 00039.00000:
> 01/21 11:50:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
> <128.112.146.182:60895> preempting none <128.112.146.182:33269>
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:50:38       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:50:38     Got NO_MORE_JOBS;  done negotiating
> 01/21 11:50:38 ---------- Finished Negotiation Cycle ----------
> 01/21 11:51:38 ---------- Started Negotiation Cycle ----------
> 01/21 11:51:38 Phase 1:  Obtaining ads from collector ...
> 01/21 11:51:38   Getting all public ads ...
> 01/21 11:51:38   Sorting 32 ads ...
> 01/21 11:51:38   Getting startd private ads ...
> 01/21 11:51:38 Got ads: 32 public and 15 private
> 01/21 11:51:38 Public ads include 1 submitter, 15 startd
> 01/21 11:51:38 Phase 2:  Performing accounting ...
> 01/21 11:51:38 Phase 3:  Sorting submitter ads by priority ...
> 01/21 11:51:38 Phase 4.1:  Negotiating with schedds ...
> 01/21 11:51:38   Negotiating with fpereira@xxxxxxxxxxxxx at
> <128.112.146.182:60895>
> 01/21 11:51:38 0 seconds so far
> 01/21 11:51:38     Request 00039.00000:
> 01/21 11:51:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
> <128.112.146.182:60895> preempting none <128.112.146.182:33269>
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:51:38       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:51:38     Got NO_MORE_JOBS;  done negotiating
> 01/21 11:51:38 ---------- Finished Negotiation Cycle ----------
> 01/21 11:52:38 ---------- Started Negotiation Cycle ----------
> 01/21 11:52:38 Phase 1:  Obtaining ads from collector ...
> 01/21 11:52:38   Getting all public ads ...
> 01/21 11:52:38   Sorting 32 ads ...
> 01/21 11:52:38   Getting startd private ads ...
> 01/21 11:52:38 Got ads: 32 public and 15 private
> 01/21 11:52:38 Public ads include 1 submitter, 15 startd
> 01/21 11:52:38 Phase 2:  Performing accounting ...
> 01/21 11:52:38 Phase 3:  Sorting submitter ads by priority ...
> 01/21 11:52:38 Phase 4.1:  Negotiating with schedds ...
> 01/21 11:52:38   Negotiating with fpereira@xxxxxxxxxxxxx at
> <128.112.146.182:60895>
> 01/21 11:52:38 0 seconds so far
> 01/21 11:52:38     Request 00039.00000:
> 01/21 11:52:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
> <128.112.146.182:60895> preempting none <128.112.146.182:33269>
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:52:38       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:52:38     Got NO_MORE_JOBS;  done negotiating
> 01/21 11:52:38 ---------- Finished Negotiation Cycle ----------
> 
> SchedLog
> 
> 01/21 11:50:28 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
> 01/21 11:50:28 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
> 01/21 11:50:28 (pid:28409) Collector machine.csbmb.princeton.edu
> <128.112.146.182:9618> is still being avoided if an alternative
> succeeds.
> 01/21 11:50:38 (pid:28409) Activity on stashed negotiator socket
> 01/21 11:50:38 (pid:28409) Negotiating for owner: fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 (pid:28409) Checking consistency running and runnable jobs
> 01/21 11:50:38 (pid:28409) Tables are consistent
> 01/21 11:50:38 (pid:28409) Rebuilt prioritized runnable job list in 0.000s.
> 01/21 11:50:38 (pid:28409) Out of jobs - 1 jobs matched, 0 jobs idle,
> flock level = 0
> 01/21 11:50:38 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 (pid:28409) Completed REQUEST_CLAIM to startd
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 (pid:28409) Starting add_shadow_birthdate(39.0)
> 01/21 11:50:38 (pid:28409) Started shadow for job 39.0 on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, (shadow pid = 17679)
> 01/21 11:50:38 (pid:28409) Shadow pid 17679 for job 39.0 exited with status 108
> 01/21 11:50:38 (pid:28409) Completed RELEASE_CLAIM to startd at
> <128.112.146.182:33269>
> 01/21 11:50:38 (pid:28409) Match record
> (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, 39.0) deleted
> 01/21 11:51:38 (pid:28409) Activity on stashed negotiator socket
> 01/21 11:51:38 (pid:28409) Negotiating for owner: fpereira@xxxxxxxxxxxxx
> 01/21 11:51:38 (pid:28409) Checking consistency running and runnable jobs
> 01/21 11:51:38 (pid:28409) Tables are consistent
> 01/21 11:51:38 (pid:28409) Rebuilt prioritized runnable job list in 0.000s.
> 01/21 11:51:38 (pid:28409) Out of jobs - 1 jobs matched, 0 jobs idle,
> flock level = 0
> 01/21 11:51:38 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
> 01/21 11:51:38 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
> 01/21 11:51:38 (pid:28409) Completed REQUEST_CLAIM to startd
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx
> 01/21 11:51:38 (pid:28409) Starting add_shadow_birthdate(39.0)
> 01/21 11:51:38 (pid:28409) Started shadow for job 39.0 on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, (shadow pid = 17682)
> 01/21 11:51:38 (pid:28409) Shadow pid 17682 for job 39.0 exited with status 108
> 01/21 11:51:38 (pid:28409) Completed RELEASE_CLAIM to startd at
> <128.112.146.182:33269>
> 01/21 11:51:38 (pid:28409) Match record
> (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, 39.0) deleted
> 01/21 11:52:38 (pid:28409) Activity on stashed negotiator socket
> 01/21 11:52:38 (pid:28409) Negotiating for owner: fpereira@xxxxxxxxxxxxx
> 01/21 11:52:38 (pid:28409) Checking consistency running and runnable jobs
> 01/21 11:52:38 (pid:28409) Tables are consistent
> 01/21 11:52:38 (pid:28409) Rebuilt prioritized runnable job list in 0.000s.
> 01/21 11:52:38 (pid:28409) Out of jobs - 1 jobs matched, 0 jobs idle,
> flock level = 0
> 01/21 11:52:38 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
> 01/21 11:52:38 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
> 01/21 11:52:38 (pid:28409) Completed REQUEST_CLAIM to startd
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx
> 01/21 11:52:38 (pid:28409) Starting add_shadow_birthdate(39.0)
> 01/21 11:52:38 (pid:28409) Started shadow for job 39.0 on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, (shadow pid = 17685)
> 01/21 11:52:39 (pid:28409) Shadow pid 17685 for job 39.0 exited with status 108
> 01/21 11:52:39 (pid:28409) Completed RELEASE_CLAIM to startd at
> <128.112.146.182:33269>
> 01/21 11:52:39 (pid:28409) Match record
> (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
> fpereira@xxxxxxxxxxxxx, 39.0) deleted
> 
> 
> ShadowLog
> 
> 01/21 11:50:38 ******************************************************
> 01/21 11:50:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 01/21 11:50:38 ** /Volumes/Work/CondorLINUX/sbin/condor_shadow
> 01/21 11:50:38 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
> 01/21 11:50:38 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
> 01/21 11:50:38 ** $CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
> 01/21 11:50:38 ** $CondorPlatform: I386-LINUX_RHEL5 $
> 01/21 11:50:38 ** PID = 17679
> 01/21 11:50:38 ** Log last touched 1/21 11:40:18
> 01/21 11:50:38 ******************************************************
> 01/21 11:50:38 Using config source: /home/condor/condor_config
> 01/21 11:50:38 Using local config sources:
> 01/21 11:50:38    /home/condor/hosts/machine/etc/config.local
> 01/21 11:50:38 DaemonCore: Command Socket at <128.112.146.182:49301>
> 01/21 11:50:38 Initializing a VANILLA shadow for job 39.0
> 01/21 11:50:38 (39.0) (17679): Request to run on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
> 01/21 11:50:38 (39.0) (17679): Job 39.0 is being evicted from
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:50:38 (39.0) (17679): logEvictEvent with unknown reason (108), aborting
> 01/21 11:50:38 (39.0) (17679): **** condor_shadow (condor_SHADOW) pid
> 17679 EXITING WITH STATUS 108
> 01/21 11:51:38 ******************************************************
> 01/21 11:51:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 01/21 11:51:38 ** /Volumes/Work/CondorLINUX/sbin/condor_shadow
> 01/21 11:51:38 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
> 01/21 11:51:38 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
> 01/21 11:51:38 ** $CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
> 01/21 11:51:38 ** $CondorPlatform: I386-LINUX_RHEL5 $
> 01/21 11:51:38 ** PID = 17682
> 01/21 11:51:38 ** Log last touched 1/21 11:50:38
> 01/21 11:51:38 ******************************************************
> 01/21 11:51:38 Using config source: /home/condor/condor_config
> 01/21 11:51:38 Using local config sources:
> 01/21 11:51:38    /home/condor/hosts/machine/etc/config.local
> 01/21 11:51:38 DaemonCore: Command Socket at <128.112.146.182:60874>
> 01/21 11:51:38 Initializing a VANILLA shadow for job 39.0
> 01/21 11:51:38 (39.0) (17682): Request to run on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
> 01/21 11:51:38 (39.0) (17682): Job 39.0 is being evicted from
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:51:38 (39.0) (17682): logEvictEvent with unknown reason (108), aborting
> 01/21 11:51:38 (39.0) (17682): **** condor_shadow (condor_SHADOW) pid
> 17682 EXITING WITH STATUS 108
> 01/21 11:52:38 ******************************************************
> 01/21 11:52:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 01/21 11:52:38 ** /Volumes/Work/CondorLINUX/sbin/condor_shadow
> 01/21 11:52:38 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
> 01/21 11:52:38 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
> 01/21 11:52:38 ** $CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
> 01/21 11:52:38 ** $CondorPlatform: I386-LINUX_RHEL5 $
> 01/21 11:52:38 ** PID = 17685
> 01/21 11:52:38 ** Log last touched 1/21 11:51:38
> 01/21 11:52:38 ******************************************************
> 01/21 11:52:38 Using config source: /home/condor/condor_config
> 01/21 11:52:38 Using local config sources:
> 01/21 11:52:38    /home/condor/hosts/machine/etc/config.local
> 01/21 11:52:38 DaemonCore: Command Socket at <128.112.146.182:60357>
> 01/21 11:52:38 Initializing a VANILLA shadow for job 39.0
> 01/21 11:52:39 (39.0) (17685): Request to run on
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
> 01/21 11:52:39 (39.0) (17685): Job 39.0 is being evicted from
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 01/21 11:52:39 (39.0) (17685): logEvictEvent with unknown reason (108), aborting
> 01/21 11:52:39 (39.0) (17685): **** condor_shadow (condor_SHADOW) pid
> 17685 EXITING WITH STATUS 108
> 
> StartLog
> 
> 01/21 11:50:38 slot1: match_info called
> 01/21 11:50:38 slot1: Received match <128.112.146.182:33269>#1263850672#6829#...
> 01/21 11:50:38 slot1: State change: match notification protocol successful
> 01/21 11:50:38 slot1: Changing state: Unclaimed -> Matched
> 01/21 11:50:38 slot1: Request accepted.
> 01/21 11:50:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
> 01/21 11:50:38 slot1: State change: claiming protocol successful
> 01/21 11:50:38 slot1: Changing state: Matched -> Claimed
> 01/21 11:50:38 slot1: Got activate_claim request from shadow
> (<128.112.146.182:41584>)
> 01/21 11:50:38 slot1: Job Requirements check failed!
> 01/21 11:50:38 slot1: Called deactivate_claim_forcibly()
> 01/21 11:50:38 slot1: State change: received RELEASE_CLAIM command
> 01/21 11:50:38 slot1: Changing state and activity: Claimed/Idle ->
> Preempting/Vacating
> 01/21 11:50:38 slot1: State change: No preempting claim, returning to owner
> 01/21 11:50:38 slot1: Changing state and activity: Preempting/Vacating
> -> Owner/Idle
> 01/21 11:50:38 slot1: State change: IS_OWNER is false
> 01/21 11:50:38 slot1: Changing state: Owner -> Unclaimed
> 01/21 11:51:38 slot1: match_info called
> 01/21 11:51:38 slot1: Received match <128.112.146.182:33269>#1263850672#6833#...
> 01/21 11:51:38 slot1: State change: match notification protocol successful
> 01/21 11:51:38 slot1: Changing state: Unclaimed -> Matched
> 01/21 11:51:38 slot1: Request accepted.
> 01/21 11:51:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
> 01/21 11:51:38 slot1: State change: claiming protocol successful
> 01/21 11:51:38 slot1: Changing state: Matched -> Claimed
> 01/21 11:51:38 slot1: Got activate_claim request from shadow
> (<128.112.146.182:39537>)
> 01/21 11:51:38 slot1: Job Requirements check failed!
> 01/21 11:51:38 slot1: Called deactivate_claim_forcibly()
> 01/21 11:51:38 slot1: State change: received RELEASE_CLAIM command
> 01/21 11:51:38 slot1: Changing state and activity: Claimed/Idle ->
> Preempting/Vacating
> 01/21 11:51:38 slot1: State change: No preempting claim, returning to owner
> 01/21 11:51:38 slot1: Changing state and activity: Preempting/Vacating
> -> Owner/Idle
> 01/21 11:51:38 slot1: State change: IS_OWNER is false
> 01/21 11:51:38 slot1: Changing state: Owner -> Unclaimed
> 01/21 11:52:38 slot1: match_info called
> 01/21 11:52:38 slot1: Received match <128.112.146.182:33269>#1263850672#6835#...
> 01/21 11:52:38 slot1: State change: match notification protocol successful
> 01/21 11:52:38 slot1: Changing state: Unclaimed -> Matched
> 01/21 11:52:38 slot1: Request accepted.
> 01/21 11:52:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
> 01/21 11:52:38 slot1: State change: claiming protocol successful
> 01/21 11:52:38 slot1: Changing state: Matched -> Claimed
> 01/21 11:52:39 slot1: Got activate_claim request from shadow
> (<128.112.146.182:33337>)
> 01/21 11:52:39 slot1: Job Requirements check failed!
> 01/21 11:52:39 slot1: Called deactivate_claim_forcibly()
> 01/21 11:52:39 slot1: State change: received RELEASE_CLAIM command
> 01/21 11:52:39 slot1: Changing state and activity: Claimed/Idle ->
> Preempting/Vacating
> 01/21 11:52:39 slot1: State change: No preempting claim, returning to owner
> 01/21 11:52:39 slot1: Changing state and activity: Preempting/Vacating
> -> Owner/Idle
> 01/21 11:52:39 slot1: State change: IS_OWNER is false
> 01/21 11:52:39 slot1: Changing state: Owner -> Unclaimed
> 
> condor_q -l
> 
> ClusterId = 39
> QDate = 1264092628
> CompletionDate = 0
> Owner = "fpereira"
> LocalUserCpu = 0.000000
> LocalSysCpu = 0.000000
> RemoteUserCpu = 0.000000
> RemoteSysCpu = 0.000000
> ExitStatus = 0
> NumCkpts_RAW = 0
> NumCkpts = 0
> NumJobStarts = 0
> NumRestarts = 0
> NumSystemHolds = 0
> CommittedTime = 0
> TotalSuspensions = 0
> CumulativeSuspensionTime = 0
> ExitBySignal = FALSE
> CondorVersion = "$CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $"
> CondorPlatform = "$CondorPlatform: I386-LINUX_RHEL5 $"
> RootDir = "/"
> Iwd = "/Volumes/Work/CondorTests/MatlabTestDir"
> JobUniverse = 5
> Cmd = "/Volumes/Work/CondorTests/wrapper.pl"
> MinHosts = 1
> WantRemoteSyscalls = FALSE
> WantCheckpoint = FALSE
> RequestCpus = 1
> JobPrio = 0
> User = "fpereira@xxxxxxxxxxxxx"
> NiceUser = FALSE
> Environment = " _=/usr/local/bin/condor_submit
> QTINC=/usr/lib/qt-3.3/include CVS_RSH=ssh QTLIB=/usr/lib/qt-3.3/lib
> PWD=/Volumes/Work/CondorTests SHLVL=1 PS1=\u@\h' '\w' '$' '
> LANG=en_US.UTF-8 TERM=xterm-color MAIL=/var/spool/mail/fpereira
> LESSOPEN=|/usr/bin/lesspipe.sh' '%s OLDPWD=/Volumes/Work
> SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
> G_BROKEN_FILENAMES=1 QTDIR=/usr/lib/qt-3.3 SHELL=/bin/bash
> USER=fpereira PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/lib/ccache:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/fpereira/bin:/usr/local/bin:/sw/bin:/opt/local/bin:/usr/local/mysql/bin:/Volumes/Work/CondorOSX/bin:/Applications/MATLAB_R2008a/bin:/Applications/AFNI
> HISTSIZE=1000 LOGNAME=fpereira HOSTNAME=machine.csbmb.princeton.edu
> HOME=/home/fpereira"
> JobNotification = 0
> WantRemoteIO = TRUE
> UserLog = "/Volumes/Work/CondorTests/MatlabTestDir/matlab.$$(Name).log.txt"
> CoreSize = 0
> KillSig = "SIGTERM"
> Rank = 0.000000
> In = "/dev/null"
> TransferIn = FALSE
> Out = "matlab.$$(Name).out.txt"
> StreamOut = FALSE
> Err = "matlab.$$(Name).err.txt"
> StreamErr = FALSE
> BufferSize = 524288
> BufferBlockSize = 32768
> ShouldTransferFiles = "NO"
> TransferFiles = "NEVER"
> ImageSize_RAW = 2
> ImageSize = 2
> ExecutableSize_RAW = 2
> ExecutableSize = 2
> DiskUsage_RAW = 2
> DiskUsage = 2
> RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED,
> JobVMMemory, ImageSize / 1024.000000))
> RequestDisk = DiskUsage
> Requirements = (((Name == "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx") &&
> ((OpSys == "OSX") || (OpSys == "LINUX")))) && (Arch == "INTEL") &&
> (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) &&
> (TARGET.FileSystemDomain == MY.FileSystemDomain)
> FileSystemDomain = "princeton.edu"
> JobLeaseDuration = 1200
> PeriodicHold = FALSE
> PeriodicRelease = FALSE
> PeriodicRemove = FALSE
> OnExitHold = FALSE
> OnExitRemove = TRUE
> LeaveJobInQueue = FALSE
> Args = "$$(OpSys) run.m"
> GlobalJobId = "machine.csbmb.princeton.edu#39.0#1264092628"
> ProcId = 0
> AutoClusterId = 0
> AutoClusterAttrs =
> "JobUniverse,LastCheckpointPlatform,NumCkpts,DiskUsage,ImageSize,FileSystemDomain,Requirements,NiceUser,ConcurrencyLimits"
> JobStartDate = 1264092638
> WantMatchDiagnostics = TRUE
> LastMatchTime = 1264094139
> NumJobMatches = 26
> OrigMaxHosts = 1
> LastJobLeaseRenewal = 1264094139
> StartdPrincipal = "128.112.146.182"
> JobLastStartDate = 1264094079
> JobCurrentStartDate = 1264094139
> NumShadowStarts = 26
> JobRunCount = 26
> MATCH_EXP_UserLog =
> "/Volumes/Work/CondorTests/MatlabTestDir/matlab.slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> MATCH_EXP_Out = "matlab.slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> MATCH_Name = "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx"
> MATCH_EXP_Err = "matlab.slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> MATCH_OpSys = "LINUX"
> MATCH_EXP_Args = "LINUX run.m"
> LastVacateTime = 1264094139
> BytesSent = 0.000000
> BytesRecvd = 0.000000
> RemoteWallClockTime = 1.000000
> LastRemoteHost = "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx"
> LastPublicClaimId = "<128.112.146.182:33269>#1263850672#6881#..."
> LastPublicClaimIds = ""
> CurrentHosts = 0
> LastJobStatus = 2
> JobStatus = 1
> EnteredCurrentStatus = 1264094139
> LastSuspensionTime = 0
> MaxHosts = 1
> ServerTime = 1264094154