
[Condor-users] machine stops executing jobs



Hi,

I have encountered a strange problem that I am unable to diagnose,
even after consulting the manuals, reading the logs and googling for
the relevant error messages. The situation is as follows:
- I have a submit file for a test job whose only requirement is that
it runs on a specific slot/machine (which happens to be a slot on the
master)
- For some time after the master is started, the test job runs fine
- After some time (once other machines have started and been added to
the pool), any job submitted to that slot/machine enters a state where
its status in the queue changes to R for a fraction of a second and
then goes back to I

I've looked at all six log files on the machine for the two minutes
after submission; the relevant excerpts (along with the condor_q -l
output) follow below. The only things that seem amiss are:

SchedLog

01/21 11:50:38 (pid:28409) Completed REQUEST_CLAIM to startd
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx
01/21 11:50:38 (pid:28409) Starting add_shadow_birthdate(39.0)
01/21 11:50:38 (pid:28409) Started shadow for job 39.0 on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, (shadow pid = 17679)
01/21 11:50:38 (pid:28409) Shadow pid 17679 for job 39.0 exited with status 108
01/21 11:50:38 (pid:28409) Completed RELEASE_CLAIM to startd at
<128.112.146.182:33269>

ShadowLog

01/21 11:50:38 DaemonCore: Command Socket at <128.112.146.182:49301>
01/21 11:50:38 Initializing a VANILLA shadow for job 39.0
01/21 11:50:38 (39.0) (17679): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
01/21 11:50:38 (39.0) (17679): Job 39.0 is being evicted from
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:50:38 (39.0) (17679): logEvictEvent with unknown reason (108), aborting
01/21 11:50:38 (39.0) (17679): **** condor_shadow (condor_SHADOW) pid
17679 EXITING WITH STATUS 108

I did find a message in the mailing list archive and a reference to
the error code in the manual:

http://www.cs.wisc.edu/condor/manual/v7.3/11_Appendix_B.html
"108 	JOB_NOT_STARTED 	can not connect to the condor_startd or request refused"

but it *could* connect to the start daemon this time (as it could before):

StartLog

01/21 11:50:38 slot1: match_info called
01/21 11:50:38 slot1: Received match <128.112.146.182:33269>#1263850672#6829#...
01/21 11:50:38 slot1: State change: match notification protocol successful
01/21 11:50:38 slot1: Changing state: Unclaimed -> Matched
01/21 11:50:38 slot1: Request accepted.
01/21 11:50:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
01/21 11:50:38 slot1: State change: claiming protocol successful
01/21 11:50:38 slot1: Changing state: Matched -> Claimed
01/21 11:50:38 slot1: Got activate_claim request from shadow
(<128.112.146.182:41584>)
01/21 11:50:38 slot1: Job Requirements check failed!
01/21 11:50:38 slot1: Called deactivate_claim_forcibly()
01/21 11:50:38 slot1: State change: received RELEASE_CLAIM command
01/21 11:50:38 slot1: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
01/21 11:50:38 slot1: State change: No preempting claim, returning to owner
01/21 11:50:38 slot1: Changing state and activity: Preempting/Vacating
-> Owner/Idle
01/21 11:50:38 slot1: State change: IS_OWNER is false
01/21 11:50:38 slot1: Changing state: Owner -> Unclaimed

so I don't know what the issue could be here...
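
To line up the slot's state changes with the refusal, a minimal sketch
along these lines could pull the transitions out of the StartLog. It
assumes the log lines keep the format shown above, with wrapped lines
rejoined. The function name `summarize` and the embedded sample are
only illustrative; this is not a tool that ships with Condor.

```python
import re

# Sample lines copied from the StartLog excerpt above (wrapped lines rejoined).
START_LOG = """\
01/21 11:50:38 slot1: Changing state: Unclaimed -> Matched
01/21 11:50:38 slot1: Request accepted.
01/21 11:50:38 slot1: Changing state: Matched -> Claimed
01/21 11:50:38 slot1: Got activate_claim request from shadow (<128.112.146.182:41584>)
01/21 11:50:38 slot1: Job Requirements check failed!
01/21 11:50:38 slot1: Called deactivate_claim_forcibly()
01/21 11:50:38 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
"""

# "MM/DD HH:MM:SS slotN: message"
LINE_RE = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d) (slot\d+): (.*)$")
# "Changing state: A -> B" or "Changing state and activity: A/x -> B/y"
TRANSITION_RE = re.compile(r"^Changing state(?: and activity)?: (\S+) -> (\S+)$")


def summarize(log_text):
    """Return (transitions, failures) found in StartLog-style text."""
    transitions = []
    failures = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a slot message
        stamp, slot, msg = m.groups()
        t = TRANSITION_RE.match(msg)
        if t:
            transitions.append((stamp, slot, t.group(1), t.group(2)))
        elif "Requirements check failed" in msg:
            failures.append((stamp, slot))
    return transitions, failures


transitions, failures = summarize(START_LOG)
for stamp, slot, old, new in transitions:
    print(f"{stamp} {slot}: {old} -> {new}")
for stamp, slot in failures:
    print(f"{stamp} {slot}: activation refused by the Requirements check")
```

On the excerpt above this shows the claim succeeding and the refusal
happening only at activation, right after the "Job Requirements check
failed!" line.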

Thank you for any light you might be able to shed on this, or any
further diagnostic questions you can suggest.

Francisco



condor_q

-- Submitter: machine.csbmb.princeton.edu : <128.112.146.182:60895> :
machine.csbmb.princeton.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  39.0   fpereira        1/21 11:50   0+00:00:01 I  0   0.0  wrapper.pl $$(OpSy

1 jobs; 1 idle, 0 running, 0 held


MasterLog


MatchLog

01/21 11:50:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
<128.112.146.182:60895> preempting none <128.112.146.182:33269>
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:51:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
<128.112.146.182:60895> preempting none <128.112.146.182:33269>
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:52:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
<128.112.146.182:60895> preempting none <128.112.146.182:33269>
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx

NegotiatorLog

01/21 11:50:18 ---------- Started Negotiation Cycle ----------
01/21 11:50:18 Phase 1:  Obtaining ads from collector ...
01/21 11:50:18   Getting all public ads ...
01/21 11:50:18   Sorting 32 ads ...
01/21 11:50:18   Getting startd private ads ...
01/21 11:50:18 Got ads: 32 public and 15 private
01/21 11:50:18 Public ads include 1 submitter, 15 startd
01/21 11:50:18 Phase 2:  Performing accounting ...
01/21 11:50:18 Phase 3:  Sorting submitter ads by priority ...
01/21 11:50:18 Phase 4.1:  Negotiating with schedds ...
01/21 11:50:18 ---------- Finished Negotiation Cycle ----------
01/21 11:50:38 ---------- Started Negotiation Cycle ----------
01/21 11:50:38 Phase 1:  Obtaining ads from collector ...
01/21 11:50:38   Getting all public ads ...
01/21 11:50:38   Sorting 32 ads ...
01/21 11:50:38   Getting startd private ads ...
01/21 11:50:38 Got ads: 32 public and 15 private
01/21 11:50:38 Public ads include 1 submitter, 15 startd
01/21 11:50:38 Phase 2:  Performing accounting ...
01/21 11:50:38 Phase 3:  Sorting submitter ads by priority ...
01/21 11:50:38 Phase 4.1:  Negotiating with schedds ...
01/21 11:50:38   Negotiating with fpereira@xxxxxxxxxxxxx at
<128.112.146.182:60895>
01/21 11:50:38 0 seconds so far
01/21 11:50:38     Request 00039.00000:
01/21 11:50:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
<128.112.146.182:60895> preempting none <128.112.146.182:33269>
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:50:38       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:50:38     Got NO_MORE_JOBS;  done negotiating
01/21 11:50:38 ---------- Finished Negotiation Cycle ----------
01/21 11:51:38 ---------- Started Negotiation Cycle ----------
01/21 11:51:38 Phase 1:  Obtaining ads from collector ...
01/21 11:51:38   Getting all public ads ...
01/21 11:51:38   Sorting 32 ads ...
01/21 11:51:38   Getting startd private ads ...
01/21 11:51:38 Got ads: 32 public and 15 private
01/21 11:51:38 Public ads include 1 submitter, 15 startd
01/21 11:51:38 Phase 2:  Performing accounting ...
01/21 11:51:38 Phase 3:  Sorting submitter ads by priority ...
01/21 11:51:38 Phase 4.1:  Negotiating with schedds ...
01/21 11:51:38   Negotiating with fpereira@xxxxxxxxxxxxx at
<128.112.146.182:60895>
01/21 11:51:38 0 seconds so far
01/21 11:51:38     Request 00039.00000:
01/21 11:51:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
<128.112.146.182:60895> preempting none <128.112.146.182:33269>
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:51:38       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:51:38     Got NO_MORE_JOBS;  done negotiating
01/21 11:51:38 ---------- Finished Negotiation Cycle ----------
01/21 11:52:38 ---------- Started Negotiation Cycle ----------
01/21 11:52:38 Phase 1:  Obtaining ads from collector ...
01/21 11:52:38   Getting all public ads ...
01/21 11:52:38   Sorting 32 ads ...
01/21 11:52:38   Getting startd private ads ...
01/21 11:52:38 Got ads: 32 public and 15 private
01/21 11:52:38 Public ads include 1 submitter, 15 startd
01/21 11:52:38 Phase 2:  Performing accounting ...
01/21 11:52:38 Phase 3:  Sorting submitter ads by priority ...
01/21 11:52:38 Phase 4.1:  Negotiating with schedds ...
01/21 11:52:38   Negotiating with fpereira@xxxxxxxxxxxxx at
<128.112.146.182:60895>
01/21 11:52:38 0 seconds so far
01/21 11:52:38     Request 00039.00000:
01/21 11:52:38       Matched 39.0 fpereira@xxxxxxxxxxxxx
<128.112.146.182:60895> preempting none <128.112.146.182:33269>
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:52:38       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:52:38     Got NO_MORE_JOBS;  done negotiating
01/21 11:52:38 ---------- Finished Negotiation Cycle ----------

SchedLog

01/21 11:50:28 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
01/21 11:50:28 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
01/21 11:50:28 (pid:28409) Collector machine.csbmb.princeton.edu
<128.112.146.182:9618> is still being avoided if an alternative
succeeds.
01/21 11:50:38 (pid:28409) Activity on stashed negotiator socket
01/21 11:50:38 (pid:28409) Negotiating for owner: fpereira@xxxxxxxxxxxxx
01/21 11:50:38 (pid:28409) Checking consistency running and runnable jobs
01/21 11:50:38 (pid:28409) Tables are consistent
01/21 11:50:38 (pid:28409) Rebuilt prioritized runnable job list in 0.000s.
01/21 11:50:38 (pid:28409) Out of jobs - 1 jobs matched, 0 jobs idle,
flock level = 0
01/21 11:50:38 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
01/21 11:50:38 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
01/21 11:50:38 (pid:28409) Completed REQUEST_CLAIM to startd
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx
01/21 11:50:38 (pid:28409) Starting add_shadow_birthdate(39.0)
01/21 11:50:38 (pid:28409) Started shadow for job 39.0 on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, (shadow pid = 17679)
01/21 11:50:38 (pid:28409) Shadow pid 17679 for job 39.0 exited with status 108
01/21 11:50:38 (pid:28409) Completed RELEASE_CLAIM to startd at
<128.112.146.182:33269>
01/21 11:50:38 (pid:28409) Match record
(slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, 39.0) deleted
01/21 11:51:38 (pid:28409) Activity on stashed negotiator socket
01/21 11:51:38 (pid:28409) Negotiating for owner: fpereira@xxxxxxxxxxxxx
01/21 11:51:38 (pid:28409) Checking consistency running and runnable jobs
01/21 11:51:38 (pid:28409) Tables are consistent
01/21 11:51:38 (pid:28409) Rebuilt prioritized runnable job list in 0.000s.
01/21 11:51:38 (pid:28409) Out of jobs - 1 jobs matched, 0 jobs idle,
flock level = 0
01/21 11:51:38 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
01/21 11:51:38 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
01/21 11:51:38 (pid:28409) Completed REQUEST_CLAIM to startd
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx
01/21 11:51:38 (pid:28409) Starting add_shadow_birthdate(39.0)
01/21 11:51:38 (pid:28409) Started shadow for job 39.0 on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, (shadow pid = 17682)
01/21 11:51:38 (pid:28409) Shadow pid 17682 for job 39.0 exited with status 108
01/21 11:51:38 (pid:28409) Completed RELEASE_CLAIM to startd at
<128.112.146.182:33269>
01/21 11:51:38 (pid:28409) Match record
(slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, 39.0) deleted
01/21 11:52:38 (pid:28409) Activity on stashed negotiator socket
01/21 11:52:38 (pid:28409) Negotiating for owner: fpereira@xxxxxxxxxxxxx
01/21 11:52:38 (pid:28409) Checking consistency running and runnable jobs
01/21 11:52:38 (pid:28409) Tables are consistent
01/21 11:52:38 (pid:28409) Rebuilt prioritized runnable job list in 0.000s.
01/21 11:52:38 (pid:28409) Out of jobs - 1 jobs matched, 0 jobs idle,
flock level = 0
01/21 11:52:38 (pid:28409) Sent ad to central manager for fpereira@xxxxxxxxxxxxx
01/21 11:52:38 (pid:28409) Sent ad to 1 collectors for fpereira@xxxxxxxxxxxxx
01/21 11:52:38 (pid:28409) Completed REQUEST_CLAIM to startd
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx
01/21 11:52:38 (pid:28409) Starting add_shadow_birthdate(39.0)
01/21 11:52:38 (pid:28409) Started shadow for job 39.0 on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, (shadow pid = 17685)
01/21 11:52:39 (pid:28409) Shadow pid 17685 for job 39.0 exited with status 108
01/21 11:52:39 (pid:28409) Completed RELEASE_CLAIM to startd at
<128.112.146.182:33269>
01/21 11:52:39 (pid:28409) Match record
(slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> for
fpereira@xxxxxxxxxxxxx, 39.0) deleted


ShadowLog

01/21 11:50:38 ******************************************************
01/21 11:50:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
01/21 11:50:38 ** /Volumes/Work/CondorLINUX/sbin/condor_shadow
01/21 11:50:38 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
01/21 11:50:38 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
01/21 11:50:38 ** $CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
01/21 11:50:38 ** $CondorPlatform: I386-LINUX_RHEL5 $
01/21 11:50:38 ** PID = 17679
01/21 11:50:38 ** Log last touched 1/21 11:40:18
01/21 11:50:38 ******************************************************
01/21 11:50:38 Using config source: /home/condor/condor_config
01/21 11:50:38 Using local config sources:
01/21 11:50:38    /home/condor/hosts/machine/etc/config.local
01/21 11:50:38 DaemonCore: Command Socket at <128.112.146.182:49301>
01/21 11:50:38 Initializing a VANILLA shadow for job 39.0
01/21 11:50:38 (39.0) (17679): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
01/21 11:50:38 (39.0) (17679): Job 39.0 is being evicted from
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:50:38 (39.0) (17679): logEvictEvent with unknown reason (108), aborting
01/21 11:50:38 (39.0) (17679): **** condor_shadow (condor_SHADOW) pid
17679 EXITING WITH STATUS 108
01/21 11:51:38 ******************************************************
01/21 11:51:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
01/21 11:51:38 ** /Volumes/Work/CondorLINUX/sbin/condor_shadow
01/21 11:51:38 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
01/21 11:51:38 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
01/21 11:51:38 ** $CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
01/21 11:51:38 ** $CondorPlatform: I386-LINUX_RHEL5 $
01/21 11:51:38 ** PID = 17682
01/21 11:51:38 ** Log last touched 1/21 11:50:38
01/21 11:51:38 ******************************************************
01/21 11:51:38 Using config source: /home/condor/condor_config
01/21 11:51:38 Using local config sources:
01/21 11:51:38    /home/condor/hosts/machine/etc/config.local
01/21 11:51:38 DaemonCore: Command Socket at <128.112.146.182:60874>
01/21 11:51:38 Initializing a VANILLA shadow for job 39.0
01/21 11:51:38 (39.0) (17682): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
01/21 11:51:38 (39.0) (17682): Job 39.0 is being evicted from
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:51:38 (39.0) (17682): logEvictEvent with unknown reason (108), aborting
01/21 11:51:38 (39.0) (17682): **** condor_shadow (condor_SHADOW) pid
17682 EXITING WITH STATUS 108
01/21 11:52:38 ******************************************************
01/21 11:52:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
01/21 11:52:38 ** /Volumes/Work/CondorLINUX/sbin/condor_shadow
01/21 11:52:38 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
01/21 11:52:38 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
01/21 11:52:38 ** $CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $
01/21 11:52:38 ** $CondorPlatform: I386-LINUX_RHEL5 $
01/21 11:52:38 ** PID = 17685
01/21 11:52:38 ** Log last touched 1/21 11:51:38
01/21 11:52:38 ******************************************************
01/21 11:52:38 Using config source: /home/condor/condor_config
01/21 11:52:38 Using local config sources:
01/21 11:52:38    /home/condor/hosts/machine/etc/config.local
01/21 11:52:38 DaemonCore: Command Socket at <128.112.146.182:60357>
01/21 11:52:38 Initializing a VANILLA shadow for job 39.0
01/21 11:52:39 (39.0) (17685): Request to run on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx <128.112.146.182:33269> was REFUSED
01/21 11:52:39 (39.0) (17685): Job 39.0 is being evicted from
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
01/21 11:52:39 (39.0) (17685): logEvictEvent with unknown reason (108), aborting
01/21 11:52:39 (39.0) (17685): **** condor_shadow (condor_SHADOW) pid
17685 EXITING WITH STATUS 108

StartLog

01/21 11:50:38 slot1: match_info called
01/21 11:50:38 slot1: Received match <128.112.146.182:33269>#1263850672#6829#...
01/21 11:50:38 slot1: State change: match notification protocol successful
01/21 11:50:38 slot1: Changing state: Unclaimed -> Matched
01/21 11:50:38 slot1: Request accepted.
01/21 11:50:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
01/21 11:50:38 slot1: State change: claiming protocol successful
01/21 11:50:38 slot1: Changing state: Matched -> Claimed
01/21 11:50:38 slot1: Got activate_claim request from shadow
(<128.112.146.182:41584>)
01/21 11:50:38 slot1: Job Requirements check failed!
01/21 11:50:38 slot1: Called deactivate_claim_forcibly()
01/21 11:50:38 slot1: State change: received RELEASE_CLAIM command
01/21 11:50:38 slot1: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
01/21 11:50:38 slot1: State change: No preempting claim, returning to owner
01/21 11:50:38 slot1: Changing state and activity: Preempting/Vacating
-> Owner/Idle
01/21 11:50:38 slot1: State change: IS_OWNER is false
01/21 11:50:38 slot1: Changing state: Owner -> Unclaimed
01/21 11:51:38 slot1: match_info called
01/21 11:51:38 slot1: Received match <128.112.146.182:33269>#1263850672#6833#...
01/21 11:51:38 slot1: State change: match notification protocol successful
01/21 11:51:38 slot1: Changing state: Unclaimed -> Matched
01/21 11:51:38 slot1: Request accepted.
01/21 11:51:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
01/21 11:51:38 slot1: State change: claiming protocol successful
01/21 11:51:38 slot1: Changing state: Matched -> Claimed
01/21 11:51:38 slot1: Got activate_claim request from shadow
(<128.112.146.182:39537>)
01/21 11:51:38 slot1: Job Requirements check failed!
01/21 11:51:38 slot1: Called deactivate_claim_forcibly()
01/21 11:51:38 slot1: State change: received RELEASE_CLAIM command
01/21 11:51:38 slot1: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
01/21 11:51:38 slot1: State change: No preempting claim, returning to owner
01/21 11:51:38 slot1: Changing state and activity: Preempting/Vacating
-> Owner/Idle
01/21 11:51:38 slot1: State change: IS_OWNER is false
01/21 11:51:38 slot1: Changing state: Owner -> Unclaimed
01/21 11:52:38 slot1: match_info called
01/21 11:52:38 slot1: Received match <128.112.146.182:33269>#1263850672#6835#...
01/21 11:52:38 slot1: State change: match notification protocol successful
01/21 11:52:38 slot1: Changing state: Unclaimed -> Matched
01/21 11:52:38 slot1: Request accepted.
01/21 11:52:38 slot1: Remote owner is fpereira@xxxxxxxxxxxxx
01/21 11:52:38 slot1: State change: claiming protocol successful
01/21 11:52:38 slot1: Changing state: Matched -> Claimed
01/21 11:52:39 slot1: Got activate_claim request from shadow
(<128.112.146.182:33337>)
01/21 11:52:39 slot1: Job Requirements check failed!
01/21 11:52:39 slot1: Called deactivate_claim_forcibly()
01/21 11:52:39 slot1: State change: received RELEASE_CLAIM command
01/21 11:52:39 slot1: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
01/21 11:52:39 slot1: State change: No preempting claim, returning to owner
01/21 11:52:39 slot1: Changing state and activity: Preempting/Vacating
-> Owner/Idle
01/21 11:52:39 slot1: State change: IS_OWNER is false
01/21 11:52:39 slot1: Changing state: Owner -> Unclaimed

condor_q -l

ClusterId = 39
QDate = 1264092628
CompletionDate = 0
Owner = "fpereira"
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 7.4.0 Nov  1 2009 BuildID: 193173 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RHEL5 $"
RootDir = "/"
Iwd = "/Volumes/Work/CondorTests/MatlabTestDir"
JobUniverse = 5
Cmd = "/Volumes/Work/CondorTests/wrapper.pl"
MinHosts = 1
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
JobPrio = 0
User = "fpereira@xxxxxxxxxxxxx"
NiceUser = FALSE
Environment = " _=/usr/local/bin/condor_submit
QTINC=/usr/lib/qt-3.3/include CVS_RSH=ssh QTLIB=/usr/lib/qt-3.3/lib
PWD=/Volumes/Work/CondorTests SHLVL=1 PS1=\u@\h' '\w' '$' '
LANG=en_US.UTF-8 TERM=xterm-color MAIL=/var/spool/mail/fpereira
LESSOPEN=|/usr/bin/lesspipe.sh' '%s OLDPWD=/Volumes/Work
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
G_BROKEN_FILENAMES=1 QTDIR=/usr/lib/qt-3.3 SHELL=/bin/bash
USER=fpereira PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/lib/ccache:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/fpereira/bin:/usr/local/bin:/sw/bin:/opt/local/bin:/usr/local/mysql/bin:/Volumes/Work/CondorOSX/bin:/Applications/MATLAB_R2008a/bin:/Applications/AFNI
HISTSIZE=1000 LOGNAME=fpereira HOSTNAME=machine.csbmb.princeton.edu
HOME=/home/fpereira"
JobNotification = 0
WantRemoteIO = TRUE
UserLog = "/Volumes/Work/CondorTests/MatlabTestDir/matlab.$$(Name).log.txt"
CoreSize = 0
KillSig = "SIGTERM"
Rank = 0.000000
In = "/dev/null"
TransferIn = FALSE
Out = "matlab.$$(Name).out.txt"
StreamOut = FALSE
Err = "matlab.$$(Name).err.txt"
StreamErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "NO"
TransferFiles = "NEVER"
ImageSize_RAW = 2
ImageSize = 2
ExecutableSize_RAW = 2
ExecutableSize = 2
DiskUsage_RAW = 2
DiskUsage = 2
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED,
JobVMMemory, ImageSize / 1024.000000))
RequestDisk = DiskUsage
Requirements = (((Name == "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx") &&
((OpSys == "OSX") || (OpSys == "LINUX")))) && (Arch == "INTEL") &&
(Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) &&
(TARGET.FileSystemDomain == MY.FileSystemDomain)
FileSystemDomain = "princeton.edu"
JobLeaseDuration = 1200
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
OnExitHold = FALSE
OnExitRemove = TRUE
LeaveJobInQueue = FALSE
Args = "$$(OpSys) run.m"
GlobalJobId = "machine.csbmb.princeton.edu#39.0#1264092628"
ProcId = 0
AutoClusterId = 0
AutoClusterAttrs =
"JobUniverse,LastCheckpointPlatform,NumCkpts,DiskUsage,ImageSize,FileSystemDomain,Requirements,NiceUser,ConcurrencyLimits"
JobStartDate = 1264092638
WantMatchDiagnostics = TRUE
LastMatchTime = 1264094139
NumJobMatches = 26
OrigMaxHosts = 1
LastJobLeaseRenewal = 1264094139
StartdPrincipal = "128.112.146.182"
JobLastStartDate = 1264094079
JobCurrentStartDate = 1264094139
NumShadowStarts = 26
JobRunCount = 26
MATCH_EXP_UserLog =
"/Volumes/Work/CondorTests/MatlabTestDir/matlab.slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
MATCH_EXP_Out = "matlab.slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
MATCH_Name = "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx"
MATCH_EXP_Err = "matlab.slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
MATCH_OpSys = "LINUX"
MATCH_EXP_Args = "LINUX run.m"
LastVacateTime = 1264094139
BytesSent = 0.000000
BytesRecvd = 0.000000
RemoteWallClockTime = 1.000000
LastRemoteHost = "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx"
LastPublicClaimId = "<128.112.146.182:33269>#1263850672#6881#..."
LastPublicClaimIds = ""
CurrentHosts = 0
LastJobStatus = 2
JobStatus = 1
EnteredCurrentStatus = 1264094139
LastSuspensionTime = 0
MaxHosts = 1
ServerTime = 1264094154
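
Since the StartLog reports "Job Requirements check failed!", one way to
narrow this down might be to test each clause of the Requirements
expression above against the slot's ad separately. A rough sketch
follows. The job-side values are copied from the condor_q -l output,
but the slot values are hypothetical placeholders made up only to
exercise the function (the real machine ad is not included in this
message). Treating a missing attribute as a failed clause is a
simplification of how ClassAd expressions actually handle undefined
attributes.

```python
# Job-side values copied from the condor_q -l output above.
job = {"DiskUsage": 2, "ImageSize": 2, "FileSystemDomain": "princeton.edu"}

# HYPOTHETICAL slot ad: these values are invented for illustration and
# are not taken from the real machine.
slot = {
    "Name": "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx",  # redacted name, as in the logs
    "OpSys": "LINUX",
    "Arch": "INTEL",
    "Disk": 1000000,
    "Memory": 2048,
    "FileSystemDomain": "some.other.domain",  # assumed mismatch, for the demo only
}


def failed_clauses(job, slot):
    """Return labels of the Requirements clauses that the slot does not satisfy.

    Mirrors the clauses of the job's Requirements expression one by one.
    A missing slot attribute counts as a failure (a simplification).
    """
    clauses = [
        ("Name", slot.get("Name") == "slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxx"),
        ("OpSys", slot.get("OpSys") in ("OSX", "LINUX")),
        ("Arch", slot.get("Arch") == "INTEL"),
        ("Disk", slot.get("Disk", -1) >= job["DiskUsage"]),
        ("Memory", slot.get("Memory", -1) * 1024 >= job["ImageSize"]),
        ("FileSystemDomain",
         slot.get("FileSystemDomain") == job["FileSystemDomain"]),
    ]
    return [label for label, ok in clauses if not ok]


print(failed_clauses(job, slot))
```

With the real slot attributes substituted for the placeholders, any
label this returns would point at the clause worth checking first.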