Re: [Condor-users] Startd couldn't change state to UnClaimed after job finished
- Date: Mon, 9 Feb 2009 10:44:41 +0800
- From: "zhaokun" <zhaokun@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Startd couldn't change state to UnClaimed after job finished
Dear all,
Nobody has replied to my question. Is anybody receiving my mails?
Thanks.
Zhaokun
Beijing Hotsim Technology Co.,Ltd
zhaokun@xxxxxxxxxxxxx
2009-02-09
=======From 2009-02-06 11:50:18 =======
>Dear all,
>
> I tested a vanilla job in this pool, and there was no error. The difference in StartLog is this: after the vanilla job finished, there is only "Called deactivate_claim_forcibly()", followed by a successful "received RELEASE_CLAIM command"; after the parallel job finished, there is first "Called deactivate_claim_forcibly()", then "Called deactivate_claim()", and then an error occurred: "condor_write(): Socket closed when trying to write 56 bytes to <172.16.0.1:59635>, fd is 7".
>
> How can I solve this?
>
> Please help, thanks.
>
> my job
>
>Executable = /bin/hostname
>Universe=vanilla
>Log = s1.log
>Output = s1.out
>Queue
>
>
> StartLog
>
>4/3 18:08:41 slot1: Got universe "VANILLA" (5) from request classad
>4/3 18:08:41 slot1: State change: claim-activation protocol successful
>4/3 18:08:41 slot1: Changing activity: Idle -> Busy
>4/3 18:08:42 slot1: Called deactivate_claim_forcibly()
>4/3 18:08:42 Starter pid 9618 exited with status 0
>4/3 18:08:42 slot1: State change: starter exited
>4/3 18:08:42 slot1: Changing activity: Busy -> Idle
>4/3 18:08:42 slot1: State change: received RELEASE_CLAIM command
>4/3 18:08:42 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
>4/3 18:08:42 slot1: State change: No preempting claim, returning to owner
>4/3 18:08:42 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
>4/3 18:08:42 slot1: State change: IS_OWNER is false
>4/3 18:08:42 slot1: Changing state: Owner -> Unclaimed
>
> Thanks.
> Zhaokun
> Beijing Hotsim Technology Co.,Ltd
> zhaokun@xxxxxxxxxxxxx
> 2009-02-06
>=======From 2009-02-05 15:41:39 =======
>
>>hi all,
>>
>> I tested a job in my test Condor pool, which has 1 computer. After the test job finished, the Activity of the computer changed to Idle, but the State was still Claimed. StartLog shows a condor_write error, and only after 600 seconds did the schedd send a release command.
>> Thanks.
>>
>>my test job
>>
>>Universe=parallel
>>Executable = /bin/hostname
>>Output=h.out.$(NODE)
>>Log = h.log
>>machine_count=1
>>Queue
>>
>>
>>SchedLog
>>
>>4/3 06:02:56 (pid:7396) Called reschedule_negotiator()
>>4/3 06:03:01 (pid:7396) Sent ad to central manager for zhaokun@xxxxxxxxxxxx
>>4/3 06:03:01 (pid:7396) Sent ad to 1 collectors for zhaokun@xxxxxxxxxxxx
>>4/3 06:03:01 (pid:7396) Inserting new attribute Scheduler into non-active cluster cid=29 acid=-1
>>4/3 06:03:11 (pid:7396) Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxx
>>4/3 06:03:11 (pid:7396) Out of requests - 1 reqs matched, 0 reqs idle
>>4/3 06:03:11 (pid:7396) Sent REQUEST_CLAIM to startd mgt1.hotsim.local <172.16.0.1:56568> for DedicatedScheduler
>>4/3 06:03:11 (pid:7396) Inserting new attribute Scheduler into non-active cluster cid=29 acid=-1
>>4/3 06:03:11 (pid:7396) Starting add_shadow_birthdate(29.0)
>>4/3 06:03:11 (pid:7396) Started shadow for job 29.0 on mgt1.hotsim.local <172.16.0.1:56568> for DedicatedScheduler, (shadow pid = 7839)
>>4/3 06:03:13 (pid:7396) In DedicatedScheduler::reaper pid 7839 has status 25600
>>4/3 06:03:13 (pid:7396) Shadow pid 7839 exited with status 100
>>4/3 06:03:13 (pid:7396) DedicatedScheduler::deallocMatchRec
>>4/3 06:03:13 (pid:7396) DedicatedScheduler::deallocMatchRec
>>4/3 06:03:31 (pid:7396) Sent owner (0 jobs) ad to 1 collectors
>>4/3 06:13:13 (pid:7396) Resource mgt1.hotsim.local has been unused for 600 seconds, limit is 600, releasing
>>
>>StartLog
>>
>>4/3 06:03:11 match_info called
>>4/3 06:03:11 Received match <172.16.0.1:56568>#1175551221#1#...
>>4/3 06:03:11 State change: match notification protocol successful
>>4/3 06:03:11 Changing state: Unclaimed -> Matched
>>4/3 06:03:11 Request accepted.
>>4/3 06:03:11 Remote owner is DedicatedScheduler@xxxxxxxxxxxxxxxxx
>>4/3 06:03:11 State change: claiming protocol successful
>>4/3 06:03:11 Changing state: Matched -> Claimed
>>4/3 06:03:12 Got activate_claim request from shadow (<172.16.0.1:55514>)
>>4/3 06:03:12 Remote job ID is 29.0
>>4/3 06:03:13 Got universe "PARALLEL" (11) from request classad
>>4/3 06:03:13 State change: claim-activation protocol successful
>>4/3 06:03:13 Changing activity: Idle -> Busy
>>4/3 06:03:13 Called deactivate_claim_forcibly()
>>4/3 06:03:13 Starter pid 7844 exited with status 0
>>4/3 06:03:13 State change: starter exited
>>4/3 06:03:13 Changing activity: Busy -> Idle
>>4/3 06:03:13 Called deactivate_claim()
>>4/3 06:03:13 condor_write(): Socket closed when trying to write 56 bytes to <172.16.0.1:59635>, fd is 7
>>4/3 06:03:13 Buf::write(): condor_write() failed
>>4/3 06:13:13 State change: received RELEASE_CLAIM command
>>4/3 06:13:13 Changing state and activity: Claimed/Idle -> Preempting/Vacating
>>4/3 06:13:13 State change: No preempting claim, returning to owner
>>4/3 06:13:13 Changing state and activity: Preempting/Vacating -> Owner/Idle
>>4/3 06:13:13 State change: IS_OWNER is false
>>4/3 06:13:13 Changing state: Owner -> Unclaimed
>>
>>
>> Thanks.
>> Zhaokun
>> Beijing Hotsim Technology Co.,Ltd
>> zhaokun@xxxxxxxxxxxxx
>> 2009-02-05
>>_______________________________________________
>>Condor-users mailing list
>>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>subject: Unsubscribe
>>You can also unsubscribe by visiting
>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>>The archives can be found at:
>>https://lists.cs.wisc.edu/archive/condor-users/
>
>= = = = = = = = = = = = = = = = = = = =
= = = = = = = = = = = = = = = = = = = =
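
To make the difference between the two StartLog excerpts above easier to check, here is a minimal Python sketch. It is not part of Condor; the function names (`parse`, `claimed_idle_seconds`, `called_functions`) are made up for illustration. It assumes log lines of the form "M/D HH:MM:SS [slotN: ]message" as quoted in this thread, tolerates leading ">" quote markers, and assumes both events fall in the same year because the logs carry no year.

```python
import re
from datetime import datetime

# Matches lines such as "4/3 06:03:13 Called deactivate_claim()" or
# ">4/3 18:08:42 slot1: Called deactivate_claim_forcibly()".
# Leading ">" quote markers and an optional "slotN: " prefix are skipped.
LINE_RE = re.compile(r"^[>\s]*(\d+/\d+ \d+:\d+:\d+) (?:slot\d+: )?(.*)$")


def parse(log_text):
    """Yield (timestamp, message) pairs for every line that looks like a log entry."""
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            # The logs omit the year, so strptime falls back to its default (1900).
            # That is fine for measuring gaps inside one log excerpt.
            ts = datetime.strptime(m.group(1), "%m/%d %H:%M:%S")
            yield ts, m.group(2).strip()


def claimed_idle_seconds(log_text):
    """Seconds between 'Busy -> Idle' and the next RELEASE_CLAIM, or None if not found."""
    idle_at = None
    for ts, msg in parse(log_text):
        if msg == "Changing activity: Busy -> Idle":
            idle_at = ts
        elif msg == "State change: received RELEASE_CLAIM command" and idle_at is not None:
            return (ts - idle_at).total_seconds()
    return None


def called_functions(log_text):
    """Return the names from 'Called xxx()' entries, in log order."""
    names = []
    for _, msg in parse(log_text):
        m = re.match(r"Called (\w+)\(\)", msg)
        if m:
            names.append(m.group(1))
    return names
```

Fed the parallel-job StartLog lines quoted above, `claimed_idle_seconds` should report 600.0, which matches the "unused for 600 seconds, limit is 600" line in SchedLog; fed the vanilla-job lines it should report 0.0. `called_functions` lists `deactivate_claim_forcibly` alone for the vanilla job and `deactivate_claim_forcibly` followed by `deactivate_claim` for the parallel job.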