
Re: [Condor-users] Startd couldn't change state to Unclaimed after job finished



Dear all,

	Nobody has replied to my question. Is anybody receiving my mails?



			

	Thanks.
      	 Zhaokun
			   Beijing Hotsim Technology Co.,Ltd
			   zhaokun@xxxxxxxxxxxxx
          2009-02-09
=======From 2009-02-06 11:50:18 =======

>Dear all,
>
>	I tested a vanilla job in this pool, and there was no error. The difference is in StartLog: after the vanilla job finished, the startd only "Called deactivate_claim_forcibly()" and then successfully "received RELEASE_CLAIM command". After the parallel job finished, it first "Called deactivate_claim_forcibly()", then "Called deactivate_claim()", and an error occurred: "condor_write(): Socket closed when trying to write 56 bytes to <172.16.0.1:59635>, fd is 7".
>
>	How can I solve this?
>	
>	Any help is appreciated, thanks.
>	
>	My job:
>	
>Executable = /bin/hostname
>Universe=vanilla
>Log = s1.log
>Output = s1.out
>Queue
>
>	
>	StartLog
>	
>4/3 18:08:41 slot1: Got universe "VANILLA" (5) from request classad
>4/3 18:08:41 slot1: State change: claim-activation protocol successful
>4/3 18:08:41 slot1: Changing activity: Idle -> Busy
>4/3 18:08:42 slot1: Called deactivate_claim_forcibly()
>4/3 18:08:42 Starter pid 9618 exited with status 0
>4/3 18:08:42 slot1: State change: starter exited
>4/3 18:08:42 slot1: Changing activity: Busy -> Idle
>4/3 18:08:42 slot1: State change: received RELEASE_CLAIM command
>4/3 18:08:42 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
>4/3 18:08:42 slot1: State change: No preempting claim, returning to owner
>4/3 18:08:42 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
>4/3 18:08:42 slot1: State change: IS_OWNER is false
>4/3 18:08:42 slot1: Changing state: Owner -> Unclaimed
>
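To make the StartLog difference described above concrete, here is a small Python sketch (not part of the original posts) that takes the post-completion lines from the vanilla StartLog above and from the parallel-universe StartLog quoted further down in this thread, and prints the messages that appear only in the parallel run. The sample lines are copied from those excerpts; the function names are hypothetical, and the parser assumes the "M/D HH:MM:SS [slotN:] message" layout seen in these logs.

```python
import re

# Post-completion StartLog lines, copied from the two excerpts in this thread.
VANILLA = """\
4/3 18:08:42 slot1: Called deactivate_claim_forcibly()
4/3 18:08:42 Starter pid 9618 exited with status 0
4/3 18:08:42 slot1: State change: starter exited
4/3 18:08:42 slot1: Changing activity: Busy -> Idle
4/3 18:08:42 slot1: State change: received RELEASE_CLAIM command
"""

PARALLEL = """\
4/3 06:03:13 Called deactivate_claim_forcibly()
4/3 06:03:13 Starter pid 7844 exited with status 0
4/3 06:03:13 State change: starter exited
4/3 06:03:13 Changing activity: Busy -> Idle
4/3 06:03:13 Called deactivate_claim()
4/3 06:03:13 condor_write(): Socket closed when trying to write 56 bytes to <172.16.0.1:59635>, fd is 7
4/3 06:03:13 Buf::write(): condor_write() failed
4/3 06:13:13 State change: received RELEASE_CLAIM command
"""

# Timestamp, optional "slotN: " prefix, then the message text.
LINE_RE = re.compile(r"^\d+/\d+ \d{2}:\d{2}:\d{2} (?:slot\d+: )?(?P<msg>.*)$")
# Starter pids differ between runs, so mask them before comparing.
PID_RE = re.compile(r"pid \d+")


def messages(log: str) -> list[str]:
    """Return the message part of each StartLog line, with pids masked."""
    out = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            out.append(PID_RE.sub("pid N", m.group("msg")))
    return out


def only_in_parallel(vanilla: str, parallel: str) -> list[str]:
    """Messages from the parallel log that never occur in the vanilla log."""
    seen = set(messages(vanilla))
    return [msg for msg in messages(parallel) if msg not in seen]


if __name__ == "__main__":
    for msg in only_in_parallel(VANILLA, PARALLEL):
        print(msg)
```

Run against these excerpts, it reports the extra "Called deactivate_claim()" call and the two condor_write failures that follow it, which is exactly the point where the parallel run diverges from the vanilla run.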
>	
>	
>
>
>			
>
>	Thanks.
>      	 Zhaokun
>			   Beijing Hotsim Technology Co.,Ltd
>			   zhaokun@xxxxxxxxxxxxx
>          2009-02-06
>=======From 2009-02-05 15:41:39 =======
>
>>Hi all,
>>
>>	I tested a job in my test Condor pool, which has 1 computer. After the test job finished, the Activity of the computer changed to Idle, but the State is still Claimed. StartLog shows a condor_write error, and only after 600 seconds does the schedd send a release command.
>>	Thanks.
>>		
>>My test job:
>>
>>Universe=parallel
>>Executable = /bin/hostname
>>Output=h.out.$(NODE)
>>Log = h.log
>>machine_count=1
>>Queue
>>
>>	
>>SchedLog
>>		
>>4/3 06:02:56 (pid:7396) Called reschedule_negotiator()
>>4/3 06:03:01 (pid:7396) Sent ad to central manager for zhaokun@xxxxxxxxxxxx
>>4/3 06:03:01 (pid:7396) Sent ad to 1 collectors for zhaokun@xxxxxxxxxxxx
>>4/3 06:03:01 (pid:7396) Inserting new attribute Scheduler into non-active cluster cid=29 acid=-1
>>4/3 06:03:11 (pid:7396) Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxx
>>4/3 06:03:11 (pid:7396) Out of requests - 1 reqs matched, 0 reqs idle
>>4/3 06:03:11 (pid:7396) Sent REQUEST_CLAIM to startd mgt1.hotsim.local <172.16.0.1:56568> for DedicatedScheduler
>>4/3 06:03:11 (pid:7396) Inserting new attribute Scheduler into non-active cluster cid=29 acid=-1
>>4/3 06:03:11 (pid:7396) Starting add_shadow_birthdate(29.0)
>>4/3 06:03:11 (pid:7396) Started shadow for job 29.0 on mgt1.hotsim.local <172.16.0.1:56568> for DedicatedScheduler, (shadow pid = 7839)
>>4/3 06:03:13 (pid:7396) In DedicatedScheduler::reaper pid 7839 has status 25600
>>4/3 06:03:13 (pid:7396) Shadow pid 7839 exited with status 100
>>4/3 06:03:13 (pid:7396) DedicatedScheduler::deallocMatchRec
>>4/3 06:03:13 (pid:7396) DedicatedScheduler::deallocMatchRec
>>4/3 06:03:31 (pid:7396) Sent owner (0 jobs) ad to 1 collectors
>>4/3 06:13:13 (pid:7396) Resource mgt1.hotsim.local has been unused for 600 seconds, limit is 600, releasing
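The two reaper lines in this SchedLog report the same result in two forms: 25600 is the raw wait status and 100 is the exit code packed into its high byte (25600 = 100 × 256). A short Python sketch of that decoding follows; it mirrors what the POSIX WIFEXITED/WEXITSTATUS macros do, and the helper name is made up for illustration. As an assumption from memory rather than from this thread, 100 should be HTCondor's normal "job exited" shadow code, which would mean the shadow itself did not report a failure.

```python
def exit_code_from_wait_status(status: int) -> int:
    """Decode a raw POSIX wait status the way WIFEXITED/WEXITSTATUS do."""
    if status & 0x7F != 0:
        # Low 7 bits non-zero: the process did not exit normally (signal or stop).
        raise ValueError(f"process did not exit normally (low bits {status & 0x7F})")
    # Normal exit: the exit code lives in the second-lowest byte.
    return (status >> 8) & 0xFF


# Value printed by "DedicatedScheduler::reaper pid 7839 has status 25600".
print(exit_code_from_wait_status(25600))  # prints 100
```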
>>
>>StartLog
>>	
>>4/3 06:03:11 match_info called
>>4/3 06:03:11 Received match <172.16.0.1:56568>#1175551221#1#...
>>4/3 06:03:11 State change: match notification protocol successful
>>4/3 06:03:11 Changing state: Unclaimed -> Matched
>>4/3 06:03:11 Request accepted.
>>4/3 06:03:11 Remote owner is DedicatedScheduler@xxxxxxxxxxxxxxxxx
>>4/3 06:03:11 State change: claiming protocol successful
>>4/3 06:03:11 Changing state: Matched -> Claimed
>>4/3 06:03:12 Got activate_claim request from shadow (<172.16.0.1:55514>)
>>4/3 06:03:12 Remote job ID is 29.0
>>4/3 06:03:13 Got universe "PARALLEL" (11) from request classad
>>4/3 06:03:13 State change: claim-activation protocol successful
>>4/3 06:03:13 Changing activity: Idle -> Busy
>>4/3 06:03:13 Called deactivate_claim_forcibly()
>>4/3 06:03:13 Starter pid 7844 exited with status 0
>>4/3 06:03:13 State change: starter exited
>>4/3 06:03:13 Changing activity: Busy -> Idle
>>4/3 06:03:13 Called deactivate_claim()
>>4/3 06:03:13 condor_write(): Socket closed when trying to write 56 bytes to <172.16.0.1:59635>, fd is 7
>>4/3 06:03:13 Buf::write(): condor_write() failed
>>4/3 06:13:13 State change: received RELEASE_CLAIM command
>>4/3 06:13:13 Changing state and activity: Claimed/Idle -> Preempting/Vacating
>>4/3 06:13:13 State change: No preempting claim, returning to owner
>>4/3 06:13:13 Changing state and activity: Preempting/Vacating -> Owner/Idle
>>4/3 06:13:13 State change: IS_OWNER is false
>>4/3 06:13:13 Changing state: Owner -> Unclaimed
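The slot sits in Claimed/Idle from the condor_write failure until RELEASE_CLAIM arrives, and that gap matches the schedd's "unused for 600 seconds, limit is 600, releasing" message. A small Python sketch that measures the gap from the StartLog timestamps follows. It assumes the "M/D HH:MM:SS" prefix shown in these logs; because the log omits the year, it substitutes a fixed year purely so the two times can be subtracted. The function name is hypothetical.

```python
from datetime import datetime


def seconds_between(start_line: str, end_line: str) -> float:
    """Elapsed seconds between two StartLog lines with an 'M/D HH:MM:SS' prefix."""

    def stamp(line: str) -> datetime:
        day, clock = line.split()[:2]
        # StartLog omits the year; a fixed leap year keeps a 2/29 date parseable.
        return datetime.strptime(f"2000/{day} {clock}", "%Y/%m/%d %H:%M:%S")

    return (stamp(end_line) - stamp(start_line)).total_seconds()


write_failed = "4/3 06:03:13 Buf::write(): condor_write() failed"
released = "4/3 06:13:13 State change: received RELEASE_CLAIM command"
print(seconds_between(write_failed, released))  # prints 600.0
```

The result lines up with the 600-second limit in the SchedLog. That suggests the claim is only released when the schedd's idle-claim timeout expires, not because the startd received the normal release at job exit.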
>>
>>
>>	Thanks.
>>      	 Zhaokun
>>			   Beijing Hotsim Technology Co.,Ltd
>>			   zhaokun@xxxxxxxxxxxxx
>>          2009-02-05
>>_______________________________________________
>>Condor-users mailing list
>>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>subject: Unsubscribe
>>You can also unsubscribe by visiting
>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>>The archives can be found at: 
>>https://lists.cs.wisc.edu/archive/condor-users/
>
>= = = = = = = = = = = = = = = = = = = =

= = = = = = = = = = = = = = = = = = = =