
Re: [HTCondor-users] does condor_off -peaceful -daemon startd node; works for vanilla jobs?



As another data point, it also seemed to work for me running a 
pre-release of HTCondor v8.5.7 on Scientific Linux 6.8.
See the simple test below; note that slot1 went from Claimed/Busy to 
Claimed/Retiring, which is expected.  The "Retiring" activity is defined in
the Manual (from https://is.gd/mi7mVk ):

  Retiring
   When an active claim is about to be preempted for any reason, it enters retirement, 
   while it waits for the current job to finish. The MaxJobRetirementTime expression determines 
   how long to wait (counting since the time the job started). Once the job finishes or the 
   retirement time expires, the Preempting state is entered. 

Perhaps you have a MaxJobRetirementTime defined?
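
One way to check, as a minimal sketch: condor_config_val is the standard HTCondor tool for querying configuration, and running it on the execute node reports what the local configuration gives for MaxJobRetirementTime. The wrapper name show_retirement_time is hypothetical and only there for illustration.

```shell
# Hypothetical helper: ask the local HTCondor configuration for the
# MaxJobRetirementTime setting. Run this on the execute node in question.
show_retirement_time() {
    condor_config_val MAXJOBRETIREMENTTIME
}
```

Calling show_retirement_time prints whatever condor_config_val reports for that knob; the exact wording of the output may differ between HTCondor versions.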

regards,
Todd

[tannenba@localhost test]$ condor_status
Name            OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@localhost LINUX      X86_64 Claimed   Busy      0.000  330  0+00:00:04
slot2@localhost LINUX      X86_64 Unclaimed Idle      0.000  330  0+00:00:05
slot3@localhost LINUX      X86_64 Unclaimed Idle      0.000  330  0+00:00:06

                     Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

        X86_64/LINUX     3     0       1         2       0          0        0      0

               Total     3     0       1         2       0          0        0      0

[tannenba@localhost test]$ condor_off -peaceful -daemon startd
Sent "Set-Peaceful-Shutdown" command to local startd
Sent "Kill-Daemon-Peacefully" command to local master

[tannenba@localhost test]$ condor_status
Name            OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@localhost LINUX      X86_64 Claimed   Retiring  0.000  330  0+00:00:03
slot2@localhost LINUX      X86_64 Unclaimed Idle      0.000  330  0+00:02:49
slot3@localhost LINUX      X86_64 Unclaimed Idle      0.000  330  0+00:00:06

                     Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

        X86_64/LINUX     3     0       1         2       0          0        0      0

               Total     3     0       1         2       0          0        0      0
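
If you want to watch for this without reading the table by eye, here is a rough sketch that picks out slots whose Activity column reads Retiring. It assumes the default condor_status column layout shown above (Name, OpSys, Arch, State, Activity, ...); the function name retiring_slots is hypothetical.

```shell
# Hypothetical helper: read default condor_status output on stdin and print
# the name of every slot whose fifth column (Activity) is "Retiring".
# Header and totals lines are skipped because their fifth field differs.
retiring_slots() {
    awk '$5 == "Retiring" { print $1 }'
}
```

Usage would be along the lines of `condor_status | retiring_slots`; an empty result means no slot is currently retiring.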



On 8/18/2016 11:11 AM, Bob Ball wrote:
> Just as a data point, I do, from our central manager machine,
> condor_off -peaceful -daemon startd -name $publicName
> and it runs just fine.  All our jobs are vanilla.  HTCondor is version 
> 8.4.6 on Scientific Linux.
> 
> bob
> 
> On 8/18/2016 11:54 AM, Harald van Pee wrote:
>>
>> Hi,
>>
>> I want to take a node that is running jobs offline, but only after all
>> running jobs have finished. Of course, until then no new jobs should be
>> accepted on that node.
>>
>> I tried the command:
>>
>> condor_off -peaceful -daemon startd node
>>
>> and got the message:
>>
>> Sent "Set-Peaceful-Shutdown" command to startd node
>> Sent "Kill-Daemon-Peacefully" command to master node
>>
>> On node I see in StartLog
>>
>> 08/18/16 17:20:49 Got SIGTERM. Performing graceful shutdown.
>> 08/18/16 17:20:49 shutdown graceful
>>
>> And indeed all jobs running in the vanilla universe (we have no others)
>> are killed immediately and restarted from the beginning. This is what a
>> graceful shutdown will do with vanilla jobs. But I want to have a
>> peaceful shutdown.
>>
>> Is a peaceful shutdown not possible for vanilla jobs?
>>
>> Do I have to change the configuration? We use:
>>
>> PREEMPT = FALSE
>> PREEMPTION_REQUIREMENTS = False
>> KILL = FALSE
>> WANT_SUSPEND = false
>> WANT_VACATE = false
>>
>> Or can I use just a different command?
>>
>> We use condor 8.4.8 on debian 8 (AMD64).
>>
>> Thanks
>>
>> Harald
>>
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
> 


-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685