Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] questions about condor_restart

Date: Mon, 22 Jul 2019 11:09:37 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] questions about condor_restart

On 7/22/19 10:53 AM, Shawn A Kwang wrote:

I have a couple of "best-practices questions" for condor cluster
administration.

Is it safe to run 'condor_restart' (-graceful) on the components of a
running condor pool? Of course you may ask: what do I mean by 'safe'?

Let me ask this question another way. What happens if I run
condor_restart on a 1) Central manager, 2) Submit node (running schedd),
or 3) Compute node? All while users are actively running jobs.

Shawn:

This is a great question. Assuming everything comes back after a restart, a restart of:

o) The central manager. All running jobs stay running. No new matches can be made. Schedds can start new jobs running only by using existing matches for the same user. condor_status doesn't work while the collector is down.

o) Submit node. All running jobs stay running for up to the lease duration. If the schedd comes back before the job lease expires, it reconnects to the running jobs and the jobs stay running. If the schedd is down for too long, the jobs get preempted and go back to idle. The default job lease duration is 20 minutes.

o) Execute machines. All running jobs on that execute machine are preempted and killed. The schedd will notice the jobs have been preempted, mark them as Idle, and try to restart them again from scratch.
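
The three cases above can be summarized as a small decision function. This is only an illustrative sketch of the rules described in this message, not HTCondor code: the names Role, running_job_outcome and DEFAULT_JOB_LEASE_SECONDS are made up for the example, and the 20-minute lease is simply the figure quoted above, so check the value configured for your own pool and version.

```python
from enum import Enum

# Lease figure taken from the message above (20 minutes).
# Assumption: your pool may be configured differently.
DEFAULT_JOB_LEASE_SECONDS = 20 * 60


class Role(Enum):
    """Hypothetical labels for the three kinds of machines discussed."""
    CENTRAL_MANAGER = "central manager"
    SUBMIT = "submit node"
    EXECUTE = "execute machine"


def running_job_outcome(role, downtime_seconds,
                        lease_seconds=DEFAULT_JOB_LEASE_SECONDS):
    """Describe what happens to an already-running job when the
    daemons on a machine of the given role are restarted."""
    if role is Role.CENTRAL_MANAGER:
        # Running jobs are unaffected; only new matchmaking pauses.
        return "keeps running"
    if role is Role.SUBMIT:
        # The schedd can reconnect only while the job lease is valid.
        if downtime_seconds < lease_seconds:
            return "keeps running"
        return "preempted, back to idle"
    if role is Role.EXECUTE:
        # Jobs on the restarted execute machine are killed.
        return "preempted, restarted from scratch"
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    for role in Role:
        for downtime in (60, 30 * 60):
            print(role.value, downtime, running_job_outcome(role, downtime))
```

In this model the only case where the length of the outage matters is the submit node, which is why the downtime is compared against the lease only there.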

-greg


Thanks in advance.

Sincerely,
Shawn
