Re: [HTCondor-users] Remove HTCondor worker node from a central Master



The condor_off command needs to be sent to a condor_master daemon, so

   condor_off  svc-jaws@xxxxxxxxxx

or 

   condor_off  -master svc-jaws@xxxxxxxxxx

should work. The first command turns off all of the daemons other than the condor_master. The second command turns off all daemons, including the condor_master.

If that doesn't work, try

   condor_off  -debug svc-jaws@xxxxxxxxxx

The -debug option will help to show why the condor_off command is not working.
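
For the glide-in case in the question, the two steps (stop the daemons, then end the SLURM job) could be wrapped in a small shell function. This is only a sketch under assumptions: `retire_worker` and `DRY_RUN` are hypothetical names, not part of HTCondor or SLURM; the SLURM job ID has to be supplied by the caller; and it assumes the slot ad drops out of the collector once the daemons on the node have exited, which is worth confirming with `condor_status -any` afterwards.

```shell
#!/bin/sh
# Hypothetical helper, not an HTCondor or SLURM command.
#   $1 = master name as listed by `condor_status -any`
#   $2 = SLURM job ID of the glide-in (optional, supplied by the caller)
retire_worker() {
    master_name=$1
    slurm_job_id=${2:-}

    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run=echo
    else
        run=
    fi

    # Turn off all daemons on the node, including the condor_master.
    $run condor_off -master "$master_name" || return 1

    # Assumption: the glide-in runs inside this SLURM job, so cancelling
    # the job releases the compute node.
    if [ -n "$slurm_job_id" ]; then
        $run scancel "$slurm_job_id"
    fi
}
```

Running it first as `DRY_RUN=1 retire_worker <master_name> <slurm_job_id>` prints the two commands without contacting the pool, which makes it easy to check the master name before anything is shut down.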

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Seung-Jin Sul <ssul@xxxxxxx>
Sent: Thursday, November 2, 2023 5:12 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Remove HTCondor worker node from a central Master
 
Hi, 
We are setting up HTCondor in a glide-in fashion with SLURM. I was wondering if there is any way to remove the HTCondor worker processes running on a SLURM compute node. I've been testing `condor_off -name <machine_name>` and `condor_off -addr <IP:port>`, but neither has worked so far.

For example, we have a worker node like the one below:
```
$ condor_status -any
MyType             TargetType         Name

Collector          None               My Pool - ln010.xxx@xxxxxxxxx
Scheduler          None               svc-jaws@xxxxxxxxx
DaemonMaster       None               svc-jaws@xxxxxxxxx
Negotiator         None               svc-jaws@xxxxxxxxx
Machine            Job                slot1@xxxxxxxxxx
DaemonMaster       None               svc-jaws@xxxxxxxxxx
Accounting         none               <none>
```
 
I would then like to run a command from the central master to:
- terminate HTCondor services on n0040.yyy0
- clean up `slot1@xxxxxxxxxx` from the Master's machine list
- terminate the SLURM job


Any help will be appreciated.

Thank you.

Best, 
Seung