
[HTCondor-users] Ghost machine list from `condor_status -any`



Hi,

We are using HTCondor with the SLURM backend, and recently we've seen deallocated SLURM nodes show up in the output of the `condor_status -any` command, like the listing below.


```
$ condor_status -any
MyType        TargetType    Name

Collector     None          My Pool - ln010@ln010
Submitter     None          condor_pool@svc
Scheduler     None          svc@ln010
DaemonMaster  None          svc@ln010
Negotiator    None          svc@ln010
Machine       Job           slot1@n0013
DaemonMaster  None          svc@n0013
Machine       Job           slot1@n0004
DaemonMaster  None          svc@n0004

Accounting    none          <none>
Accounting    none          condor_pool@svc
```


Nodes `n0013` and `n0004` were previously allocated and used as HTCondor worker nodes, but they have since been deallocated by SLURM.
We know the `n0013` and `n0004` entries will be cleared up eventually, but we are wondering whether there is a better way to handle this case, such as cleaning up the stale ads more promptly.
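
For now, the only workaround we can think of is to invalidate the leftover ads by hand with `condor_advertise`, roughly as in the sketch below (we have not verified this; it assumes an invalidation ad with `MyType`/`TargetType`/`Requirements` of this form is accepted, and `n0013` is just an example). We would prefer something automatic.

```
# Rough sketch: manually invalidate the leftover master ad for one node
# (n0013 as an example). Assumes condor_advertise accepts an invalidation
# ad of this form.
cat > /tmp/invalidate_n0013.ad <<'EOF'
MyType = "Query"
TargetType = "Master"
Requirements = Name == "svc@n0013"
EOF
condor_advertise INVALIDATE_MASTER_ADS /tmp/invalidate_n0013.ad

# The leftover startd (Machine) ad would presumably need the same treatment
# with TargetType = "Machine", Requirements = Name == "slot1@n0013", and
# INVALIDATE_STARTD_ADS.
```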

We start an HTCondor worker with a SLURM batch script like the one below.

```
#!/bin/bash
#SBATCH -t 72:00:00
#SBATCH --exclusive

# Run condor_master in the foreground
condor_master -f
```
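
One idea we have been toying with is asking SLURM to signal the batch script shortly before the node is deallocated, so that condor_master can shut down gracefully and (we assume) deregister its ads from the collector. A rough sketch is below; the 120-second margin is just an example value, and we have not verified that a graceful shutdown actually removes the ads.

```
#!/bin/bash
#SBATCH -t 72:00:00
#SBATCH --exclusive
# Ask SLURM to send SIGTERM to the batch shell 120 s before the time limit
# (120 s is an arbitrary example margin)
#SBATCH --signal=B:TERM@120

# Run condor_master in the foreground, but as a child of this shell so we
# can forward signals to it
condor_master -f &
MASTER_PID=$!

# Forward SIGTERM so the master can shut its daemons down gracefully and
# (we assume) invalidate its ads with the collector before the node goes away
trap 'kill -TERM "$MASTER_PID"; wait "$MASTER_PID"' TERM

wait "$MASTER_PID"
```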

Any comments would be appreciated.


Best,
Seung Sul