
[HTCondor-users] Multiple HTCondor workers on a single compute node



Hi,

We use SLURM as a glide-in backend and sometimes need to run multiple HTCondor worker services on the same node. This happens when we request only part of a compute node from SLURM, for example 1 CPU and 10 GB of memory.

When we try to start another instance of HTCondor on the same node, we see the following error:

```
11/06/23 14:49:54 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)
11/06/23 14:49:54 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
11/06/23 14:49:54 ERROR "Can't get lock on "/clusterfs/jgi/scratch/dsi/aa/jaws/dori-dev/htcondor-log/n0099/log/InstanceLock"" at line 1691 in file /var/lib/condor/execute/slot1/dir_3620933/userdir/.tmpdnieob/BUILD/condor-10.2.2/src/condor_master.V6/master.cpp

```


How can we start multiple HTCondor worker services on a node? Any information on setting the port and on the lock file location would be helpful.
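
For context, here is the kind of per-instance configuration we imagine might work, with a separate LOCAL_DIR (so the InstanceLock that lives under LOG does not collide) and a distinct shared-port number; all paths, names, and port numbers below are placeholders, not settings we have confirmed:

```
# Hypothetical config for a second instance (all paths/names/ports are placeholders).
# A per-instance LOCAL_DIR should keep log/lock/spool/execute, and the
# InstanceLock under LOG, separate from the first instance.
LOCAL_DIR        = /tmp/htcondor-instance-b
LOG              = $(LOCAL_DIR)/log
LOCK             = $(LOCAL_DIR)/lock
SPOOL            = $(LOCAL_DIR)/spool
EXECUTE          = $(LOCAL_DIR)/execute

# Only the glide-in daemons on this node
DAEMON_LIST      = MASTER, STARTD

# Give the second instance its own port so it does not clash with the first
USE_SHARED_PORT  = True
SHARED_PORT_PORT = 9619

# Distinct names so both startds advertise separately to the collector
MASTER_NAME      = glidein_b
STARTD_NAME      = glidein_b
```

and then start it with something like:

```
# Point the second condor_master at the per-instance config and run it in the foreground
CONDOR_CONFIG=/tmp/htcondor-instance-b/condor_config condor_master -f
```

Is this the right direction, or is there a recommended way to do this?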

Thank you!

Best,
Seung