
Re: [HTCondor-users] Multiple HTCondor workers on a single compute node




   Hello,

I use the -n option with a random number in a bash script, like this:

#!/bin/bash

# Random suffix so each condor_master instance on the node gets a unique name.
export VERY_RNUM=$RANDOM

# Run in the foreground (-f) under that per-instance name.
${_CONDOR_SBIN}/condor_master -f -n compute_condor_${VERY_RNUM}

but I imagine there could be other ways.
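For example, one variation (a sketch only, not something I have in production) would be to also give each instance its own local directories, assuming your glide-in wrapper can export _CONDOR_-prefixed environment variables, which HTCondor treats as configuration overrides. That keeps the master's InstanceLock, logs, spool, and execute directories from colliding between instances; the base path below is just a placeholder:

```
#!/bin/bash
# Sketch: per-instance directories so lock/log/spool/execute paths do not collide.
# The base path is a placeholder; _CONDOR_SBIN is assumed to be set as in the script above.

VERY_RNUM=$RANDOM
BASE="${TMPDIR:-/tmp}/condor_local_${HOSTNAME}_${VERY_RNUM}"   # placeholder location
mkdir -p "$BASE"/{log,lock,spool,execute}

# Environment variables prefixed with _CONDOR_ override the matching config macros.
export _CONDOR_LOCAL_DIR="$BASE"
export _CONDOR_LOG="$BASE/log"
export _CONDOR_LOCK="$BASE/lock"
export _CONDOR_SPOOL="$BASE/spool"
export _CONDOR_EXECUTE="$BASE/execute"

# With shared port off, each daemon binds its own ephemeral port instead of 9618,
# so two instances on the same node do not compete for one listening port.
export _CONDOR_USE_SHARED_PORT=False

exec "${_CONDOR_SBIN}/condor_master" -f -n "compute_condor_${VERY_RNUM}"
```

If you would rather keep shared port enabled, I believe SHARED_PORT_PORT could be set to a distinct value per instance instead, but I have not tested that myself.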

          Greg 



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Seung-Jin Sul <ssul@xxxxxxx>
Sent: Monday, November 6, 2023 5:21 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Multiple HTCondor workers on a single compute node
 
Hi, 

We use SLURM as a glide-in backend and sometimes need to run multiple HTCondor worker services on the same node. This happens when we request only part of a compute node from SLURM, e.g. 1 CPU and 10 GB of memory.

When we try to start another instance of HTCondor on the same node, we see the following error:

```
11/06/23 14:49:54 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)
11/06/23 14:49:54 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
11/06/23 14:49:54 ERROR "Can't get lock on "/clusterfs/jgi/scratch/dsi/aa/jaws/dori-dev/htcondor-log/n0099/log/InstanceLock"" at line 1691 in file /var/lib/condor/execute/slot1/dir_3620933/userdir/.tmpdnieob/BUILD/condor-10.2.2/src/condor_master.V6/master.cpp

```


How can we start multiple HTCondor worker services on a node? Any information on how to set the port and handle the lock file would be helpful.

Thank you!

Best, 
Seung