
Re: [HTCondor-users] Force all jobs by a user on same execute node



Hi

On 1/5/24 15:17, gagan tiwari wrote:
> Yes these jobs will use shared memory on the node and interact with each other. Which is why it's needed to run all of them on the same execute node.

ok.

Then let's assume you have three types of jobs: 1 of type A, 8 of type B and 1 of type C (obviously, I'm just making up numbers here).

Jobs of type A and B require 1 GByte of RAM and 1 core each, and the job of type C requires 10 GByte of RAM and 4 cores.

Then, I'd suggest the following:

(1) Create a wrapper shell script which starts all the jobs, e.g. the following snippet called wrapper_script.sh (it needs to be executable!):

--8><----8><----8><----8><----8><----8><----8><--
#!/bin/bash

# start all jobs and put them into the "background"
# just making up command line arguments here
A arg1 arg2 arg3 &
B 1 &
B 2 &
B 3 &
B 4 &
B 5 &
B 6 &
B 7 &
B 8 &
C &

# wait for all jobs to finish
wait
echo "done"
exit 0
--8><----8><----8><----8><----8><----8><----8><--
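
To make the script executable, something along these lines should do (the filename is just the example name from above):

--8><----8><----8><----8><----8><----8><----8><--
chmod +x wrapper_script.sh
--8><----8><----8><----8><----8><----8><----8><--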

(2) The submit file then requests resources for the SUM of all jobs, e.g. it could look like this:

--8><----8><----8><----8><----8><----8><----8><--
# 1 core (A) + 8 x 1 core (B) + 4 cores (C) = 13 cores
request_cpus = 13
# 1 GB (A) + 8 x 1 GB (B) + 10 GB (C) = 19 GB, plus some head room (in MB)
request_memory = 20000

executable = wrapper_script.sh
Queue
--8><----8><----8><----8><----8><----8><----8><--

Obviously, that is the bare minimum (there is no error handling, no handling of output, etc.), but hopefully it is enough to get going.
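
If you later want some error handling, one possible direction (just a sketch, A/B/C are still the made-up commands from above) is to collect the PIDs and wait on each one individually, so the wrapper exits non-zero if any job fails; the wrapper's own stdout/stderr can be captured with the usual output/error/log lines in the submit file:

--8><----8><----8><----8><----8><----8><----8><--
#!/bin/bash

# variant of wrapper_script.sh that tracks each job's exit status
pids=()

A arg1 arg2 arg3 & pids+=($!)
for i in 1 2 3 4 5 6 7 8; do
    B "$i" & pids+=($!)
done
C & pids+=($!)

# wait on each PID individually and remember failures
status=0
for pid in "${pids[@]}"; do
    wait "$pid" || status=1
done

echo "done (exit status $status)"
exit $status
--8><----8><----8><----8><----8><----8><----8><--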

Condor will then try to find a machine which has enough resources available and start the wrapper script, which in turn takes care of the rest.
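
For completeness, submitting and watching the job would then look roughly like this (assuming the submit file above is saved under a name of your choice, here jobs.sub):

--8><----8><----8><----8><----8><----8><----8><--
condor_submit jobs.sub
condor_q
--8><----8><----8><----8><----8><----8><----8><--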

Does that make sense?

Cheers

Carsten
--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185
