
Re: [HTCondor-users] Force all jobs by a user on same execute node



Hi

On 1/5/24 15:17, gagan tiwari wrote:
> Yes these jobs will use shared memory on the node and interact with each other. Which is why it's needed to run all of them on the same execute node.

ok.

Then let's assume you have three types of jobs: 1 of type A, 8 of type B and 1 of type C (obviously, I'm just making up numbers here).

Jobs of type A and B require 1 GByte of RAM and 1 core each, and the job of type C requires 10 GByte of RAM and 4 cores.

Then, I'd suggest the following:

(1) Create a wrapper shell script which starts all the jobs, e.g. the following snippet called wrapper_script.sh (it needs to be executable!):

--8><----8><----8><----8><----8><----8><----8><--
#!/bin/bash

# start all jobs and put them into the "background"
# just making up command line arguments here
A arg1 arg2 arg3 &
B 1 &
B 2 &
B 3 &
B 4 &
B 5 &
B 6 &
B 7 &
B 8 &
C &

# wait for all jobs to finish
wait
echo "done"
exit 0
--8><----8><----8><----8><----8><----8><----8><--
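
To make the script executable, something along these lines should do (the filename is just the example name from above):

--8><----8><----8><----8><----8><----8><----8><--
chmod +x wrapper_script.sh
--8><----8><----8><----8><----8><----8><----8><--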

(2) The submit file then requests resources for the SUM of all jobs, e.g. it could look like this:

--8><----8><----8><----8><----8><----8><----8><--
# 1 core (A) + 8 x 1 core (B) + 4 cores (C) = 13 cores
request_cpus = 13
# 1 GB (A) + 8 x 1 GB (B) + 10 GB (C) = 19 GB, plus some head room (in MB)
request_memory = 20000

executable = wrapper_script.sh
Queue
--8><----8><----8><----8><----8><----8><----8><--

Obviously, that is the bare minimum (there is no error handling, no handling of output, etc.), but hopefully it is enough to get going.
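
If you later want some error handling, one possible direction (just a sketch, A/B/C are still the made-up commands from above) is to collect the PIDs and wait on each one individually, so the wrapper exits non-zero if any job fails; the wrapper's own stdout/stderr can be captured with the usual output/error/log lines in the submit file:

--8><----8><----8><----8><----8><----8><----8><--
#!/bin/bash

# variant of wrapper_script.sh that tracks each job's exit status
pids=()

A arg1 arg2 arg3 & pids+=($!)
for i in 1 2 3 4 5 6 7 8; do
    B "$i" & pids+=($!)
done
C & pids+=($!)

# wait on each PID individually and remember failures
status=0
for pid in "${pids[@]}"; do
    wait "$pid" || status=1
done

echo "done (exit status $status)"
exit $status
--8><----8><----8><----8><----8><----8><----8><--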

Condor will then try to find a machine which has enough resources available and start the wrapper script, which in turn takes care of the rest.
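
For completeness, submitting and watching the job would then look roughly like this (assuming the submit file above is saved under a name of your choice, here jobs.sub):

--8><----8><----8><----8><----8><----8><----8><--
condor_submit jobs.sub
condor_q
--8><----8><----8><----8><----8><----8><----8><--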

Does that make sense?

Cheers

Carsten
--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185
