
[HTCondor-users] BOINC jobs on compute nodes (with GPU)



Hi all

(if the introductory part is too long, please just skip to the part
marked by ####)

We are about to rework our organically grown HTCondor config and are
revisiting how to run BOINC alongside HTCondor.

At the moment I think the only supported way of running BOINC under
HTCondor's umbrella is backfill, which will only ever kick in if the
node has an idle slot/core (static model) or is completely idle (i.e. if
the node is configured to be fully partitionable).

Given that we have a number of multi-core jobs, I think going back to a
static layout is a no-go, as is waiting until all cores of a given
system are idle.

This currently leaves only two alternatives I can think of.

(1) A special user submits condor jobs into the pool with a very bad
priority and the pool is configured to evict these jobs as soon as
needed. Within this framework, I think it should also be possible to
submit GPU and CPU jobs in parallel.

However, managing this centrally with proper copying of files is
potentially a nightmare, even if HTCondor does the copying for us, as
one would still need to ensure proper locking and so on.

(2) The easier approach - which is what we are currently using - is to
start the BOINC client on the system independently of HTCondor but
limit it via cgroups to 1/1024 of a core, so it only ever gets cycles
when there are idle cycles.
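
To illustrate why such a small share behaves like "idle cycles only",
here is a minimal sketch of the arithmetic. It assumes the cgroup CPU
controller divides time among busy sibling groups in proportion to
their weights; the function name and the weights 1 and 1024 are purely
illustrative (taken from the "1/1024" figure above), and the kernel may
clamp or scale very small values differently.

```python
def contended_cpu_fraction(own_weight: int, competing_weights: list[int]) -> float:
    """Fraction of CPU time a group receives when every listed group is busy.

    Assumption: time is shared proportionally to weight, which is how
    weight-based cgroup CPU limits are usually described.
    """
    total = own_weight + sum(competing_weights)
    return own_weight / total


# BOINC group (weight 1) competing with one busy HTCondor group (weight 1024)
busy = contended_cpu_fraction(1, [1024])

# Nobody else wants the CPU: the BOINC group may use all of it
idle = contended_cpu_fraction(1, [])
```

With a busy competitor the BOINC group is left with roughly a
thousandth of the CPU time, while on an otherwise idle node it is not
throttled at all, because a weight only matters under contention.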

This approach works surprisingly well. However, as it runs outside of
HTCondor, I don't dare occupy the GPU, as I would have no idea when
condor_startd would start a job on it.

Thus my question:

####

Is there a hook within HTCondor's startd on nodes with partitionable
slots which we could use to launch a script or interact with BOINC via
its API/boinccmd to stop and start a GPU job whenever the resource is
unused by HTCondor?
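
To make the question concrete, the script I have in mind would look
roughly like the sketch below. It is only an assumption of how this
could work: the function names are hypothetical, the number of GPUs
currently used by HTCondor has to be supplied by whatever hook or
polling mechanism turns out to exist, and boinccmd may additionally
need to be run from the BOINC data directory or be given a password.

```python
import subprocess


def gpu_mode_for(gpus_in_use: int) -> str:
    """Pick the BOINC GPU mode: back off as soon as HTCondor uses any GPU."""
    return "never" if gpus_in_use > 0 else "auto"


def boinccmd_args(gpus_in_use: int) -> list[str]:
    """Build the boinccmd command line for the given GPU usage."""
    return ["boinccmd", "--set_gpu_mode", gpu_mode_for(gpus_in_use)]


def apply_gpu_mode(gpus_in_use: int) -> None:
    """Run boinccmd; raises CalledProcessError if the client rejects it."""
    subprocess.run(boinccmd_args(gpus_in_use), check=True)
```

The missing piece is the trigger: something on the HTCondor side would
have to call apply_gpu_mode() with the current GPU usage whenever a GPU
is claimed or released.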

Cheers

Carsten


-- 
Dr. Carsten Aulbert, Atlas cluster administration
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Callinstraße 38, 30167 Hannover, Germany
Tel: +49 511 762 17185, Fax: +49 511 762 17193