Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[HTCondor-users] Getting HTCondor to make space for multi-host parallel jobs.

Date: Tue, 05 May 2020 13:33:25 +0000
From: Paul Hopkins <HopkinsP@xxxxxxxxxxxxx>
Subject: [HTCondor-users] Getting HTCondor to make space for multi-host parallel jobs.

Does anybody have any experience with running multi-host parallel jobs on an already full HTCondor cluster, and getting HTCondor to automatically preempt or drain the number of nodes required to run the jobs? At present, when users submit multi-host parallel jobs, the jobs remain idle until I manually make space by draining nodes.

The cluster is already configured with ALLOW_PSLOT_PREEMPTION = True, but I don't think that is able to preempt jobs from multiple nodes. The HTCondor defrag daemon has been suggested, but I don't see how to configure it to drain the exact number of nodes needed to fit the parallel jobs, nor how to synchronise the node release so that all of them are made available at the same time. If they are not released together then they just get filled by small jobs again.

My current workaround is a Python script that periodically checks for idle parallel jobs, drains the nodes required to run them, and then releases the nodes once they are all ready. However, I would prefer a native HTCondor solution if possible.
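
For illustration only, the decision logic of such a workaround might look roughly like the sketch below. It is not the actual script: the Node class, the state labels and both function names are hypothetical, and querying the pool and issuing the drain or release commands (for example via condor_drain) are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical view of one execute node as seen by the script."""
    name: str
    state: str  # assumed labels: "busy", "draining" or "drained"


def plan_drain(nodes: List[Node], hosts_needed: int) -> List[str]:
    """Return names of additional busy nodes to start draining.

    Nodes that are already draining or drained count towards the
    number of hosts the idle parallel job needs.
    """
    held = [n for n in nodes if n.state in ("draining", "drained")]
    shortfall = max(0, hosts_needed - len(held))
    busy = [n for n in nodes if n.state == "busy"]
    return [n.name for n in busy[:shortfall]]


def ready_to_release(nodes: List[Node], hosts_needed: int) -> bool:
    """True only when enough nodes are fully drained.

    Releasing all nodes together is what stops small jobs from
    refilling them one at a time.
    """
    drained = [n for n in nodes if n.state == "drained"]
    return len(drained) >= hosts_needed


if __name__ == "__main__":
    pool = [Node("a", "busy"), Node("b", "draining"),
            Node("c", "drained"), Node("d", "busy")]
    print(plan_drain(pool, 3))
    print(ready_to_release(pool, 3))
```

A periodic loop would call plan_drain for each idle parallel job, start draining the returned nodes, and only cancel the drain on all of them once ready_to_release returns True.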

Many thanks,

Paul

Paul Hopkins
Computational Infrastructure Scientist

Cardiff University | Prifysgol Caerdydd

+44 (0) 29 225 10043
