
Re: [HTCondor-users] Backfill BOINC jobs with partitionable slots



Alec,

I think we can configure HTCondor to run BOINC jobs as backfill on machines that use partitionable slots by
using a combination of static backfill slots and HTCondor's workfetch mechanism.

The basic strategy is to use a BOINC_FETCH_WORK_HOOK to register a script that
the Startd will run regularly. That script creates job ClassAds for BOINC jobs,
which the Startd will then attempt to run on all of its slots.
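
As a rough illustration of the hook idea above, here is a minimal Python sketch of such a script. It assumes the Startd passes the slot's ClassAd on standard input and treats whatever the script prints as the job ClassAd, with empty output meaning "no work". The BOINC client path, the Owner and Iwd values, and the exact set of job attributes are placeholders I made up for the example, and the "backfill" name check assumes the backfill slots can be recognized by that prefix. Check the job hooks section of the manual and the wiki recipe for the real knob names and required attributes.

```python
#!/usr/bin/env python3
"""Sketch of a fetch-work hook that offers a BOINC job to backfill slots only."""
import sys

BOINC_CLIENT = "/usr/bin/boinc"   # hypothetical path to the BOINC client
BACKFILL_PREFIX = "backfill"      # assumed name prefix of the backfill slots


def parse_slot_ad(text):
    """Parse 'Attr = value' lines into a dict keyed by lower-cased attribute."""
    ad = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # ClassAd attribute names are case-insensitive; strip string quotes.
            ad[key.strip().lower()] = value.strip().strip('"')
    return ad


def build_job_ad(slot_ad_text):
    """Return a job ClassAd as text, or '' if this slot should get no work."""
    slot = parse_slot_ad(slot_ad_text)
    if not slot.get("name", "").startswith(BACKFILL_PREFIX):
        return ""  # e.g. the partitionable slot: offer nothing
    lines = [
        f'Cmd = "{BOINC_CLIENT}"',
        'Owner = "boinc"',          # placeholder account name
        "JobUniverse = 5",          # 5 is the vanilla universe
        'Iwd = "/var/lib/boinc"',   # placeholder working directory
    ]
    return "\n".join(lines) + "\n"


# Only read stdin when something is piped in (as the Startd would do).
if __name__ == "__main__" and not sys.stdin.isatty():
    sys.stdout.write(build_job_ad(sys.stdin.read()))
```

In this sketch the slot filtering happens inside the script; the wiki recipe does the equivalent filtering in the slot configuration itself, which is the more robust place for it.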

We adjust the configuration of the existing partitionable slot so that it will neither do BOINC_FETCH_WORK
nor match the BOINC_FETCH_WORK jobs.

We add to the configuration a new set of static slots that *will* match the BOINC_FETCH_WORK jobs.
These slots are named backfill1 - backfillN, and they share CPUs and Memory with the partitionable slot,
but they will only run jobs when the partitionable slot is not using the corresponding CPUs and
Memory.
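
To make the deferral rule concrete, here is a toy Python model of it. This is not HTCondor configuration, only the arithmetic behind "backfill slots run only on what the partitionable slot is not using". It assumes each backfill slot has one CPU and a fixed amount of memory, and that the lowest-numbered backfill slots are the ones allowed to run; the function name and parameters are invented for the example, and the real recipe expresses this with slot policy expressions instead.

```python
def runnable_backfill_slots(total_cpus, total_memory_mb,
                            pslot_cpus_in_use, pslot_memory_in_use_mb,
                            memory_per_backfill_mb):
    """Return names of backfill slots that may run, given partitionable-slot usage."""
    free_cpus = max(total_cpus - pslot_cpus_in_use, 0)
    free_memory_mb = max(total_memory_mb - pslot_memory_in_use_mb, 0)
    # A backfill slot needs one free CPU and its share of free memory.
    count = min(free_cpus, free_memory_mb // memory_per_backfill_mb)
    return [f"backfill{i}" for i in range(1, count + 1)]


# 8 CPUs / 16 GB machine, partitionable slot using 6 CPUs and 8 GB:
print(runnable_backfill_slots(8, 16384, 6, 8192, 2048))
```

With these numbers only two CPUs are free, so two backfill slots may run even though there is memory for four.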

There is a recipe here
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigureBackfill

The subsection of this page entitled "How to create static backfill slots that defer to a partitionable slot"
was recently added. It is a modified version of a configuration that we are currently using in the CHTC
pool to do backfill.
 
Hope this helps,
-tj 

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd Tannenbaum
Sent: Friday, March 27, 2020 11:52 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Alec Sheperd <alec.sheperd@xxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Backfill BOINC jobs with partitionable slots

On 3/26/2020 6:50 PM, Alec Sheperd wrote:
> Hi,
> 
> I've been trying to enable BOINC backfill in our local condor cluster and seem to be running into issues triggering the 
> backfill state.
> 
> The local BOINC client configuration seems to be correct: I can start it manually and run jobs without issue, but 
> the slot never goes from Unclaimed/Idle to Backfill.
> 
> We also use partitionable slots, and I'm curious whether that might be part of the issue.

Hi Alec,

Yes, I think that is definitely an issue.

We think HTCondor's built-in BOINC backfill support works for execute nodes using static slots, but there are 
issues/complications for nodes configured to use partitionable slots.  We expect to sort this out and soon post a HOWTO 
config recipe that folks can cut-n-paste even if they are using partitionable slots. The config recipe will go into the 
manual and onto the htcondor-wiki; we will post the URL to this list and the HTCondor homepage.  We hope to have this 
available in a few days.

best regards and stay safe,
Todd