
Re: [HTCondor-users] Negotiation with partitionable slots



Hey Mathieu,

I went through the same sort of learning curve a few years ago with partitionable slots and the early introduction of consumption policies. (I think there might be a couple of CP bug fixes out there with my name in the "customer" field.)

When consumption policies are off, you're usually going to be relying on CLAIM_PARTITIONABLE_LEFTOVERS instead. So let's say you have a set of three machines with 4GB of memory each, and ten 1GB jobs are being matched. At the outset, the negotiator sees a collection of machine ads as follows, and claims them for assignment of matching jobs:

HostA - 4GB Partitionable
HostB - 4GB Partitionable
HostC - 4GB Partitionable

Job1 matches to HostA, Job2 matches to HostB, and Job3 matches to HostC. Done. You then have:

HostA - 3GB Part
        1GB Dyn - Job1
HostB - 3GB Part
        1GB Dyn - Job2
HostC - 3GB Part
        1GB Dyn - Job3
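
For anyone reproducing this at home, the machine side is just the stock single-partitionable-slot setup and the jobs are nothing fancy. Roughly like the following, give or take your local policy ("sleep.sh" is just a stand-in executable):

    # startd config on HostA/B/C: one partitionable slot owning the whole box
    NUM_SLOTS = 1
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = 100%
    SLOT_TYPE_1_PARTITIONABLE = True

    # submit description for the ten 1GB jobs
    executable     = sleep.sh
    request_cpus   = 1
    request_memory = 1024
    queue 10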

When "claim_partitionable_leftovers" is on, the leftover 3GB partitionable slots go to the scheduler with a claim ID, allowing the schedd itself to assign additional slots to remaining jobs without having to consult the negotiator. This allows the machines to load themselves up very quickly with all the work they're capable of supporting. I have some 64-core machines and I discovered that a user tried to distinguish his iterations using a millisecond-scale timestamp after half a dozen jobs fired up in the same millsecond and stepped all over each other.

So with leftovers claiming, you may wind up with something like this:

HostA - 0GB Part
        1GB Dyn - Job1
        1GB Dyn - Job4
        1GB Dyn - Job5
        1GB Dyn - Job6
HostB - 0GB Part
        1GB Dyn - Job2
        1GB Dyn - Job7
        1GB Dyn - Job8
        1GB Dyn - Job9
HostC - 2GB Part
        1GB Dyn - Job3
        1GB Dyn - Job10

And so on down the line. However, as the manual warns, this can introduce some problems when it comes to concurrency limits. I never noticed anything go particularly awry in this regard, but I suppose it might have and I just didn't notice it.
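
For reference, concurrency limits are the usual pair of knobs; "DB_LICENSE" here is just a made-up name for illustration:

    # negotiator config: at most 10 running jobs may hold this limit
    DB_LICENSE_LIMIT = 10

    # submit description file: each job declares the limits it consumes
    concurrency_limits = DB_LICENSE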

With a consumption policy, instead of having the scheduler split up the leftovers without consulting the negotiator, the negotiator handles the splitting itself. And so instead of having the first slot of a long list of machines get the first round of jobs, you can do a depth-first fill of your machines. This can be advantageous if you have NFS-mounted input data, since the multiple processes on one machine can leverage the disk I/O buffer cache to minimize the amount of network traffic hammering your aging NetApp servers. This is one of the reasons I jumped on CP as soon as it was available, and not quite fully debugged.
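
If you want to experiment with it, the knobs look roughly like this. This is a sketch from memory rather than a copy of my production config, so check the consumption policy section of the manual for your version before trusting it:

    # on top of the partitionable slot type 1 shown earlier
    CONSUMPTION_POLICY = True
    SLOT_TYPE_1_CONSUMPTION_POLICY = True

    # expressions for how much of each resource one match consumes
    CONSUMPTION_CPUS   = TARGET.RequestCpus
    CONSUMPTION_MEMORY = TARGET.RequestMemory
    CONSUMPTION_DISK   = TARGET.RequestDisk

    # if memory serves, the manual also wants preemption out of the picture
    # when consumption policies are in play
    NEGOTIATOR_CONSIDER_PREEMPTION = False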

The tradeoff is an increased load on the negotiator, but given today's hardware, it's not even breaking a sweat for my pools even with over a thousand cores in the mix and certain users submitting tens of thousands of jobs per cluster. I think the manual suggests going over 5,000 might make it unhappy, but then you just load up on the SSDs, fast memory, and a 4.2GHz CPU for your negotiator host and pump up the volume.
	
Having the negotiator handle partitioning also ensures that concurrency limits will be correctly managed, because you won't have the three machines above all claiming jobs at the same time in a concurrency-limit race.

So, simplistically speaking, you get something like this:

HostA - 0GB Part
        1GB Dyn - Job1
        1GB Dyn - Job2
        1GB Dyn - Job3
        1GB Dyn - Job4
HostB - 0GB Part
        1GB Dyn - Job5
        1GB Dyn - Job6
        1GB Dyn - Job7
        1GB Dyn - Job8
HostC - 2GB Part
        1GB Dyn - Job9
        1GB Dyn - Job10
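
If you want to watch the slots carve themselves up as the jobs land, something along these lines gives a quick view; these are the standard slot ad attributes, and SlotType shows up as Partitionable or Dynamic:

    condor_status -af:h Name SlotType Memory State Activity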

This may not be the greatest example, since the total number of active cores and machines wound up the same, but if you had four machines instead of three, HostC and HostD would each be running only one job in this example.


I hope this helps clarify what's going on.

	-Michael Pelletier.



> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Mathieu Bahin
> Sent: Friday, April 28, 2017 8:38 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] Negotiation with partitionable slots
> 
> I can read in the manual that "This differs from scheduler matchmaking in
> that multiple jobs can match with the partitionable slot during a single
> negotiation cycle." (cf
> http://research.cs.wisc.edu/htcondor//manual/v8.2.7/3_5Policy_Configuratio
> n.html#SECTION004510900000000000000).
> 
> But I don't really know how. Is it the regular behaviour with
> partitionable slots? Because that's not what I note here in my tests...
> I'm not sure to perfectly understand this section in the manual. Is there
> something to do with the "CONSUMPTION_POLICY" to solve my issue?
> Currently its value is False for us.
> 
> Cheers,
> Mathieu