
Re: [HTCondor-users] control law questions



> From: "Krieger, Donald N." <kriegerd@xxxxxxxx>
> Date: 12/04/2015 06:41 AM
>
> Dear List:

>  
> I have two questions about the algorithm which is used to move jobs from
> the "I" state into the run state.

>  
> Does the algorithm take into account the amount of time requested?
> For instance, do sites specify up front the maximum amount of time that a
> job may request and still be accepted?

> If so, can a user get a survey of those times from all the sites currently
> accepting jobs?

>  
> Below is a 24-hour plot showing the number of jobs running through the xd-
> login submit host on the Open Science Grid.

> During this period most of the opportunistic cycles were shared relatively
> equally between 3 users, all running through xd-login.

> The black tracing is the total.  It is a count of the number of condor_shadow processes.
> The blue tracing is the number of my running jobs.  It is obtained from a
> condor_q command.

> There is a large oscillation in the blue tracing, with a period of about
> 90 minutes.

> I presume that the other users saw a comparable oscillation and I have
> seen this behavior repeatedly.

> Is there something out there which analyzes the behavior of the control
> algorithm implemented in HTCondor?

> I have reviewed the documentation on the algorithm itself and admit that I
> do not understand it.

> Any comments on this would be welcome.


Hey Don,

My experience may not be directly relevant to this situation, but when I
started adding opportunistic resources to one of my pools in the form of
desktop systems running Linux, I noticed a key difference between them
and the dedicated (i.e., "START=True") resources.

Checking the OS statistics graphs, I saw the load average of the
opportunistic machines oscillating from 0 to 12 and back again, with similar
ripples in the network traffic, all night and all weekend.

Since the desktops generally don't have the spiffy network offload features
of my dedicated machines, once things started cranking in earnest on them,
the non-HTCondor load average promptly rose to around 1 as a dozen jobs on
a high-end desktop workstation went about their business fetching inputs,
consulting remote data, and writing outputs. The kernel had to segment,
fragment, and checksum a much higher volume of network traffic than usual,
which led to longer delays in the kernel's scheduler for non-HTCondor
processes and a rising load average.

With the default configuration, a non-HTCondor load average higher than
0.3 puts the machine back into the Owner state, so it stops accepting
additional jobs, and with the minimum idle-time constraint it stayed
there for 15 minutes or more.
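
For reference, those knobs come from the example desktop (UWCS-style) policy
that ships with HTCondor. This is just a sketch from memory - the macro names
and defaults may differ in your version - but the relevant pieces look
roughly like:

    # Sketch of the stock desktop policy macros (check the condor_config for your version)
    NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)   # load not attributable to HTCondor jobs
    BackgroundLoad   = 0.3                         # above this, the machine counts as busy
    StartIdleTime    = 15 * $(MINUTE)              # keyboard must have been idle this long to start jobs
    CPUIdle          = $(NonCondorLoadAvg) <= $(BackgroundLoad)
    START            = $(CPUIdle) && (KeyboardIdle > $(StartIdleTime))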

As a result, we saw the same kind of oscillation in the graphs as
desktop workstations flipped in and out of the Owner/Idle states. A desktop
would accept 6, 8, or 12 jobs, go into the Owner state because the
non-HTCondor load of about 1.0 exceeded the 0.3 threshold, drain out most
or all of the jobs, then finally go back into Unclaimed 15 or more minutes
after the load average dropped below 0.3, and repeat the cycle, rather
than finishing one of the 12 jobs and starting another to keep 12 jobs
running all night and all weekend.

The problem was even more visible when the machines were running
short-duration jobs, where the time HTCondor must wait before moving
from Owner to Unclaimed is significantly longer than the job duration.

First, I tried increasing the non-HTCondor load average limit from 0.3
to 1.0, and that helped quite a bit, but it's not a complete solution
if you want to keep machine owners happy. It needs to be coupled with
some expressions for more aggressive suspension and eviction of jobs based
on keyboard and mouse activity - and you need to be sure that kbdd
is working correctly with respect to both KeyboardIdle and ConsoleIdle.
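
In case it's useful, the changes amounted to something like the following.
These are illustrative values rather than our exact config, and the macro
names follow the example desktop policy:

    # Tolerate more non-HTCondor load before declaring the machine busy
    BackgroundLoad = 1.0
    HighLoad       = 1.5

    # Get out of the owner's way quickly: suspend on recent keyboard/console
    # activity, resume after a few minutes of quiet, evict if it drags on.
    WANT_SUSPEND   = True
    KeyboardBusy   = (KeyboardIdle < 60) || (ConsoleIdle < 60)
    SUSPEND        = $(KeyboardBusy)
    CONTINUE       = (KeyboardIdle > 300) && (ConsoleIdle > 300)
    ActivityTimer  = (time() - EnteredCurrentActivity)
    PREEMPT        = (Activity == "Suspended") && ($(ActivityTimer) > 600)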

In addition, I reduced the number of CPUs advertised by desktop systems
by one, to leave some room for network processing in the kernel rather than
the interface. That could be coupled to a probe of the interface using
"ethtool --show-offload," come to think of it.

Good luck!

        -Michael Pelletier.