
[HTCondor-users] Question about scheduling policy



Hello again,

Apologies for that. I just realized I pressed the send button before
adding a subject to my question email. Subject-less mails are not
nice. Sorry.

Thanks for your help,
Gonzalo

On 5 May 2014 17:00, Gonzalo Merino <gonzalo.merino@xxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> We have a scheduling policy use case at our site, and it is not clear
> to me how best to implement it in HTCondor. I would like to ask the
> experts out there for help.
>
> We have a batch farm of ~3000 slots. The default maximum duration for
> jobs is 12h, and we kill jobs exceeding that limit. Most of our users'
> jobs take far less time, typically finishing within 2 or 3 hours at
> most. However, sometimes there are special jobs that might last
> longer than 12h. For that purpose we have implemented a "long"
> AccountingGroup to which users can submit, and which allows jobs to
> run for up to 48h. We have capped the number of slots that can be
> running in this "long" accounting group at 400. This is currently a
> hard limit.
>
> What we observe is that, even when submitted to the "long"
> AccountingGroup, 90% of the jobs still complete in much less than 12h.
> Users submit them to the long queue "just in case", to make sure the
> few jobs in the tail that might exceed the 12h limit are not killed.
>
> The policy we would like to implement is one that preempts jobs that
> have been running for more than 12h, but only those in excess of the
> maximum number we have configured for this type (400 in our current
> case).
>
> Let me give an example, since I am not sure my English description
> of the policy was all that clear: imagine we start with a full farm
> running 3000 jobs, 700 of which have been running for >12h. I then
> submit 100 jobs, and would expect HTCondor to choose 100 of those
> 700 "long running" jobs to be preempted so that my jobs can run.
> Ideally, I would like to tell HTCondor to preempt first those jobs
> which have been running for the shortest time, for instance.
>
> What is the best way to do this in HTCondor?
>
> Thanks very much,
> Gonzalo
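
To make the intended selection rule in the quoted message concrete,
here is a minimal Python sketch of it. This is only a simulation of
the policy as I read it, not HTCondor configuration and not a claim
about how HTCondor would implement it. The names RunningJob,
select_preemption_victims and build_example_farm are hypothetical; the
12h threshold, the cap of 400 and the 3000/700/100 figures come from
the example in the message.

```python
from dataclasses import dataclass

LONG_THRESHOLD_H = 12.0  # jobs running longer than this count as "long"
MAX_LONG_JOBS = 400      # cap on "long" jobs, taken from the message


@dataclass(frozen=True)
class RunningJob:
    """Hypothetical record for one running job."""
    job_id: int
    runtime_hours: float


def select_preemption_victims(running, idle_demand,
                              threshold_h=LONG_THRESHOLD_H,
                              max_long=MAX_LONG_JOBS):
    """Pick the long-running jobs to preempt for newly submitted jobs.

    Only jobs past the threshold are candidates, and only as many as
    exceed the cap. Among the candidates, the ones that have been
    running for the shortest time are preempted first.
    """
    long_running = [j for j in running if j.runtime_hours > threshold_h]
    excess = max(0, len(long_running) - max_long)
    how_many = min(excess, max(0, idle_demand))
    long_running.sort(key=lambda j: j.runtime_hours)
    return long_running[:how_many]


def build_example_farm():
    """Full farm from the message: 3000 jobs, 700 of them running >12h."""
    short = [RunningJob(i, 1.0) for i in range(2300)]
    long_ = [RunningJob(2300 + i, 12.5 + 0.05 * i) for i in range(700)]
    return short + long_


if __name__ == "__main__":
    victims = select_preemption_victims(build_example_farm(), idle_demand=100)
    print(len(victims), "jobs selected for preemption")
```

With the numbers from the example, 300 of the 700 long-running jobs
are above the cap, so a request for 100 slots is served entirely from
that excess, while a request for 500 slots would only free 300.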