
Re: [Condor-users] how to troubleshoot the scheduling process



How about something like a cron job, run every minute, that dynamically updates parallel job priorities to reflect the users' EUPs (properly scaled and inverted)?

I.e.:

1) Find the EUPs of all users in the queue and scale them to the range (-1000, 1000), with 1000 the highest priority (corresponding to EUP = 0.5) and -1000 corresponding to the maximum current EUP.

2) Find all parallel jobs.

3) Update the priorities of all parallel jobs to reflect the new scaling, using condor_prio -p [N] username.
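Step 1 can be sketched as a small Python function. This is a minimal sketch, assuming a linear mapping (the proposal only says "properly scaled and inverted") and assuming the caller has already collected each user's EUP into a dict; how those EUPs are obtained is left out.

```python
def scale_eups(eups, lo=-1000, hi=1000, min_eup=0.5):
    """Map each user's EUP linearly onto [lo, hi], inverted so that
    EUP == min_eup gets hi and the largest current EUP gets lo.

    eups: dict of username -> effective user priority (float).
    Returns a dict of username -> integer job priority.
    """
    if not eups:
        return {}
    max_eup = max(eups.values())
    span = max_eup - min_eup
    scaled = {}
    for user, eup in eups.items():
        if span <= 0:
            # Every user is at the minimum EUP, so everyone gets top priority.
            scaled[user] = hi
            continue
        frac = (eup - min_eup) / span  # 0.0 at min_eup, 1.0 at max_eup
        prio = round(hi - frac * (hi - lo))
        scaled[user] = max(lo, min(hi, prio))  # clamp in case an EUP < min_eup
    return scaled


print(scale_eups({"light": 0.5, "heavy": 156.0}))  # prints {'light': 1000, 'heavy': -1000}
```

Because EUPs can differ by orders of magnitude (0.5 versus 156 in the example further down this thread), a linear map pushes most users toward the top of the range; scaling log(EUP) instead is one alternative.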

It looks like condor_prio doesn't accept ClassAd expressions the way condor_status does, so it'll take a little more parsing of the condor_q -l output to get it right.
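The parsing for step 3 could look roughly like the Python sketch below: split condor_q -l output into one dict per job ad, keep the parallel-universe jobs, and build one condor_prio command per job. It assumes ads are separated by blank lines with one "Name = Value" attribute per line, and that parallel-universe jobs carry JobUniverse = 11; check both against your Condor version. It targets jobs by cluster.proc rather than by username so that a user's non-parallel jobs keep their own priorities, and it only builds the argument lists; actually running them (for example with subprocess.run) is left out.

```python
PARALLEL_UNIVERSE = "11"  # assumption: JobUniverse value of parallel-universe jobs


def parse_job_ads(text):
    """Split `condor_q -l` output into a list of dicts, one per job ad."""
    ads, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current ad.
            if current:
                ads.append(current)
                current = {}
            continue
        if " = " not in line:
            continue  # skip header lines such as "-- Submitter: ..."
        name, value = line.split(" = ", 1)
        current[name.strip()] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


def prio_commands(ads, user_prio):
    """Build a condor_prio argument list for each parallel job whose owner
    has a scaled priority in user_prio (username -> int)."""
    cmds = []
    for ad in ads:
        if ad.get("JobUniverse") != PARALLEL_UNIVERSE:
            continue
        owner = ad.get("Owner")
        if owner not in user_prio:
            continue
        job_id = f"{ad['ClusterId']}.{ad['ProcId']}"
        cmds.append(["condor_prio", "-p", str(user_prio[owner]), job_id])
    return cmds
```

Feeding the output of the scaling step into prio_commands gives one command per parallel job, which a cron wrapper could then execute.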

Does this sound reasonable?

rob


On Feb 7, 2008, at 6:05 PM, Dan Bradley wrote:

I can't think of any way to make the parallel scheduler pay attention to
EUPs.

Enhancing the parallel scheduler is one of the items we hope to address
in the 7.1 development cycle.

--Dan

Robert E. Parrott wrote:
Is there a way to make the scheduler pay attention to EUPs?


On Feb 7, 2008, at 5:35 PM, Dan Bradley wrote:


Rob,

The parallel scheduler in Condor does not pay any attention to user
priorities. By default, it schedules jobs in the order specified by the
job priority (i.e., the priority assigned to the job by the user in the
submit file). Unlike other job universes, the parallel
universe applies this priority across jobs from all users, so
users are expected to coordinate the setting of these priority
values. If priorities are equal, jobs are run in
first-in-first-out order. You can select an alternate best-fit
algorithm as documented here:

http://www.cs.wisc.edu/condor/manual/v7.0/3_3Configuration.html#14115

--Dan
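The ordering Dan describes can be modeled in a few lines of Python. This is a toy model of the described behavior, not Condor's scheduler code; the Job class and its fields are hypothetical stand-ins for a job's submit-file priority and its position in the queue.

```python
from dataclasses import dataclass


@dataclass
class Job:
    owner: str
    job_prio: int      # priority set in the submit file
    submit_order: int  # smaller means submitted earlier


def parallel_order(jobs):
    """Highest job priority first across all users; FIFO among equal
    priorities. User priority (EUP) plays no part in the ordering."""
    return sorted(jobs, key=lambda j: (-j.job_prio, j.submit_order))


queue = [Job("heavy", 0, 1), Job("light", 0, 2)]
print([j.owner for j in parallel_order(queue)])  # prints ['heavy', 'light']
```

With equal submit-file priorities, whoever queued first wins regardless of EUP, which matches the behavior reported in this thread; raising the light user's job_prio reverses the order.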

Robert E. Parrott wrote:

As a follow-up to this, I've enabled verbose negotiator logging and
see the following behavior.

Users do seem to come up for negotiation in EUP order, as expected.
However, the data about user jobs seem to be incorrect.

When a user with EUP 0.5 (the lowest) and a parallel job in the idle
state comes up for negotiation, the message I'm seeing is

"Negotiating with [user]@seas.harvard.edu skipped because no idle
jobs."

Thus there's something amiss here with the info from the schedd.

Any thoughts on this? The problem does not seem to be present for serial
or other jobs, only for parallel universe jobs.

thanks,
rob
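One way to list which submitters the negotiator skipped for this reason is to scan the negotiator log for that message. A minimal Python sketch, assuming the message appears on a single log line in the form quoted above (the quote is wrapped by email); the log file location is site-specific and not shown, and the sample lines below are made up.

```python
import re

# Assumed form of the verbose negotiator log message quoted in this thread.
SKIP_RE = re.compile(r"Negotiating with (\S+) skipped because no idle jobs")


def users_skipped_for_no_idle(log_lines):
    """Return submitters the negotiator skipped with 'no idle jobs'."""
    skipped = []
    for line in log_lines:
        match = SKIP_RE.search(line)
        if match:
            skipped.append(match.group(1))
    return skipped


sample = [
    "02/07 17:30:01 Negotiating with user@seas.harvard.edu skipped because no idle jobs",
    "02/07 17:30:01 Negotiating with other@seas.harvard.edu",
]
print(users_skipped_for_no_idle(sample))  # prints ['user@seas.harvard.edu']
```

Comparing that list with the owners of idle parallel jobs shown by condor_q makes the mismatch described above easy to spot.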


On Feb 7, 2008, at 4:23 PM, Robert E. Parrott wrote:



Hi folks,

We have a situation where a user with a very high EUP and a large
number of jobs in the queue is always scheduled ahead of users with
much lower EUPs (100 times lower or more), and thus much higher priority.
All these jobs are parallel (MPI) jobs, which is likely relevant.

To begin, can anyone suggest a method to diagnose the problem here,
and explain how these evaluations take place? My understanding from the
manual is that users' jobs are considered in order of priority (from
lowest EUP to highest), but the opposite seems to be occurring.

As an example, this user, who is using 156 of 200 resources, has a 12-process parallel job complete. His EUP is 156. Immediately another of his 12-process jobs is started, even though a user with EUP
0.5 has an 8-node job waiting in the queue.
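The expectation that users are served in order from lowest to highest EUP can be turned into a simple check on observed job starts. A hypothetical Python sketch: given each user's EUP and the order in which users' jobs were started, it reports every pair where a user with a worse (higher) EUP was served first. It assumes the start order has already been collected, for example by hand from the logs.

```python
def priority_inversions(eups, start_order):
    """Return (earlier, later) user pairs from start_order in which the
    earlier user has a higher (worse) EUP than the later one."""
    inversions = []
    for i, earlier in enumerate(start_order):
        for later in start_order[i + 1:]:
            if eups[earlier] > eups[later]:
                inversions.append((earlier, later))
    return inversions


# The situation described above: the EUP-156 user's new 12-process job
# starts while the EUP-0.5 user's 8-node job is still waiting.
eups = {"heavy": 156.0, "light": 0.5}
print(priority_inversions(eups, ["heavy", "light"]))  # prints [('heavy', 'light')]
```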

Thanks for any initial insight or input on how to address this.

rob


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Project Manager., CrimsonGrid Initiative and
Program Manager, CyberInfrastructure Lab
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/