
Re: [Condor-users] how to troubleshoot the scheduling process



Great, I'll poke and prod at it.

Thanks.

BTW, it would be GREAT to have some kind of wiki for user experiences, configs and settings, and hacks, that would go beyond just the Condor manual. Has anyone at Wisconsin considered this?

thanks,
rob


On Feb 7, 2008, at 6:26 PM, Dan Bradley wrote:


Yes, you should be able to update priorities on-the-fly.  condor_prio
doesn't accept a -constraint, but condor_qedit does, so you should be
able to get the job done efficiently using the latter.
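
As a rough illustration of the condor_qedit approach, here is a small
sketch that only builds the command line and does not run it. It assumes
that parallel universe jobs carry JobUniverse == 11 and that the job
priority is stored in the JobPrio attribute; the helper name and the user
name in the usage line are made up for the example.

```python
def qedit_prio_command(owner, new_prio):
    """Build (but do not run) a condor_qedit call that sets JobPrio on
    all parallel universe jobs belonging to one owner."""
    # Assumption: JobUniverse == 11 identifies parallel universe jobs.
    constraint = 'Owner == "{}" && JobUniverse == 11'.format(owner)
    return ["condor_qedit", "-constraint", constraint,
            "JobPrio", str(int(new_prio))]


if __name__ == "__main__":
    # "alice" is a hypothetical user name.
    print(" ".join(qedit_prio_command("alice", 250)))
```

The returned list could be handed to subprocess.run on the submit host;
it is worth printing and checking the constraint by hand first.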

--Dan

Robert E. Parrott wrote:
How about something like a cron job, run every minute, that dynamically
updates parallel job priorities to reflect something like EUP priorities
(properly scaled and inverted)?

I.e.:

1) Find the EUPs of all users in the queue, and scale them to a range
(-1000, 1000), with 1000 the highest priority (corresponding to EUP =
0.5), and -1000 corresponding to the max current EUP.

2) Find all parallel jobs.

3) Update all parallel jobs' priorities to reflect the new scaling,
using condor_prio -p [N] username.
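
The scaling in step 1 could look roughly like the sketch below. It
assumes a plain linear map (the message does not say which scaling is
intended), takes the EUPs as a dict that some other step has already
collected, and uses a hypothetical function name.

```python
def scale_eups(eups, best_eup=0.5, lo=-1000, hi=1000):
    """Map each user's EUP linearly onto [lo, hi]: best_eup -> hi and
    the largest current EUP -> lo. A lower EUP is a better priority, so
    the scale is inverted."""
    if not eups:
        return {}
    worst = max(eups.values())
    span = worst - best_eup
    scaled = {}
    for user, eup in eups.items():
        if span <= 0:
            # Everyone sits at the best possible EUP: give all the top value.
            scaled[user] = hi
        else:
            frac = (eup - best_eup) / span  # 0.0 at best, 1.0 at worst
            scaled[user] = int(round(hi - frac * (hi - lo)))
    return scaled


if __name__ == "__main__":
    # User names are made up; the EUP values are the two from this thread.
    print(scale_eups({"light_user": 0.5, "heavy_user": 156.0}))
```

A linear map is only one choice; with EUPs spanning several orders of
magnitude, a logarithmic scale might separate users better.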

It looks like condor_prio doesn't accept ClassAd expressions like
condor_status does, so it'll take a little more parsing of the
condor_q -l output to get it right.
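
A minimal sketch of that parsing step, assuming condor_q -l prints one
"Attr = value" pair per line with a blank line between jobs, and that
JobUniverse == 11 marks parallel jobs. The function names and the sample
text are invented for illustration.

```python
def parse_long_ads(text):
    """Split 'condor_q -l' style output into one dict per job.
    Lines without ' = ' (headers, for instance) are ignored."""
    jobs, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the job ad collected so far.
            if current:
                jobs.append(current)
                current = {}
            continue
        if " = " in line:
            attr, value = line.split(" = ", 1)
            current[attr.strip()] = value.strip().strip('"')
    if current:
        jobs.append(current)
    return jobs


def parallel_job_ids(jobs):
    """Return 'cluster.proc' ids of jobs with JobUniverse 11 (assumed parallel)."""
    return ["{}.{}".format(j["ClusterId"], j["ProcId"])
            for j in jobs if j.get("JobUniverse") == "11"]


if __name__ == "__main__":
    # Made-up sample, trimmed to the attributes used here.
    sample = ('ClusterId = 12\nProcId = 0\nOwner = "alice"\nJobUniverse = 11\n'
              '\n'
              'ClusterId = 13\nProcId = 0\nOwner = "bob"\nJobUniverse = 5\n')
    print(parallel_job_ids(parse_long_ads(sample)))
```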

Does this sound reasonable?

rob


On Feb 7, 2008, at 6:05 PM, Dan Bradley wrote:


I can't think of any way to make the parallel scheduler pay attention
to EUPs.

Enhancing the parallel scheduler is one of the items we hope to address
in the 7.1 development cycle.

--Dan

Robert E. Parrott wrote:

Is there a way to make the scheduler pay attention to EUPs?


On Feb 7, 2008, at 5:35 PM, Dan Bradley wrote:



Rob,

The parallel scheduler in Condor does not pay any attention to user
priorities.  By default, it schedules things in the order specified by
the priority of the job (i.e. the priority assigned to the job in the
submit file by the user).  Unlike other job universes, the parallel
universe applies this priority across jobs from all users, so it is
expected that the users will coordinate the setting of these priority
values.  If priorities are equal, then jobs are run in
first-in-first-out order.  You can select an alternate best-fit
algorithm as documented here:

http://www.cs.wisc.edu/condor/manual/v7.0/3_3Configuration.html#14115
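
To make the default ordering described above concrete, here is a toy
model of it: higher job priority first, across all owners, with ties
broken first-in-first-out. This is not Condor's actual code; the dict
keys and the "submitted" counter are hypothetical stand-ins.

```python
def parallel_schedule_order(jobs):
    """Order jobs by job priority (higher first) regardless of owner,
    breaking ties by submission order (earlier first)."""
    return sorted(jobs, key=lambda j: (-j["prio"], j["submitted"]))


if __name__ == "__main__":
    # Made-up queue: owner EUP is deliberately absent, since the
    # parallel scheduler is described as ignoring it.
    queue = [
        {"id": "1.0", "owner": "heavy_user", "prio": 0, "submitted": 100},
        {"id": "2.0", "owner": "light_user", "prio": 0, "submitted": 200},
        {"id": "3.0", "owner": "light_user", "prio": 5, "submitted": 300},
    ]
    print([j["id"] for j in parallel_schedule_order(queue)])
```

With equal job priorities, the earlier-submitted job of the heavy user
goes ahead of the light user's job, which matches the behavior reported
at the start of the thread; raising the light user's job priority moves
it to the front.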

--Dan

Robert E. Parrott wrote:


As a followup to this, I've enabled verbose negotiator logging, and
see the following behavior.

Users do seem to come up for negotiation in EUP order, as expected.
However, the data about user jobs seem to be incorrect.

When a user with EUP 0.5 (the lowest) and a parallel job in the idle
state comes up for negotiation, the message I'm seeing is

    "Negotiating with [user]@seas.harvard.edu skipped because no idle jobs."

Thus there's something amiss here with the info the schedd is reporting.

Any thoughts on this? The problem seems to not be present for serial
or other jobs, just parallel universe jobs.
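
For anyone wanting to check the same thing, a small sketch that pulls
the skipped submitters out of negotiator log text. It matches only the
message quoted above and assumes the message sits on one line in the
log; the function name and the sample user are hypothetical.

```python
import re

# Matches the quoted message; anything else on the line is ignored.
SKIPPED = re.compile(r"Negotiating with (\S+) skipped because no idle jobs")


def skipped_submitters(log_text):
    """Return submitters skipped for having no idle jobs, in the order
    first seen, without duplicates."""
    seen = []
    for match in SKIPPED.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    sample = ("Negotiating with alice@seas.harvard.edu "
              "skipped because no idle jobs.\n")
    print(skipped_submitters(sample))
```

Comparing that list against the owners of idle parallel jobs in the
queue shows which users are affected.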

thanks,
rob


On Feb 7, 2008, at 4:23 PM, Robert E. Parrott wrote:




Hi Folks,

We have a situation where a user with very high EUP, and a large
number of jobs in the queue, is always scheduled ahead of users with
much lower (100 times or more) EUP, and thus much higher priority.
All these jobs are parallel (MPI) jobs, which is likely relevant.

To begin, can anyone suggest a method to diagnose the problem here,
and to see how these evaluations are taking place? My understanding
from the manual is that user jobs are considered in order of priority
(from lowest EUP to highest).  But the opposite seems to be occurring.

As an example, this user, using 156/200 resources, has a 12 process
parallel job complete. His EUP is 156. Immediately a new 12 process
job of his is started, despite the fact that there's a user with EUP
0.5 and an 8 node job waiting in the queue.

Thanks for any initial insight or input on how to address this.

rob


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Project Manager., CrimsonGrid Initiative and
Program Manager, CyberInfrastructure Lab
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


