
RE: [Condor-users] When do machine RANK settings apply?



> >>>I'm now experimenting with your suggestion of:
> >>>
> >>>PREEMPTION_REQUIREMENTS = False
> >>>PRIORITY_HALFLIFE = 1
> >>>RANK = (TARGET.JobPrio * 2880)
> >>>
> >>>With our very long retirement time (enough for our jobs to finish
> >>>normally) this should be okay. I'll let you know how it works out.
> >>>      
> >>>
> >Hmm. Actually, it isn't working out at all. I had users with jobs in
> >the system with JobPrios of 10 and 11 respectively. I sent a single
> >job in with a JobPrio of 16 and expected my job to run on the next
> >available machine. Not the case. My job is still sitting there while
> >the two lower-JobPrio users are passing jobs through the system.
> >
> 
>  From the negotiator log snippets you posted, it appears to 
> me that your job _was_ preempting other resource claims.  
> What is not clear is why that same job kept coming back in 
> subsequent negotiation cycles.  Do you see anything that 
> would explain that in the job's user log or in the ShadowLog?

Right. There's a line that says it's rejecting 94.0 and then a line that
says it's preempting bchan's job for 94.0 and then back again. The
ShadowLog for my submitting machine has nothing in it. The last entry is
dated January 4. My SchedLog around 13:40 has the following:

1/5 14:38:52 Sent ad to central manager for ichesal@xxxxxxxxxx
1/5 14:38:52 Sent ad to 1 collectors for ichesal@xxxxxxxxxx
1/5 14:39:58 Activity on stashed negotiator socket
1/5 14:39:58 Negotiating for owner: ichesal@xxxxxxxxxx
1/5 14:39:58 Checking consistency running and runnable jobs
1/5 14:39:58 Tables are consistent
1/5 14:39:58 Out of jobs - 2 jobs matched, 0 jobs idle, flock level = 0
1/5 14:39:58 Sent ad to central manager for ichesal@xxxxxxxxxx
1/5 14:39:58 Sent ad to 1 collectors for ichesal@xxxxxxxxxx
1/5 14:42:28 Sent ad to central manager for ichesal@xxxxxxxxxx
1/5 14:42:28 Sent ad to 1 collectors for ichesal@xxxxxxxxxx
1/5 14:44:39 Activity on stashed negotiator socket
1/5 14:44:39 Socket activated, but could not read command
1/5 14:44:39 (Negotiator probably invalidated cached socket)
1/5 14:44:58 Sent ad to central manager for ichesal@xxxxxxxxxx
1/5 14:44:58 Sent ad to 1 collectors for ichesal@xxxxxxxxxx
1/5 14:47:28 Sent ad to central manager for ichesal@xxxxxxxxxx
1/5 14:47:28 Sent ad to 1 collectors for ichesal@xxxxxxxxxx
1/5 14:49:49 DaemonCore: Command received via TCP from host <137.57.176.9:33313>
1/5 14:49:49 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
1/5 14:49:49 Negotiating for owner: ichesal@xxxxxxxxxx
1/5 14:49:49 Checking consistency running and runnable jobs
1/5 14:49:49 Tables are consistent
1/5 14:49:49 Out of servers - 0 jobs matched, 2 jobs idle, 0 jobs rejected

Seems pretty normal enough. I don't see anything there, do you? And the
log for the job itself has in it:

000 (094.000.000) 01/05 12:29:54 Job submitted from host: <137.57.142.112:40413>
...

And that's it.
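
In case it helps anyone skim a longer schedd log for the same pattern, here is a
minimal sketch that pulls out only the per-cycle negotiation summary lines
("Out of jobs" / "Out of servers"). It assumes the line format is exactly what
appears in the excerpt above; the function name and the sample list are made up
for illustration and are not part of Condor.

```python
import re

# Matches the summary lines seen in the excerpt above, for example:
#   1/5 14:39:58 Out of jobs - 2 jobs matched, 0 jobs idle, flock level = 0
#   1/5 14:49:49 Out of servers - 0 jobs matched, 2 jobs idle, 0 jobs rejected
# Assumption: other Condor versions may word these lines differently.
SUMMARY = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d+:\d+:\d+) "
    r"Out of (?P<reason>jobs|servers) - "
    r"(?P<matched>\d+) jobs matched, (?P<idle>\d+) jobs idle"
)


def negotiation_summaries(lines):
    """Return (time, reason, matched, idle) for each summary line found."""
    results = []
    for line in lines:
        m = SUMMARY.match(line.strip())
        if m:
            results.append(
                (m["time"], m["reason"], int(m["matched"]), int(m["idle"]))
            )
    return results


# Hypothetical sample: a few lines copied from the excerpt above.
sample = [
    "1/5 14:39:58 Negotiating for owner: ichesal@xxxxxxxxxx",
    "1/5 14:39:58 Out of jobs - 2 jobs matched, 0 jobs idle, flock level = 0",
    "1/5 14:44:39 Socket activated, but could not read command",
    "1/5 14:49:49 Out of servers - 0 jobs matched, 2 jobs idle, 0 jobs rejected",
]

for time, reason, matched, idle in negotiation_summaries(sample):
    print(f"{time}  out of {reason}: matched={matched} idle={idle}")
```

On the excerpt above this shows the flip from "2 matched, 0 idle" at 14:39:58
to "0 matched, 2 idle" at 14:49:49, which is the back-and-forth in question.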

- Ian