Re: [HTCondor-users] [HTCondor-Users] NEGOTIATOR_PRE_JOB_RANK is not working as expected



On 8/12/2019 12:58 PM, Vikrant Aggarwal wrote:
> Hello Experts,
> 
> Sorry for the flurry of questions:
> 
> We introduced some high memory nodes into an existing condor cluster. I 
> have added "highmemory = true" to the machine ClassAd of those nodes.
> 
> Without making any change to the submit file, I want to steer jobs 
> requesting more memory than a certain value to the high memory nodes. I 
> prepared this expression for it.
> 
> DEFAULT_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= 
> UNDEFINED)) - (100000 * Cpus) - Memory
> NEGOTIATOR_PRE_JOB_RANK = ifthenelse(Target.requestmemory > 1024, 
> 10000000 * (highmemory == true), $(default_rank))
> 
> The first job, with request_memory 1000, lands on a node without 
> highmemory in its machine ClassAd.
> The second job, with request_memory 1030, lands on a node with 
> highmemory in its machine ClassAd.
> *The third job, with request_memory 1000, lands on a node with 
> highmemory in its machine ClassAd (not expected).*
> 

You say all your high memory nodes have "highmemory = True" in the machine ad, but do all the other nodes have "highmemory = False"?  If not, this could be your problem: on a node where highmemory is not defined, "highmemory == true" evaluates to UNDEFINED, so NEGOTIATOR_PRE_JOB_RANK evaluates to UNDEFINED instead of the number you expected.  You could change your clause "highmemory == true" to instead read
"highmemory =?= true", which always evaluates to true or false.  See
  https://htcondor.readthedocs.io/en/v8_9_2/misc-concepts/classad-mechanism.html#expression-examples
for an explanation of the =?= and =!= operators.
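
To make the difference concrete, here is a rough Python sketch of how the two comparisons behave when the attribute is missing. It is only a toy model, not HTCondor's ClassAd evaluator: the names UNDEFINED, eq, meta_eq, times and machine_attr are made up for illustration, and the model assumes the usual ClassAd rules that == and arithmetic propagate UNDEFINED while =?= always returns a boolean.

```python
# Toy model of ClassAd comparison semantics; not HTCondor's evaluator.
UNDEFINED = object()  # hypothetical stand-in for the ClassAd UNDEFINED value


def eq(a, b):
    """Model of ==: the result is UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def meta_eq(a, b):
    """Model of =?=: same type and same value; never returns UNDEFINED."""
    return type(a) is type(b) and a == b


def times(n, x):
    """Model of n * x: UNDEFINED propagates, booleans count as 1 or 0."""
    return UNDEFINED if x is UNDEFINED else n * int(x)


def machine_attr(ad, name):
    """Look up an attribute; a missing attribute evaluates to UNDEFINED."""
    return ad.get(name, UNDEFINED)


highmem_node = {"highmemory": True}
normal_node = {}  # attribute never set, as suspected in this thread

for ad in (highmem_node, normal_node):
    hm = machine_attr(ad, "highmemory")
    with_eq = times(10000000, eq(hm, True))
    with_meta_eq = times(10000000, meta_eq(hm, True))
    print(with_eq is UNDEFINED, with_meta_eq)
```

On the node without the attribute, the == version yields UNDEFINED rather than 0, while the =?= version yields a plain number.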

But additionally, just steering your big memory jobs to your big memory slots is probably not all you want to do... I imagine 
you probably also want to explicitly steer your small memory jobs away from your big memory slots.  

Finally, your NEGOTIATOR_PRE_JOB_RANK ignores all the other goodness in the default expression if the job requests lots of memory, such as preferring a slot that is completely idle (no RemoteOwner) over a slot that is already busy serving someone who would need to be preempted. Not sure if you really intended that or not.  

I'd suggest trying 

NEGOTIATOR_PRE_JOB_RANK = $(NEGOTIATOR_PRE_JOB_RANK) + \
     1000000 * ((requestmemory > 1024 && highmemory =?= true) || (requestmemory <= 1024 && highmemory =!= true))

**Warning** I didn't test the above suggestion, I am just pontificating off the top of my head... hope
I am helping more than I am hurting :)
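
For what it is worth, the bonus term in the suggestion can be tabulated with a small Python sketch. This is plain Python, not ClassAd, and the names BONUS, THRESHOLD_MB and steering_bonus are invented for the illustration; passing highmemory=None is an assumption used here to model a machine ad where the attribute is missing.

```python
# Toy evaluation of the suggested bonus term; plain Python, not ClassAd.
BONUS = 1000000
THRESHOLD_MB = 1024


def steering_bonus(request_memory, highmemory=None):
    """Bonus added to the rank for one (job, slot) pair.

    highmemory=None models a machine ad without the attribute, where
    '=?= true' is False and '=!= true' is True.
    """
    is_highmem = highmemory is True       # highmemory =?= true
    not_highmem = highmemory is not True  # highmemory =!= true
    big_job = request_memory > THRESHOLD_MB
    match = (big_job and is_highmem) or (not big_job and not_highmem)
    return BONUS * int(match)


# Print the bonus for small and big jobs against each kind of slot.
for request_memory in (1000, 1030):
    for highmemory in (True, False, None):
        print(request_memory, highmemory,
              steering_bonus(request_memory, highmemory))
```

Big jobs earn the bonus only on high memory slots and small jobs only on the other slots, whether or not the other slots define highmemory at all. Because the bonus is added to the existing rank rather than replacing it, a big job can still match a normal slot when no high memory slot is available.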

regards,
Todd





> If the second job also has a request_memory of 1000, then all jobs go 
> to the same node, one without highmemory in its machine ClassAd.
> 
> I don't think this is related to *consumption_policy*, because a 
> negotiation cycle runs for each job.
> 
> Without modifying anything in the submit file, is there any other 
> recommended method of steering high request_memory jobs to the highmem 
> nodes, falling back to normal nodes if no highmem resources are available?
> 
> Thanks & Regards,
> Vikrant Aggarwal


-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685