
Re: [HTCondor-users] how to change requested memory (cpus) for running job



Hi Michael,

It's good to hear that I'm not alone.
Changing the machine attributes could be a solution,
but it looks hard to do.
Maybe we or someone else can find a way.
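
For what it's worth, here is a minimal sketch of the validation step in the
signaling idea you describe below: the job sets a custom attribute, and a
privileged process checks the request before anything is changed on the
machine ad. Everything here is an assumption for illustration: WantMemoryMB
is a made-up attribute name, the job ad is a plain Python dict standing in
for the real ClassAd, and the sketch does not touch the machine ad itself.

```python
from typing import Optional

# Hypothetical custom job attribute used as the signal; not a built-in name.
SIGNAL_ATTR = "WantMemoryMB"


def validate_memory_request(job_ad: dict, slot_total_mb: int,
                            other_jobs_mb: int) -> Optional[int]:
    """Return the new reservation in MB, or None if nothing should change.

    job_ad        -- plain dict standing in for the job ClassAd (assumption)
    slot_total_mb -- total memory of the machine or partitionable slot
    other_jobs_mb -- memory already reserved by all other running jobs
    """
    wanted = job_ad.get(SIGNAL_ATTR)
    current = job_ad.get("RequestMemory", 0)

    # Ignore jobs that did not signal, or that signalled something malformed.
    if isinstance(wanted, bool) or not isinstance(wanted, int) or wanted <= 0:
        return None
    # Nothing to do if the reservation already matches the request.
    if wanted == current:
        return None
    # Refuse anything that would push the available memory below 0.
    if wanted + other_jobs_mb > slot_total_mb:
        return None
    return wanted
```

With 8192 MB in total and 2048 MB held by other jobs, a request for 4096 MB
would be accepted and a request for 8000 MB would be rejected. How to then
apply the accepted value to the machine ad is exactly the open question.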

Best
Harald



On Wednesday 25 January 2017 20:02:10 Michael Pelletier wrote:
> Hi Harald,
> 
> I think we're in the same boat - the key is changing the machine
> attributes, rather than the job attributes, and I'm looking to do that for
> concurrency limits to deal with late-job license checkouts as opposed to
> memory allocations. It's starting to look like I may wind up building DAGs
> anyway, but it'd still be a useful trick to have.
> 
> I've tried a few things and have gotten some really weird results with
> condor_update_machine_ad and condor_advertise, so I'm still hunting for
> the proper incantations.
> 
> One of the considerations is the permissions required to change the machine
> ad - a job owner can't change the machine ad even for the slot in which
> the job is running, so there'd need to be some sort of signaling
> mechanism, such as a custom job attribute, to allow the job to trigger a
> process with the necessary permissions to validate and make the changes on
> the machine ad.
> 
> 	-Michael Pelletier.
> 
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Harald van Pee
> Sent: Wednesday, January 25, 2017 1:21 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] how to change requested memory (cpus) for
> running job
> 
> Dear all,
> 
> Does no answer mean that there is no expert around these days, or is it just
> not possible with HTCondor to change any ClassAds for a running job?
> 
> The idea is just to change the reserved memory so that the available
> memory decreases and no other job with a big memory request can start, since
> that could crash the machine or a long-running job. The available memory
> should not go to 0 if there is enough memory available, and it
> should simply increase again when the job finishes. Therefore a reread of the
> reservedMemory ClassAd on the start machine, without killing any job,
> would be perfect, if possible.
> 
> We are working on checkpointing of our jobs, but for some it does not seem
> possible.
> 
> Any ideas would be welcome.
> 
> Harald
> 