
Re: [Condor-users] Vanilla jobs get preempted and restart from the beginning (because of user priority)



Rob,
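PREEMPTION_REQUIREMENTS is evaluated by the negotiator, which runs on your central manager, so the setting has to be in the central manager's configuration; putting it only in condor_config.local on the four execute servers will not change what the negotiator does. To make sure the setting takes effect, running condor_restart -negotiator on your central manager might be a good idea. Additionally, checking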

condor_config_val -verbose PREEMPTION_REQUIREMENTS
condor_config_val -negotiator PREEMPTION_REQUIREMENTS

to verify that the negotiator has the right setting is always a good idea.

If you want to ensure no preemption occurs for either user-priority or machines' RANK expressions, try setting:
NEGOTIATOR_CONSIDER_PREEMPTION=False
http://www.cs.wisc.edu/condor/manual/v7.0/3_3Configuration.html#14855
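As a rough sketch of where these settings would live, assuming the central manager reads a local configuration file such as condor_config.local (the real path is whatever LOCAL_CONFIG_FILE names on that machine), the negotiator-side configuration might look like this:

```
# Central manager configuration (the machine running the negotiator).
# Assumption: this is the file named by LOCAL_CONFIG_FILE on that host.

# Do not let user priority alone preempt a running job.
PREEMPTION_REQUIREMENTS = False

# Stronger option: tell the negotiator not to consider preemption at all
# (see the manual entry linked above for exactly what this covers).
NEGOTIATOR_CONSIDER_PREEMPTION = False
```

After editing, running condor_reconfig (or the condor_restart -negotiator suggested earlier) on the central manager should make the negotiator pick up the change, and the condor_config_val -negotiator query can then confirm the value it is actually using.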

If this doesn't work, post your negotiator log, and we can look at it from there.

Hope this helps.  Let us know how it works out.

Best,
Rob

-- 
===================================
Rob Futrick
main: 888.292.5320

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and CycleServer Management Tools

http://www.cyclecomputing.com


Rob Stevenson wrote:
Dear users,
I have recently put Condor into production on our servers here. Everything runs nicely except when long runs are submitted; only recently have we started using Condor for longer runs.
 
All the affected runs are in the vanilla universe and are over 10 hours long. We currently have 4 servers that meet the requirements for these jobs to run.
 
What happens is that user1 may submit 4 jobs, which run fine for a few hours. Then user2 submits a job, and one of user1's jobs is stopped so that user2's job can run. I understand this is a result of user priorities. The problem is that when the stopped job restarts, it starts again from the beginning.
 
Some of these longer jobs will never actually finish, given that new users are regularly submitting new jobs.
 
I have tried adding the following to condor_config.local on the four servers, but it still appears to do the same thing:
PREEMPTION_REQUIREMENTS = False
 
I think the best way to solve this would be to prevent runs from being stopped at all and just let them continue running to the end. Does anyone have any suggestions?
 
The servers are:
$CondorVersion: 7.0.1 Feb 27 2008 BuildID: 76180 $
$CondorPlatform: INTEL-WINNT50 $
 
Desktop (the machine the user submits the jobs from):
$CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
$CondorPlatform: INTEL-WINNT50 $
 
The condor master:
$CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
$CondorPlatform: INTEL-WINNT50 $
 
Sorry I haven't provided more specifics; I really don't know what would be useful and don't want to drown you in useless output! Let me know if there is anything else that would help.
Thanks for any advice,
 
Rob
 


