
Re: [HTCondor-users] Jobs massively killed with PeriodicRemove



On 3/24/22 12:08, Carles Acosta wrote:
Hi,

If I am not wrong, condor_drain does not kill the jobs; it gracefully waits until the MaxVacateTime/MaxJobRetirementtime of the jobs is reached and then puts the jobs back in the queue if they have not finished. These requeued jobs have already started, so NumJobStarts > 0, and your periodic_remove expression removes all of them. Do you really need this periodic_remove expression?
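
To illustrate the interaction described above, here is a minimal Python sketch. It assumes the removal clause has the form `JobStatus == 1 && NumJobStarts > 0`; that is a guess based on this thread, not the actual expression from submit-condor-job. The `Job` class and `would_be_removed` function are hypothetical names, not part of HTCondor.

```python
from dataclasses import dataclass

IDLE = 1     # HTCondor JobStatus value for an idle job
RUNNING = 2  # HTCondor JobStatus value for a running job


@dataclass
class Job:
    """Hypothetical stand-in for the two job attributes discussed here."""
    job_status: int
    num_job_starts: int


def would_be_removed(job: Job) -> bool:
    # Assumed clause: JobStatus == 1 && NumJobStarts > 0
    return job.job_status == IDLE and job.num_job_starts > 0


fresh = Job(job_status=IDLE, num_job_starts=0)       # queued, never started
running = Job(job_status=RUNNING, num_job_starts=1)  # currently executing
requeued = Job(job_status=IDLE, num_job_starts=1)    # evicted by the drain, idle again

for name, job in [("fresh", fresh), ("running", running), ("requeued", requeued)]:
    print(name, would_be_removed(job))
```

Under that assumption only the requeued job matches, which would explain why jobs sent back to the queue by a drain are the ones being removed.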

condor_drain should not kill running jobs, but it does; I think that is the problem I am trying to solve. I do not need this periodic_remove expression, but I did not succeed in getting rid of it.



Then my guess is that if MaxJobRetirementtime is long enough during the drain, the jobs will have enough time to finish and will not be returned to the queue.
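
A rough sketch of that reasoning in Python, under two stated assumptions: the retirement window is counted from the moment the job started (my understanding of how HTCondor treats retirement time), and MaxVacateTime is ignored for simplicity. The function name and the example numbers are hypothetical.

```python
def outcome_after_drain(total_runtime_s: int, max_job_retirement_s: int) -> str:
    """Return 'finished' if the job completes inside its retirement window,
    otherwise 'requeued' (it goes back to the queue as an idle job)."""
    if total_runtime_s <= max_job_retirement_s:
        return "finished"
    return "requeued"


# Hypothetical numbers: a 6-hour job under two different retirement windows.
six_hours = 6 * 3600
print(outcome_after_drain(six_hours, max_job_retirement_s=2 * 3600))
print(outcome_after_drain(six_hours, max_job_retirement_s=24 * 3600))
```

Only jobs that end up requeued would be exposed to a removal clause that tests NumJobStarts > 0.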

I will check what MaxJobRetirementtime is set to and try to tune it.

Thanks


Edith



Cheers,

Carles

On Thu, 24 Mar 2022 at 11:50, Edith Knoops <knoops@xxxxxxxxxxxxx> wrote:
On 3/24/22 10:23, Beyer, Christoph wrote:
> Hi,

Thanks for your answer.


>
> where did you define the system periodic remove expression? It actually says to remove jobs that are idle and have not started yet, which is pretty much the definition of an idle job ;)

In submit-condor-job on the ARC, but this has not changed since it worked
with no defrag. And I did not modify it.

I tried commenting out all the periodic remove expressions and forcing it
to false, but with that nothing was running.

With the current configuration a lot of jobs are killed, but the cluster
is more or less full of running jobs.

And for sure I do not want to kill all idle jobs; queues are useful :)


>
> This might make sense if you want to lower the idle job queue to 'near to 0' and only accept jobs that more or less start in the same second - still a weird approach this would be :)

That is not what I want


>
> Best
> christoph
>

--
--------------------------------------------------------------
Edith Knoops
CPPM/CNRS                         Mail: knoops@xxxxxxxxxxxxx
163 Av de Luminy case 902         Tel : (+33) (0)4 91 82 72 02
13288 Marseille Cedex 9 France
--------------------------------------------------------------

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es
