Thank you very much for your reply! It contained a lot of useful advice.
Regarding attempts to reproduce the problem with jobs becoming idle after condor_qedit modifies the SlotID requirement: I did not study the problem systematically, but it happened to me about half a dozen times in the last 2-3 weeks, so it was certainly not a single occurrence. I noticed that simply changing the priority of my jobs with condor_prio makes them run again (these priority changes do not make the jobs any more or less prioritized relative to other jobs). After your last e-mail, I also started running condor_reschedule after every modification of job ClassAds, and so far no job cluster has gotten stuck.
I have switched my servers to use 100% dynamic slots as you advised, and it works beautifully.
If you have time to help a bit more, I have a few questions about how to meet some requirements that come up in my practical use of HTCondor:
1) Some resources are naturally limited, and to achieve optimal overall computing progress it is desirable to fully utilize those resources whenever they are available. Examples of such resources: a) the number of connections to a certain database (to avoid overloading the DB or even crashing it, the number of simultaneous queries has to be limited), b) server-grade GPUs, which can run up to a certain number of jobs in parallel. So, how can I define such custom resources and require them in submit files? If I set the priority of jobs utilizing such a resource to the maximum, then the resource would be fully utilized.
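While writing this up I came across the concurrency limits feature in the manual, which may be what I need; assuming a pool-wide limit that I name DB_CONNECTIONS (my own name, just for illustration), I believe the configuration and submit file would look roughly like this, though please correct me if this is the wrong mechanism:

    # In the negotiator's configuration: allow at most 20 jobs
    # holding the DB_CONNECTIONS limit to run at once, pool-wide
    DB_CONNECTIONS_LIMIT = 20

    # In the submit file: each of these jobs consumes one unit of the limit
    concurrency_limits = DB_CONNECTIONS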
2) Can I ask HTCondor to always run a certain number of jobs from a specific cluster? This is needed to make sure that large (i.e. RAM-hungry) lower-priority jobs continue to make progress while higher-priority CPU-bound jobs utilize the remaining RAM.
3) Is it possible to dynamically put to sleep (or, in the worst case, kill and later restart) jobs that attempt to allocate an amount of RAM that would leave less than some threshold percentage of RAM available on a compute node? For example, if a job attempts an allocation that would leave less than 10% of the node's RAM free, I want the job to sleep until the allocation becomes possible. The practical situation is that some jobs use a wide range of RAM while they run: the maximum can be almost an order of magnitude higher than the minimum. If each job requested in its submit file the maximum amount of RAM it might need, then most of the time the compute nodes would have a large fraction of RAM unutilized and only a small number of jobs would run (so the CPUs would also be underutilized), since the large amount of RAM is only needed for a rather small fraction of each job's runtime. Or, is it at least possible to automatically reschedule jobs (for example, from the held or some other state) that were killed by the OS due to memory allocation failures?
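For the restart part of this question, I have been experimenting with the periodic policy expressions in the submit file; something like the following (the 600-second threshold is just a guess on my part) seems to put a job on hold when its measured memory usage exceeds what it requested, and to release it later so it can be rescheduled, though I am not sure this is the recommended approach:

    # Put the job on hold if its measured memory usage (MB) exceeds
    # the amount requested; this expression is evaluated periodically
    periodic_hold = MemoryUsage > RequestMemory

    # Release held jobs after 10 minutes so they can be rescheduled
    # (JobStatus == 5 means the job is in the Held state)
    periodic_release = (JobStatus == 5) && ((CurrentTime - EnteredCurrentStatus) > 600)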
Thank you very much for your help,
On 5/30/2018 2:04 PM, Vaurynovich, Siarhei wrote:
> *Please, let me know if there is a way to force HTCondor matchmaker to
> consider a job cluster for scheduling.*
The command "condor_reschedule", issued on the submit host (i.e. where the schedd is running), will do that. However, by default, this should happen automatically every few minutes.
> My jobs often sit unscheduled in the queue for many hours
> (indefinitely) if I use condor_qedit to adjust job requirements.
> To make sure jobs have enough RAM to run, I sometimes restrict allowed
> SlotID range in requirements. There is probably a better way to do it:
> i.e. somehow to declare RAM as a shared resource with certain number
> of units of the resource available, but for now this is my quick hack
> to do it. Setting ImageSize does not work since my jobs are almost
> always bigger than per slot RAM and so if I give realistic job size,
> my jobs would never start. Creating specialized slots is also a bad
> idea since my jobs vary strongly in size.
The above sounds like pretty strange usage. As you suspect, there are
better ways to do this. Assuming you are using a current version of
HTCondor (i.e. HTCondor v8.6 or above), instead of configuring your
nodes to partition resources like memory into statically sized slots,
you could configure your nodes to use dynamic (partitionable) slots.
See the HTCondor Manual section "Dynamic Provisioning: Partitionable and
Dynamic Slots" at URL http://tinyurl.com/y83a9ufo. Once you have set up
your execute nodes to use a partitionable slot as described, your
condor_submit file can look like:
executable = foo
# This job only needs one CPU core in the execute slot
request_cpus = 1
# This job needs 3.5 GB of RAM in the execute slot
request_memory = 3500
and the execute node (startd) will carve off a new slot with 3.5GB of
memory for this job. No messing around with ImageSize required.
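For reference, the node-side configuration for a single partitionable slot that owns the whole machine looks roughly like this (see the manual section above for the authoritative version):

    # Define one partitionable slot that owns 100% of the machine's
    # resources; the startd carves dynamic slots out of it per job
    NUM_SLOTS = 1
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = 100%
    SLOT_TYPE_1_PARTITIONABLE = TRUE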
> The problem is that often after such adjustment, my jobs would often
> stop being scheduled for running – they sit in the queue indefinitely
> and ‘condor_q -better-analyze clusterID’ gives “Job has not yet been
> considered by the matchmaker.” while claiming that there are slots
> “available to run your job”. If I do not use condor_qedit, jobs run
> fine. If I kill the same jobs and then submit them again with new
> requirements, they also run fine.
This sounds pretty strange. Can you easily reproduce it? Does it
happen every time or only sometimes? What version of HTCondor are you
using, on what platform?