From: "Fox, Kevin M" <Kevin.Fox@xxxxxxxx> Date: 07/19/2016 08:16 PM
> So, I have used a lot of job schedulers in the past and in studying the
> Condor architecture a bit, found what seems to be a unique feature to Condor.
Hi Kevin - as it happens we migrated from a fairly
mature Grid Engine environment to HTCondor, and I ran into this very
issue. The default config with a separate queue for each submitter machine was
a bit perplexing at first, and led to a certain level of anxiety and
distress among the users. Since they were used to a rather broken "first-come
first-served" SGE config, the presence or absence of other people's
jobs in the queue was very important to them for their completion-time
estimates.
My resolution was to reconfigure HTCondor to use a single
central scheduler via the SCHEDD_HOST config setting, so that all users on
all machines submit to one queue; more on how in my answers below.
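The pool-wide change itself is tiny. As a minimal sketch (the hostname here
is invented; use whatever machine runs your schedd):

    # In the config applied to every login/submit machine:
    SCHEDD_HOST = schedd.example.com

You can confirm what the tools will actually use by running
condor_config_val SCHEDD_HOST on any of the login machines.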
> So, some questions:
> * How do you know it is safe to shutdown a schedd node without affecting
>   a running job? Can you temporarily mark the schedd for not getting new
>   jobs accepted so no new ones start to drain things? Does condor_q only
>   show local jobs? If so, is just checking for running = 0 enough to tell if
>   its safe to shutdown?
With a single scheduler, this becomes moot, but yes, condor_q by default in version 8.4 and earlier shows only the local jobs.
You need to run condor_q -global to see the queues from every submitter,
or run condor_status -schedd to see the counts.
Running condor_status -schedd -long | sort will
show you the full list of scheduler ClassAd attributes. The trouble with
determining whether anything is still running is that there are several
totals involved - idle jobs, held jobs, local jobs, flocked jobs, as well as
running jobs.
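If you just want the per-schedd totals, condor_status's autoformat mode can
pull them out. A sketch (the attribute names below are the schedd ClassAd
totals as I remember them; check them against condor_status -schedd -long on
your version before relying on them):

    condor_status -schedd -af Name TotalRunningJobs TotalIdleJobs \
        TotalHeldJobs TotalFlockedJobs

That gives one line per schedd, which is easy to feed into a
"safe to shut down?" check.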
With ephemeral session nodes, you'll definitely want
to switch to a central scheduler.
I'd also be very interested to hear more about your
approach to the session machines, off-list.
> * If you want to reinstall the node but not loose the jobs, you have to
>   maintain the condors job state somehow. is persisting /var/lib/condor/
>   spool all you need to maintain this state, or are there other places on
>   the file system that need to persist?
Yes, /var/lib/condor/spool is the location that
counts; the queue itself is represented by the job_queue.log file in that directory.
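So a reinstall-without-losing-the-queue procedure is roughly the sketch
below, assuming the stock Linux paths, a systemd unit named condor, and a
made-up backup location - adjust all three for your site:

    # Shut HTCondor down; the queue state stays on disk in spool/
    systemctl stop condor

    # Preserve the spool directory (job_queue.log lives here)
    rsync -a /var/lib/condor/spool/ /backup/condor-spool/

    # ... reinstall the node ...

    # Restore the spool, fix ownership, and bring HTCondor back up
    rsync -a /backup/condor-spool/ /var/lib/condor/spool/
    chown -R condor:condor /var/lib/condor/spool
    systemctl start condor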
> * For sites that want to scale the number of schedd's and the number of
>   login nodes differently, is that possible? Is there a remote schedd mode?
>   I'm sure things like the syscall shadowing wouldn't work in such a mode,
>   but we haven't had a need for our site for that.
You set the SCHEDD_HOST configuration variable to
the hostname of the machine running your scheduler, and condor_q,
condor_submit, and the rest of the tools will refer to that machine.
You can use config file conditionals or templating
to set a different host for different login machines, or you can set
the _CONDOR_SCHEDD_HOST environment variable to pick the proper scheduler on
a user-by-user basis. Remember, though, that a user who submitted jobs to
one queue might become alarmed if they log in to a different machine
that uses a different scheduler and find their jobs "missing"
from the condor_q output.
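Concretely, both variants are one-liners (hostnames invented here):

    # Per login machine, in its local HTCondor config:
    SCHEDD_HOST = team-schedd.example.com

    # Or per user, since any config knob can be overridden from the
    # environment with a _CONDOR_ prefix (e.g. in a shell profile):
    export _CONDOR_SCHEDD_HOST=team-schedd.example.com
    condor_q    # now talks to that team's scheduler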
Our largest environment has three schedulers. The first
is the one hardcoded in the configuration file applied to all users on
all machines in the pool, and it's the target of everyone's condor_submit and
condor_q runs.
The second is used for a DRMAA-linked Python application,
because there was a problem with an older version of HTCondor
and DRMAA that didn't take remote schedulers into account, so DRMAA
could only delete jobs from the local scheduler.
Finally, there's a third scheduler for a small team
that occasionally submits hundreds of thousands of small jobs at
a time, and in 8.0 and early 8.2 condor_q didn't even work with
a queue that deep. Even once we fixed the bugs and timeouts, it
was still highly disruptive to everyone else's ability to run a quick
condor_q to check their job status, so we quarantined the 100k+
job submissions to their own little scheduler on a 7-year-old clunker
machine by modifying the job-submission script to use "condor_submit
-name".
With a remote schedd, the syscall shadowing is done
on the machine that runs the schedd, so as long as that machine
has access to the target filesystem the job refers to,
it'll work. But we're not using the standard universe either, and
fairly few people are in any case.
Here we just have the jobs use NFS from the exec nodes
to pull input files in most cases, in order to take advantage
of the Linux buffer cache with a depth-first machine fill, and we've
been migrating NFS-based output delivery to HTCondor output
transfers as time goes on.
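As a hedged sketch of what that pattern looks like in a submit file (paths
and filenames invented; the point is NFS for input, HTCondor transfer for
output):

    universe                = vanilla
    executable              = /nfs/tools/analyze.sh
    # Input is read straight off NFS by absolute path, so it is not
    # listed in transfer_input_files at all.
    arguments               = /nfs/data/input_$(Process).dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_output_files   = result_$(Process).out
    output                  = logs/job_$(Process).out
    error                   = logs/job_$(Process).err
    log                     = logs/cluster.log
    queue 100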
There was a message from the CHTC team here a week or two
ago on how you can use HTCondor-C to handle file spooling
to a remote scheduler; if you're interested in that, check out
the archives.