
Re: [HTCondor-users] schedd state

From: "Fox, Kevin M" <Kevin.Fox@xxxxxxxx>
Date: 07/19/2016 08:16 PM
> So, I have used a lot of job schedulers in the past and in studying the
> Condor architecture a bit, found what seems to be a unique feature to Condor.

Hi Kevin - as it happens we migrated from a fairly mature Grid Engine
environment to HTCondor, and I ran into this very issue. The default config
with a separate queue for each submitter machine was a bit perplexing at
first cut, and led to a certain level of anxiety and distress among the
users. Since they were used to a rather broken "first-come first-served"
SGE config, the presence or absence of other people's jobs in the queue
was very important to them for their completion-time estimates.

My resolution was to reconfigure HTCondor to use a single central
scheduler via the SCHEDD_HOST config setting, allowing all users
on all machines to submit to one queue. See below:
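A minimal sketch of that configuration, assuming a dedicated scheduler
machine with the hypothetical hostname schedd.example.com:

```
# In the config applied to every login/submit machine -- the tools then
# target the central queue instead of a local schedd:
SCHEDD_HOST = schedd.example.com

# On those machines you can then drop the local schedd from the daemon
# list entirely, e.g.:
DAEMON_LIST = MASTER
```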

> So, some questions:
>  * How do you know it is safe to shutdown a schedd node without affecting
> a running job? Can you temporarily mark the schedd for not getting new
> jobs accepted so no new ones start to drain things? Does condor_q only
> show local jobs? If so, is just checking for running = 0 enough to tell if
> its safe to shutdown?

With a single scheduler this becomes moot, but yes, condor_q in
version 8.4 and earlier shows only the local jobs by default. You need
to run condor_q -global to see the queues from every submitter, or
run condor_status -schedd to see the per-schedd counts.

Running condor_status -schedd -long | sort will show you the full list
of scheduler classad attributes. The trouble with determining whether
anything is still running is that there are several totals involved:
idle jobs, held jobs, local jobs, and flocked jobs, as well as running
jobs.
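As a rough sketch of such a check -- the attribute names below
(TotalRunningJobs, TotalIdleJobs, TotalHeldJobs) are ones I believe the
schedd advertises, and the sample output is made up so the parsing can
run standalone; on a live pool you'd pipe the real condor_status
command instead:

```shell
# Stand-in for: condor_status -schedd -af Name TotalRunningJobs \
#                   TotalIdleJobs TotalHeldJobs
status_sample='sub1.example.com 12 40 0
sub2.example.com 0 0 0'

# Print schedds with nothing running, idle, or held -- likely candidates
# for a safe shutdown (local and flocked totals would need checking too).
safe_to_drain=$(printf '%s\n' "$status_sample" |
    awk '$2 == 0 && $3 == 0 && $4 == 0 { print $1 }')
echo "$safe_to_drain"
```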

With ephemeral session nodes, you'll definitely want to switch to a
central scheduler.

I'd also be very interested to hear more, off-list, about your
approach to your session machines.

>  * If you want to reinstall the node but not loose the jobs, you have to
> maintain the condors job state somehow. is persisting /var/lib/condor/
> spool all you need to maintain this state, or are there other places on
> the file system that need to persist?

Yes, /var/lib/condor/spool is the location that counts. The queue is
represented by the job_queue.log file in that directory.
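As a sketch of carrying that state across a reinstall, using throwaway
directories standing in for the real paths (on a real node you'd stop
the schedd first, e.g. with condor_off -schedd, and use
/var/lib/condor/spool):

```shell
spool=$(mktemp -d)    # stands in for /var/lib/condor/spool
backup=$(mktemp -d)   # stands in for your persistent backup location

# Fake queue log so the copy can be demonstrated standalone.
echo '103 1.0 ClusterId 42' > "$spool/job_queue.log"

cp -a "$spool/." "$backup/"    # preserve the queue state
rm -f "$spool/job_queue.log"   # ...the reinstall wipes the node...
cp -a "$backup/." "$spool/"    # restore before starting HTCondor again
```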

> * For sites that want to scale the number of schedd's and the number of
> login nodes differently, is that possible? Is there a remote schedd mode?
> I'm sure things like the syscall shadowing wouldn't work in such a mode,
> but we haven't had a need for our site for that.

You set the SCHEDD_HOST configuration variable to the hostname of the
machine running your scheduler, and condor_q, condor_submit, and the
rest of the tools will then refer to that machine.

You can use config file conditionals or templating to set a different
host for different login machines, or you can use the $_CONDOR_SCHEDD_HOST
environment variable to set the proper scheduler on a user-by-user basis.
Remember, though, that a user who submitted jobs to one queue might
become alarmed if they log in to a different machine using a different
scheduler and find their jobs "missing" from the condor_q output.
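For instance, to point one user's shell at a different scheduler
(hostname hypothetical):

```shell
# Any _CONDOR_<KNOB> environment variable overrides the corresponding
# config-file setting for tools run from this shell.
export _CONDOR_SCHEDD_HOST=schedd2.example.com
# condor_q and condor_submit run from this shell now talk to schedd2
# rather than the pool-wide default scheduler.
```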

Our largest environment has three schedulers. One is hardcoded in the
configuration file applied to all users on all machines in the pool,
and is the target of everyone's condor_submit and condor_q runs.

The second is used for a DRMAA-linked Python application, because an
older combination of HTCondor and DRMAA didn't take remote schedulers
into account, so DRMAA could only delete jobs from the local
scheduler.

Finally, there's a third scheduler for a small team which occasionally
submits hundreds of thousands of small jobs at a time; in 8.0 and
early 8.2, running condor_q didn't even work with a queue that deep.
Even once the bugs and timeouts were fixed, it was still highly
disruptive to everyone else's ability to run a quick condor_q to check
their job status, so we quarantined the 100k+ job submissions to their
own little scheduler on a seven-year-old clunker machine by modifying
the job-submission script to use "condor_submit -name".
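The wrapper amounts to something like this (scheduler name
hypothetical; both condor_submit and condor_q accept -name to target a
specific schedd):

```
condor_submit -name bigqueue.example.com huge_run.sub
condor_q -name bigqueue.example.com
```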

With a remote schedd, the syscall shadowing is done on the machine
running the schedd, so as long as that machine has access to the
target filesystem the job refers to, it'll work. But we're not using
the standard universe either, and fairly few people are these days.

Here we just have the jobs pull input files over NFS from the exec
nodes in most cases, to take advantage of the Linux buffer cache with
a depth-first machine fill, and we've been migrating NFS-based output
delivery to HTCondor output transfers as time goes on.

There was a message from the CHTC team here a week or two ago on how
you can use HTCondor-C to handle file spooling to a remote scheduler;
if you're interested in that, check out the archives.

        -Michael Pelletier.