Subject: Re: [HTCondor-users] Centralized job handling by central admin
From: "Espelo, Dennis van" <D.v.Espelo@xxxxxxxx> Date: 07/05/2016 10:22 AM
> All our Windows clients have Condor 8.4.7 installed, including the
> ability to submit jobs. Is it possible to send your entire job from your
> local machine to the central admin (including required files) and let the
> central admin handle the job? Two reasons why I would like to achieve this.
>
> 1. If a large number of jobs is submitted from your local machine, it
> requires a lot of system performance (CPU and memory). Our central admin
> is a physical Linux server with sufficient CPU and memory to run a large
> number of jobs simultaneously.
>
> 2. Some of our submitters have a laptop. While your job is running, the
> laptop needs to be connected to the network, otherwise your results
> can't find their way back. Laptop users would appreciate the ability to
> disconnect their laptop and take it home while their jobs are still
> running.
>
> If this is possible, the results that are sent back to the central admin
> need to be stored on central storage. Is there a tutorial on how to
> change the configuration? At this moment I have no clue where to begin.
Hi Dennis,
Since our path into HTCondor here was through Grid
Engine, and everyone was accustomed to a single "pane," as it
were, for the queue via the "qstat" command, I was compelled three years
ago to solve this problem during the initial rollout.
The trick is setting the SCHEDD_HOST parameter in
your configuration. This causes the condor_submit and condor_q commands
to behave as if you had given them the "-name" option to specify
a target schedd to which the jobs will be submitted.
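For example, a minimal sketch (the host name "submit.example.com" is a
placeholder for your central admin machine) would be a single line in each
client's configuration, with the equivalent one-off command-line form shown
alongside:

```
# In condor_config (or a file under config.d) on each client machine;
# "submit.example.com" is a hypothetical name -- use your central schedd's host.
SCHEDD_HOST = submit.example.com

# Equivalent per-command behavior without changing the configuration:
#   condor_submit -name submit.example.com job.sub
#   condor_q -name submit.example.com
```

With that set, users run condor_submit and condor_q exactly as before, but
the jobs land in (and are queried from) the central schedd's queue.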
With the newer versions, having a single central schedd
is much more practical than it used to be, due to significant performance
and scalability improvements in late 8.2 and 8.4 releases.
The only point at which we opted to put up a second scheduler was
when one group was submitting 100,000 jobs at a time a couple of
years ago, since back then that would cause everyone's condor_q to
time out.
So that may do the trick in your case. The only tricky
item would be file delivery: the schedd host would need
filesystem access to the jobs' input files. There's no provision, as far as
I know, for transferring files from the machine running condor_submit
to the machine running the schedd.