
Re: [HTCondor-users] Centralized job handling by central admin



From: "Espelo, Dennis van" <D.v.Espelo@xxxxxxxx>
Date: 07/05/2016 10:22 AM

> All our Windows clients have Condor 8.4.7 installed, including the ability
> to submit jobs. Is it possible to send your entire job from your local
> machine to the central admin (including required files) and let the
> central admin handle the job? There are two reasons why I would like to
> achieve this.

>
> 1.       If a large number of jobs is submitted from your local machine,
> it requires a lot of system resources (CPU and memory). Our central
> admin is a physical Linux server with sufficient CPU and memory to run
> a large number of jobs simultaneously.

> 2.       Some of our submitters have a laptop. While your job is running,
> the laptop needs to be connected to the network, otherwise your results
> can't find their way back. Laptop users will appreciate the ability to
> disconnect their laptop and take it home while their jobs are still running.

>
> If this is possible, the results that are sent back to the central admin
> need to be stored on central storage. Is there any tutorial on how to
> change the configuration? At this moment I have no clue where to begin.


Hi Dennis,

Since our path into HTCondor here was through Grid Engine, and everyone
was accustomed to a single "pane," as it were, for the queue via the
"qstat" command, I was compelled three years ago to solve this problem
during the initial rollout.

The trick is setting the SCHEDD_HOST parameter in your configuration.
This causes the condor_submit and condor_q commands to behave as if
you'd given them the "-name" option to specify a target schedd to
which the jobs will be submitted.
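For example, a minimal client-side sketch - the hostname
"central.example.com" below is just a placeholder for your central
server's name:

        # In each client's local configuration (condor_config.local or
        # a file under config.d), point the tools at the central schedd.
        SCHEDD_HOST = central.example.com

Once that's in place, condor_q and condor_submit on the clients act on
the central queue, and you can check what value the tools actually see
with "condor_config_val SCHEDD_HOST".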

With the newer versions, having a single central schedd is much more
practical than it used to be, thanks to significant performance and
scalability improvements in the later 8.2 and 8.4 releases. The only
point at which we opted to stand up a second scheduler was when one
group was submitting 100,000 jobs at a time a couple of years ago,
since back then that would cause everyone's condor_q to time out.

So that may do the trick in your case. The only tricky item would be
file delivery - the schedd on the central host would need filesystem
access to the jobs' input and output files. There's no provision, as
far as I know, for transferring files from the machine running
condor_submit to the machine running the schedd.
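To make that concrete, here's a minimal submit description sketch,
assuming the job's files live on shared storage that is mounted on the
schedd host (every /shared path below is hypothetical):

        # Submit description sketch - all paths are placeholders, and
        # they must be visible to the schedd host, not just the client.
        executable              = /shared/demo/myjob.exe
        transfer_input_files    = /shared/demo/input.dat
        should_transfer_files   = YES
        when_to_transfer_output = ON_EXIT
        output                  = /shared/demo/out.$(Process)
        error                   = /shared/demo/err.$(Process)
        log                     = /shared/demo/job.log
        queue 1

Since the schedd host is the submit machine in this setup, the output
and error files land on the shared storage as well, which would cover
the central-storage requirement Dennis mentioned.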

Perhaps someone can comment on the file transfer point.

        -Michael Pelletier.