Re: [HTCondor-users] Enable Job Router always



Hi Asvija,

Users should be able to submit Condor-C jobs [1] to the schedd on your 
Slurm head node [2]. For example, from a submit host in your HTCondor 
pool, you can submit a job with the following contents (in addition to 
your executable, arguments, etc.) to the schedd on the Slurm submit node:

    universe = grid
    grid_resource = condor <USERNAME>@<SLURM SCHEDD FQDN> <SLURM SCHEDD CENTRAL MANAGER FQDN>

    remote_universe = grid
    remote_grid_resource = "batch slurm"
    ...

Replace <USERNAME>, <SLURM SCHEDD FQDN>, and <SLURM SCHEDD CENTRAL 
MANAGER FQDN> with the values appropriate for your setup. Give that a 
spin and let me know how it goes.
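
For illustration, a complete submit file along those lines might look 
like the sketch below. The user name, host names, and file names are 
hypothetical placeholders, not values from this thread, and the 
file-transfer settings are only an assumption about what a simple job 
would need:

    # Condor-C: hand the job to the schedd on the Slurm head node.
    # "asvija", "slurm-head.example.org", and "slurm-cm.example.org"
    # are made-up values; substitute your own.
    universe = grid
    grid_resource = condor asvija@slurm-head.example.org slurm-cm.example.org

    # On the remote schedd, run the job as a batch (Slurm) grid job.
    remote_universe = grid
    remote_grid_resource = "batch slurm"

    # Hypothetical job payload and output files.
    executable = job.sh
    arguments = 60
    output = job.$(Cluster).out
    error = job.$(Cluster).err
    log = job.$(Cluster).log

    # Assumed: let HTCondor move the files to and from the remote side.
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT

    queue

Saved under a name such as slurm-job.sub (again hypothetical), it would 
be submitted from the submit host with "condor_submit slurm-job.sub".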

- Brian

[1] https://htcondor.readthedocs.io/en/latest/grid-computing/grid-universe.html#htcondor-c-the-condor-grid-type
[2] https://htcondor.readthedocs.io/en/latest/grid-computing/grid-universe.html#the-batch-grid-type-for-pbs-lsf-sge-and-slurm

On 8/26/19 12:20 AM, Asvija B wrote:
> Dear Brian,
>
> Thanks for the response. Please see my inline comments.
>
>> Hi Asvija,
>>
>> Could you help me understand what you're looking to accomplish at a high
>> level? It seems like you're trying to /automatically/ send jobs to your
>> Slurm cluster when it has available resources.
>
> The idea is to enable job submission to Slurm clusters from an HTCondor 
> submit node. The submit node has to reside in a different HTCondor pool, 
> and each Slurm cluster is in a different pool. The head node in each 
> of these Slurm clusters can run HTCondor.
>
>>
>> If you don't need to do it automatically, you could have users specify
>> that they want their jobs to run on your Slurm cluster instead of
>> relying on flocking, which will only work between HTCondor pools.
>
> How do I specify this? Do you mean by adding extra ClassAd 
> attributes?
>
>> The
>> downside of this is that user jobs become "early binding", meaning that
>> jobs become tied to a specific pool/cluster and exclude potentially
>> available resources from other pools/clusters. To get "late binding",
>> you'll need a fair amount of machinery like a CE and pilot job setup to
>> overlay an HTCondor pool on top of your Slurm cluster or a side-by-side
>> HTCondor/Slurm installation on your Slurm nodes.
>>
>> Thanks,
>> Brian
>
> OK. As I explained earlier, the idea is to enable parallel job 
> submission from an HTCondor submit node to multiple Slurm clusters. Can 
> you please suggest the best architecture for accomplishing this? 
> The constraint is that I cannot use GSI for security and must 
> rely on Kerberos instead.
>
>
> Thanks and regards,
>
> Asvija
>
>
>
>>
>> On 8/22/19 6:54 AM, Asvija B wrote:
>>> Hello everyone,
>>>
>>> I am testing the Job Router functionality to set up a test bed for a
>>> distributed infrastructure.
>>>
>>> Following is my setup:
>>>
>>> Machine A (in a Condor pool X) is to act as a submit node.
>>>
>>> Machine B (in a different Condor pool Y) is to act as a cluster head
>>> node. Condor is installed only on the head node. It is integrated
>>> with Slurm batch submission for executing jobs on the worker nodes.
>>>
>>> The job submitted from A reaches B by flocking. Once it reaches B, I
>>> want the job to be routed as a batch job and submitted to Slurm. I
>>> have created the corresponding Job Router on B, and it is active.
>>>
>>> Now here's the problem: since the job submitted at A has to flock to
>>> B, machine B must report at least 1 CPU as available.
>>> Otherwise the job cannot flock from A to B, because Condor does not
>>> find a match. However, when I start machine B with 1 CPU
>>> available, the flocked job from A executes directly on machine
>>> B without going through the Job Router.
>>>
>>> How do I prevent this default behaviour and force the flocked jobs on
>>> B to always go through the Job Router?
>>>
>>> Is there a better mechanism for accomplishing this kind of
>>> setup? I know about the OSG middleware with the HTCondor-CE element.
>>> However, my constraint is that I have to use Kerberos authentication
>>> and cannot rely on GSI certificates.
>>>
>>>
>>> Thanks and regards,
>>>
>>> Asvija
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------ 
>>>
>>>
>>> [ C-DAC is on Social-Media too. Kindly follow us at:
>>> Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
>>>
>>> This e-mail is for the sole use of the intended recipient(s) and may
>>> contain confidential and privileged information. If you are not the
>>> intended recipient, please contact the sender by reply e-mail and 
>>> destroy
>>> all copies and the original message. Any unauthorized review, use,
>>> disclosure, dissemination, forwarding, printing or copying of this 
>>> email
>>> is strictly prohibited and appropriate legal action will be taken.
>>> ------------------------------------------------------------------------------------------------------------ 
>>>
>>>
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>> with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>>
>