
Re: [HTCondor-users] Enable Job Router always



Dear Brian,

Thanks for the response. Please see my inline comments.

Hi Asjiva,

Could you help me understand what you're looking to accomplish at a high
level? It seems like you're trying to /automatically/ send jobs to your
Slurm cluster when it has available resources.

The idea is to enable job submission to Slurm clusters from an HTCondor submit node. The submit node has to reside in a different HTCondor pool, and each Slurm cluster is in a different pool. The head node in each of these Slurm clusters can run HTCondor.


If you don't need to do it automatically, you could have users specify
that they want their jobs to run on your Slurm cluster instead of
relying on flocking, which will only work between HTCondor pools.

How do I specify this? Do you mean by adding extra ClassAd attributes?

The
downside of this is that user jobs become "early binding", meaning that
jobs become tied to a specific pool/cluster and exclude potentially
available resources from other pools/clusters. To get "late binding",
you'll need a fair amount of machinery like a CE and pilot job setup to
overlay an HTCondor pool on top of your Slurm cluster or a side-by-side
HTCondor/Slurm installation on your Slurm nodes.
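
For illustration, the explicit "early binding" approach could look roughly like the submit description file below. This is only a sketch: it assumes the job is submitted to a schedd on the Slurm head node, where the Slurm client tools are installed, and the file names are hypothetical.

```
# Hypothetical submit file; executable and output names are placeholders
universe      = grid
# Hand the job to the local Slurm batch system
grid_resource = batch slurm

executable    = my_job.sh
output        = my_job.out
error         = my_job.err
log           = my_job.log

queue
```

A job submitted this way is tied to that one Slurm cluster, which is the early-binding trade-off described above.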

Thanks,
Brian

OK. As I explained earlier, the idea is to enable parallel job submission from an HTCondor submit node to multiple Slurm clusters. Could you please suggest the best architecture for accomplishing this? The constraint is that I cannot use GSI for security and must rely on Kerberos instead.


Thanks and regards,

Asvija




On 8/22/19 6:54 AM, Asvija B wrote:
Hello everyone,

I am testing the Job Router functionality to set up a test bed for a
distributed infrastructure.

Following is my setup:

Machine A (in Condor pool X) is to act as a submit node.

Machine B (in a different Condor pool Y) is to act as a cluster head
node. Condor is installed only on the head node. It is integrated
with SLURM batch submission for executing jobs on the other worker nodes.

The job submitted from A reaches B by flocking. Once it reaches B, I
want the job to be routed to SLURM as a batch job. I have created the
corresponding Job Router on B, and it is active.
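
A route of this kind could look roughly like the configuration below. This is a sketch only, using the older ClassAd-style JOB_ROUTER_ENTRIES syntax from the 8.x series; the route name and the MaxIdleJobs value are hypothetical, and it is not the actual configuration on machine B.

```
# Run the Job Router daemon alongside the other daemons on machine B
DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER

# One route that converts matching jobs into grid-universe jobs
# (universe 9) handed to the local Slurm batch system.
# "Slurm_Route" and the MaxIdleJobs value are placeholders.
JOB_ROUTER_ENTRIES = \
   [ \
     name = "Slurm_Route"; \
     GridResource = "batch slurm"; \
     TargetUniverse = 9; \
     MaxIdleJobs = 10; \
   ]
```

As far as I understand, the Job Router only acts on jobs sitting in the queue of the schedd it is attached to, whereas a flocked job stays in the queue of the schedd on A.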

Now here's the problem: since the job submitted at A has to flock to
B, machine B must report at least 1 CPU as available. Otherwise the
job cannot flock from A to B, because Condor does not find a match.
However, when I start machine B with 1 CPU available, the flocked job
from A executes directly on machine B without going through the Job
Router.

How do I prevent this default behaviour and force the flocked jobs on
B to always go through the Job Router?

Is there a better mechanism for accomplishing this kind of setup? I
know about the OSG middleware with the HTCondor-CE element. However,
my constraint is that I have to use Kerberos authentication and
cannot rely on GSI certificates.
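
For reference, restricting HTCondor to Kerberos authentication would look roughly like the fragment below. This is a minimal sketch and assumes both machines share a working Kerberos setup; site-specific settings such as keytabs, service principals and realm mapping are deliberately left out.

```
# Require authentication and allow only Kerberos (no GSI)
SEC_DEFAULT_AUTHENTICATION         = REQUIRED
SEC_DEFAULT_AUTHENTICATION_METHODS = KERBEROS
```

Each host that needs to authenticate to the other would need a matching setting.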


Thanks and regards,

Asvija





------------------------------------------------------------------------------------------------------------

[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
------------------------------------------------------------------------------------------------------------


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


