
Re: [HTCondor-users] Enable Job Router always



Hi Asvija,

Could you help me understand what you're looking to accomplish at a high 
level? It seems like you're trying to /automatically/ send jobs to your 
Slurm cluster when it has available resources.

If you don't need to do it automatically, you could have users specify 
that they want their jobs to run on your Slurm cluster instead of 
relying on flocking, which will only work between HTCondor pools. The 
downside of this is that user jobs become "early binding", meaning that 
jobs become tied to a specific pool/cluster and exclude potentially 
available resources from other pools/clusters. To get "late binding", 
you'll need a fair amount of machinery like a CE and pilot job setup to 
overlay an HTCondor pool on top of your Slurm cluster or a side-by-side 
HTCondor/Slurm installation on your Slurm nodes.
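
As a rough sketch of the early-binding option, a user's submit file could 
target Slurm directly through the grid universe. This assumes the schedd 
that receives the job runs on a host with working Slurm client commands; 
the file names below are placeholders, not anything from your setup:

```
# Hypothetical submit description: hand this job straight to Slurm
# ("early binding") instead of matching it to an HTCondor slot.
# Assumption: the schedd's host can submit to Slurm locally.
universe      = grid
grid_resource = batch slurm

# Placeholder file names, for illustration only.
executable    = my_job.sh
output        = my_job.out
error         = my_job.err
log           = my_job.log

queue
```

In the setup you describe, only Machine B talks to Slurm, so a job like 
this would presumably have to be submitted to B's schedd (for example 
with condor_submit -remote, if your security configuration allows it) 
rather than flocked there.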

Thanks,
Brian

On 8/22/19 6:54 AM, Asvija B wrote:
> Hello everyone,
>
> I am testing the Job Router functionality to set up a test bed of 
> distributed infrastructure.
>
> Following is my setup:
>
> Machine A (in a Condor pool X) is to act as a submit node.
>
> Machine B (in a different condor pool Y) is to act as a cluster head 
> node. Condor is installed only on the head node. It gets integrated 
> with SLURM batch submission for executing jobs on other worker nodes.
>
> The job submitted from A reaches B by flocking. Once it reaches B, I 
> want the job to be routed to SLURM as a batch job. I have created the 
> corresponding Job Router on B and it is active.
>
> Now here's the problem: Since the job submitted at A has to flock to 
> B, machine B must report at least 1 CPU as available. 
> Otherwise the job cannot flock from A to B, as Condor does not 
> find a match. However, when I start machine B with 1 CPU 
> available, the flocked job from A executes directly on machine 
> B without going through the Job Router.
>
> How do I prevent this default behaviour and force the flocked jobs on 
> B to always go through the Job Router?
>
> Is there a better mechanism for accomplishing this kind of 
> setup? I know about the OSG middleware with the HTCondor-CE element. 
> However my constraint is that I have to use Kerberos authentication 
> and not rely on using GSI certificates.
>
>
> Thanks and regards,
>
> Asvija