Re: [HTCondor-users] job submission on geographically distributed system



I would like to implement hpc job to be submitted on any of the available hpc clusters (geographically distributed) without grid implementation.

I don't know what this restriction is supposed to mean, so I'll ignore it for the rest of this reply; sorry.

Would HT condor help in this context? Most of the hpc clusters have already slurm running and scheduling jobs locally of that particular cluster.

HTCondor can certainly submit jobs to multiple different Slurm clusters.

Can a user submit the job on one of the hpc systems (best available in terms of resources) from central location?

	That I don't actually know.

How it can be done? How the user identity, data, application and environment would be taken care of?

When you write an HTCondor job, you specify the application, data, and environment that constitute the job, and HTCondor takes care of moving the files around and constructing the environment when the job runs. (Or, in this case, when it lands at a Slurm scheduler.)
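
As a rough illustration of what specifying the application, data, and environment can look like, here is a minimal sketch in Python that only builds the text of a submit description; it does not talk to HTCondor. It assumes the job is routed to Slurm through the grid universe's "batch" grid type, and that the remote form of grid_resource (a user@host after "batch slurm") has been set up for the cluster. The helper name, the login host, the user name, and all file names are hypothetical placeholders.

```python
# Sketch only: generate submit-description text for a job aimed at one
# Slurm cluster. Nothing here contacts HTCondor or Slurm.

def slurm_submit_description(remote_login, executable, input_files, env):
    """Return submit-file text for one job sent to one Slurm cluster.

    remote_login is a hypothetical 'user@host' for the cluster's login node.
    """
    # HTCondor's quoted environment syntax: space-separated NAME=value pairs.
    env_string = " ".join(f"{name}={value}" for name, value in sorted(env.items()))
    lines = [
        "universe = grid",
        f"grid_resource = batch slurm {remote_login}",
        f"executable = {executable}",                             # the application
        f"transfer_input_files = {', '.join(input_files)}",       # the data
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f'environment = "{env_string}"',                          # the environment
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


description = slurm_submit_description(
    remote_login="alice@login.cluster-a.example.org",  # placeholder
    executable="run_simulation.sh",                    # placeholder
    input_files=["mesh.dat", "params.json"],           # placeholders
    env={"OMP_NUM_THREADS": "8"},
)
print(description)
```

The printed text could be saved to a file and handed to condor_submit; targeting a different cluster would mean changing only the grid_resource line.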

User identity is a much harder problem, but HTCondor supports (through various mechanisms) mapping between the user who submitted the job and the identity used to run the job at each site. Ideally, you would be able to specify the job in such a way that the user identity at the time the job was running didn't matter. This doesn't always happen, and sometimes can't, but the user identity problem is too complex to discuss in any detail here. Luckily, there is already a solution you could use or learn from: the "CE".

	https://htcondor-ce.readthedocs.io/en/latest/overview/

In your case, you probably wouldn't be using pilots.

- ToddM