
Re: [HTCondor-users] HTCondor with kerberized home directories

There was a presentation on this at HTCondor Week this year. Apparently a number of
new features were added to condor_credd to enable this kind of thing. DESY was the test case.


Steve Timm

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
Sent: Tuesday, June 12, 2018 3:02:11 AM
To: HTCondor-Users Mail List; Andreas Hirczy
Subject: Re: [HTCondor-users] HTCondor with kerberized home directories
On 12.06.2018 at 09:46, Andreas Hirczy wrote:
> Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx> writes:
>> We have HTCondor installed on our desktop machines for submission, and
>> the jobs run on worker nodes in a private network.  The desktops are
>> naturally subject to security updates and may be rebooted about once
>> per week. The home directories are mounted via NFSv4 with Kerberos 5
>> authentication.
> We use a similar setup with Kerberos/OpenAFS for home directories.
>> How are others solving this?
> I remember there used to be solutions with forwarded and postdated
> tickets. See e.g.
> <https://lists.cs.wisc.edu/archive/htcondor-users/2007-October/msg00089.shtml>
> A wrapper script with "k5start" might also work.

Is this really the same thing?
As I read the linked post, it is about forwarding a ticket from the submission node (schedd) to the execute node (starter).
The issue I describe occurs when the submission node (schedd) is rebooted: the freshly booted machine no longer has a ticket for the user's home directory,
so the restarted schedd and shadow try to reconnect to the starter and then fail to access the log on the submission machine.

Or is the idea that forwarding would then work both ways (i.e. schedd => starter and, if the schedd is rebooted, back from starter => schedd)?

> I never tried those, since it somehow compromises the security gain from
> Kerberos authentication. Also the setup always appeared a bit hacky and
> not really robust.

I agree.

> <https://lists.cs.wisc.edu/archive/htcondor-users/2017-January/msg00051.shtml>
> indicates some new development.

This looks much more like what we are looking for ;-).
Still, this "only" mentions automatic renewal, which would be nice to have - but it does not really solve the issue if the schedd reboots
and the ticket vanishes, making the log file directory inaccessible, or does it?

>> Is the only way to have some kind of scratch space somewhere, with unix auth?
> We have quite a bit of scratch space; created by utilizing unused disc
> capacity from computing nodes with MooseFS <https://moosefs.com/>.

Looks interesting! A lot like CephFS which we are using right now for our high performance storage on dedicated servers,
but with an even stronger focus on working on commodity hardware, and of course with "FS only".

So you are somehow educating your users to use directories in MooseFS to submit jobs and keep job logs,
and this is accessible via unix auth, to prevent the ticket-loss-issue if a schedd is rebooted?

Cheers and many thanks for the reply,

> Best regards,
> Andreas