
[HTCondor-users] Singularity and Charliecloud support for HTCondor?



Dear HTCondor experts,

HTCondor has initial support for Singularity, and we are actively using that. 

However, it has several basic flaws related to interactive jobs / condor_ssh_to_job:
- condor_ssh_to_job ends up in its own namespaces instead of those of the job
- additional bind mounts are needed to make interactive jobs work at all (see my previous mails to this list)
- running sshd in a user namespace does not work yet (sshd does not like tty devices owned by the overflow user ID)

Additionally, having compared the available options, it seems that Charliecloud offers clear security benefits when user namespaces can be used (which requires a sufficiently recent kernel),
and its runtime seems to be as full-featured as that of Singularity. It is also currently gaining the ability to run containers built with Singularity without modifications. 

User namespaces are desirable, since otherwise the container solution needs to be setuid root, which in the end has security implications very similar to 
those of a daemon running as root (cf. Docker, which however has seen considerably more security testing than the other new container projects). 

So from a cluster administrator's point of view, it would be great to resolve the existing issues with interactive jobs and to add support for Charliecloud,
which (for running jobs) can basically be used as a "drop-in replacement" for Singularity if the cluster OS supports user namespaces. 
Charliecloud can be used in the very same manner in which HTCondor uses Singularity right now, it only needs a different wrapper command, 
but offers very similar features (e.g. custom bind mounts) with only small syntax changes. 
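
To illustrate how small the difference in the wrapper command is, here is a minimal Python sketch that builds the command line for either runtime. The function name build_container_argv and the example paths are hypothetical and not taken from HTCondor. The sketch assumes the commonly documented invocations "singularity exec -B SRC:DST IMAGE CMD" and "ch-run -b SRC:DST IMAGEDIR -- CMD"; the exact bind syntax may differ between versions, so please check the man pages of the installed release.

```python
from typing import Dict, List


def build_container_argv(runtime: str, image: str,
                         binds: Dict[str, str],
                         job_argv: List[str]) -> List[str]:
    """Return the argv that wraps job_argv in the given container runtime.

    Hypothetical helper for illustration only; this is not HTCondor code.
    """
    bind_specs = [f"{src}:{dst}" for src, dst in binds.items()]

    if runtime == "singularity":
        # Assumed form: singularity exec -B SRC:DST ... IMAGE CMD ARGS
        argv = ["singularity", "exec"]
        for spec in bind_specs:
            argv += ["-B", spec]
        return argv + [image] + list(job_argv)

    if runtime == "charliecloud":
        # Assumed form: ch-run -b SRC:DST ... IMAGEDIR -- CMD ARGS
        # (Charliecloud runs from an unpacked image directory.)
        argv = ["ch-run"]
        for spec in bind_specs:
            argv += ["-b", spec]
        return argv + [image, "--"] + list(job_argv)

    raise ValueError(f"unsupported container runtime: {runtime!r}")
```

With this shape, switching runtimes only changes the binary name, the bind option and the "--" separator; the job command itself is passed through untouched.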

I am willing to invest some time into these goals. Since for Charliecloud only the binary name and options are slightly different, the implementation seems straightforward. 
To resolve the existing issues with interactive jobs, it might be better to rethink the general approach. For example, it might be much easier and cleaner to start sshd as the user on the bare-metal host,
and then "nsenter" into the container's namespaces (this should work for any container solution). 
This avoids all issues related to running sshd in user namespaces, which would likely need fancy changes to either OpenSSH (ignore tty owner and group) or the kernel (do special uid / gid mapping for those devices). 
However, I'm not familiar enough with the HTCondor code to make such changes quickly (yet). 
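
As a rough sketch of the nsenter idea (not of how HTCondor does or would implement it), the following Python helper builds the command that a session started by sshd on the host could exec to join the job's namespaces. The function name build_nsenter_argv and the way the job PID is obtained are assumptions; the options used (--target, --mount, --user, --preserve-credentials) are those of nsenter from util-linux.

```python
from typing import List, Sequence


def build_nsenter_argv(job_pid: int,
                       shell_argv: Sequence[str],
                       user_ns: bool = True) -> List[str]:
    """Return an nsenter command that runs shell_argv inside the
    namespaces of the process with PID job_pid.

    Hypothetical helper for illustration only; this is not HTCondor code.
    """
    if job_pid <= 0:
        raise ValueError("job_pid must be a positive PID")

    # Join the mount namespace so the session sees the container's files.
    argv = ["nsenter", "--target", str(job_pid), "--mount"]

    if user_ns:
        # Join the user namespace too, and keep the caller's UID/GID
        # instead of letting nsenter switch to UID/GID 0 inside it.
        argv += ["--user", "--preserve-credentials"]

    return argv + list(shell_argv)
```

The resulting list could be handed to os.execvp() by the process that sshd starts for the user. Since sshd and its tty stay outside the user namespace in this scheme, the tty ownership problem described above should not arise.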

Is there somebody actively working on these things, for example, completing the basic Singularity support to make it production-ready? 

Are the general ideas and planned direction acceptable? 
Alternatively, one could also think about generalizing the Singularity-"plugin" to support different container solutions. 
Would it be better to ask questions like these on the developer mailing list? 

Cheers, 
	Oliver
