
Re: [HTCondor-users] - Docker universe jobs issues with dedicated slot users account



Sorry, I forgot to mention: unsetting DEDICATED_EXECUTE_ACCOUNT_REGEXP removes the issue, and all jobs finish as expected.

On Thu, Sep 17, 2020 at 9:51 AM Ido Shamay <idoshamay@xxxxxxxxx> wrote:
Hi all,

I'm trying to set up and test a condor pool with the docker universe (version 8.8.9).
It uses dedicated accounts for slot users (SLOT<N>_USER) and, as advised in the user manual, also sets DEDICATED_EXECUTE_ACCOUNT_REGEXP to match these accounts.
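
For illustration, a setup of this kind would look roughly like the fragment below. This is only a sketch: the account names (cndrusr1, cndrusr2) and the number of slots are placeholders, not the values from our pool.

    # Placeholder account names, one dedicated account per slot
    SLOT1_USER = cndrusr1
    SLOT2_USER = cndrusr2
    # Do not run jobs as the submitting owner
    STARTER_ALLOW_RUNAS_OWNER = False
    # Tell HTCondor that these accounts are dedicated to job execution
    DEDICATED_EXECUTE_ACCOUNT_REGEXP = cndrusr[0-9]+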

On some docker jobs, after running for some time, the starter process receives a SIGQUIT (probably from procd). It then kills the docker process, and the job returns to the queue to wait to be scheduled again.

I was wondering if anyone has experienced similar issues with this kind of configuration (or with the docker universe in general)?

P.S. We are using dedicated slot user accounts because our regular nobody-user mode (STARTER_ALLOW_RUNAS_OWNER=false) doesn't work with the docker universe: the starter tries to run the docker process with --user=-1:-1, which is not valid in docker.
Does this sound OK?

Thanks,
Ido