
[HTCondor-users] Some Docker universe questions



Hi,

I have a couple of questions regarding the Docker universe. I am using HTCondor 8.5.5 and Docker 1.11.2 on SL7.

Firstly, when running "condor_who" on a machine running Docker universe jobs, the "PROGRAM" column shows the job id (the same as that shown in the "JOB" column), rather than the command being executed. I assume this is not meant to happen? For example:

[root@lcg1323 ~]# condor_who

OWNER              CLIENT      SLOT JOB         RUNTIME    PID       PROGRAM
patls053@domain    host.domain 1_2  6790916.0   0+00:03:44           6790916.0
 pcms024@domain    host.domain 1_1  6740918.0   0+00:18:12           6740918.0
patls053@domain    host.domain 1_3  6790816.0   0+00:19:15           6790816.0
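
To check for this symptom on more than one machine, a small script along the following lines could be used. It is only a sketch: it assumes the whitespace-separated column layout shown above (OWNER, CLIENT, SLOT, JOB, RUNTIME, an optional PID, then PROGRAM), and the function name is made up for illustration.

```python
def rows_with_job_as_program(condor_who_text):
    """Return the JOB ids of rows whose PROGRAM column just repeats the JOB id."""
    suspicious = []
    for line in condor_who_text.splitlines():
        fields = line.split()
        # Skip blank lines, the shell prompt line and the header row.
        if len(fields) < 6 or fields[0] == "OWNER":
            continue
        # Assumed layout: OWNER CLIENT SLOT JOB RUNTIME [PID] PROGRAM
        job, program = fields[3], fields[-1]
        if job == program:
            suspicious.append(job)
    return suspicious


# The output quoted above, used as sample input.
SAMPLE = """\
OWNER              CLIENT      SLOT JOB         RUNTIME    PID       PROGRAM
patls053@domain    host.domain 1_2  6790916.0   0+00:03:44           6790916.0
 pcms024@domain    host.domain 1_1  6740918.0   0+00:18:12           6740918.0
patls053@domain    host.domain 1_3  6790816.0   0+00:19:15           6790816.0
"""

print(rows_with_job_as_program(SAMPLE))
```

With the sample above, all three rows are reported, since each PROGRAM value is identical to its JOB value.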

Secondly, whenever a Docker universe job starts, this error appears in the StarterLog:

06/18/16 07:31:13 (pid:3079975) lock_file returning ERROR, errno=9 (Bad file descriptor)
06/18/16 07:31:13 (pid:3079975) FileLock::obtain(1) failed - errno 9 (Bad file descriptor)
06/18/16 07:31:13 (pid:3079975) Found 1 entries in docker image cache.
06/18/16 07:31:13 (pid:3079975) lock_file returning ERROR, errno=9 (Bad file descriptor)
06/18/16 07:31:13 (pid:3079975) FileLock::obtain(2) failed - errno 9 (Bad file descriptor)

Does anyone else see this, or know what is causing it?
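
For reference, errno 9 is EBADF, which a lock call returns when it is handed a file descriptor that is not valid for the operation. The sketch below reproduces the same errno in Python by locking a descriptor that has already been closed. It only illustrates what the error code means; whether the starter is doing anything similar is an assumption, and the helper name is invented.

```python
import errno
import fcntl
import os
import tempfile


def lock_errno(fd):
    """Try to take an exclusive, non-blocking lock on fd; return 0 or the errno."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        return exc.errno
    return 0


with tempfile.NamedTemporaryFile() as tmp:
    fd = os.open(tmp.name, os.O_RDWR)
    open_result = lock_errno(fd)    # valid descriptor: the lock is granted
    os.close(fd)
    closed_result = lock_errno(fd)  # descriptor already closed: EBADF

print(open_result, closed_result, os.strerror(closed_result))
```

If the same pattern applies here, the log lines would point at the lock file descriptor being closed, or never successfully opened, before the lock is requested.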

Finally, according to the version history:

http://research.cs.wisc.edu/htcondor/manual/v8.5.2/10_2Development_Release.html

since 8.5.2 job ClassAds are meant to contain network stats for Docker universe jobs (NetworkInput and NetworkOutput). However, I always get 0.0 for these, even though "docker stats" is reporting sensible numbers.
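
The values can be read from the long-form job ClassAd, for example the output of "condor_q -long" for a running job. A minimal sketch of pulling out the two attributes follows. The sample text is made up to match what I see (both values 0.0), and the function name is hypothetical.

```python
def network_stats(classad_text):
    """Extract NetworkInput and NetworkOutput from 'Attr = value' ClassAd lines."""
    stats = {}
    for line in classad_text.splitlines():
        # Split on the first "=" only; the value may itself contain "=".
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name in ("NetworkInput", "NetworkOutput"):
            stats[name] = float(value.strip())
    return stats


# Illustrative excerpt of a long-form job ad (not real output).
SAMPLE_AD = """\
ClusterId = 6790916
ProcId = 0
NetworkInput = 0.0
NetworkOutput = 0.0
"""

print(network_stats(SAMPLE_AD))
```

These numbers can then be compared by hand with what "docker stats" shows for the corresponding container.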

Thanks,
Andrew.