
[HTCondor-users] starting htcondor interactive jobs?



Hi,

I have set up a small HTCondor v8.4.7 system on CentOS 6.8.
At this time it's just two machines with one having COLLECTOR, MASTER, NEGOTIATOR and SCHEDD
and the other with MASTER and STARTD.

Vanilla batch jobs go through quickly, but I'm having some problems with interactive jobs.

Question 1) A "condor_submit -interactive" job starts OK, but the time it takes for the shell on the STARTD host to open varies from a few seconds up to 20 seconds.
No error messages are printed in the shell:

$ condor_submit -interactive
Submitting job(s).
1 job(s) submitted to cluster 45.
Waiting for job to start...
Welcome to server-a.example.com!
You will be logged out after 7200 seconds of inactivity.

While the interactive job is trying to start, I can see this in SchedLog:

==> SchedLog <==
07/29/16 15:57:53 (pid:54416) Number of Active Workers 0
07/29/16 15:57:53 (pid:48154) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:57:58 (pid:54416) Number of Active Workers 0
07/29/16 15:57:58 (pid:48173) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:03 (pid:54416) Number of Active Workers 0
07/29/16 15:58:03 (pid:48201) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:08 (pid:54416) Number of Active Workers 0
07/29/16 15:58:08 (pid:48225) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:13 (pid:54416) Number of Active Workers 0
07/29/16 15:58:13 (pid:48246) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:18 (pid:54416) Number of Active Workers 0
07/29/16 15:58:18 (pid:48275) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:23 (pid:54416) Number of Active Workers 0
07/29/16 15:58:23 (pid:48295) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:28 (pid:54416) Number of Active Workers 0
07/29/16 15:58:28 (pid:48315) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.

How can I debug this further to find out why an interactive job sometimes takes so long to open?
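To put numbers on the delay, a small script like this could count how many GET_JOB_CONNECT_INFO requests the schedd rejected for one job and over how many seconds. It is a minimal sketch (Python 3), assuming the SchedLog line format is exactly as in the excerpt above; the script name and its arguments are hypothetical. Since these messages say the job "is not running", the measured span roughly reflects how long the job sat waiting to start on the STARTD, not how long the shell itself took to open.

```python
import re
import sys
from datetime import datetime

# Assumed line format, copied from the SchedLog excerpt:
# "MM/DD/YY HH:MM:SS (pid:N) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running."
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:\d+\) "
    r"GET_JOB_CONNECT_INFO failed: Job (\S+) is not running\."
)


def connect_info_failures(lines, job_id):
    """Return the timestamps of GET_JOB_CONNECT_INFO failures for job_id."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == job_id:
            stamps.append(datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S"))
    return stamps


def summarize(stamps):
    """Return (number of failures, seconds between the first and the last one)."""
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()


def main(argv):
    # Hypothetical usage: python3 schedlog_wait.py /path/to/SchedLog 45.0
    path, job_id = argv[1], argv[2]
    with open(path) as f:
        count, span = summarize(connect_info_failures(f, job_id))
    print("job %s: %d failed polls over %.0f s" % (job_id, count, span))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Running it over SchedLog for a fast and a slow interactive job would show whether the variation comes from the time before the job starts running.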

Question 2) Is there a way to start an X11 application directly through a "condor_submit -interactive" command? This would be very useful in our use case, for example "condor_submit -interactive editor".

Best regards,

Aki