[HTCondor-users] starting htcondor interactive jobs?
- Date: Fri, 29 Jul 2016 19:12:24 +0300
- From: Aki Ketolainen <akik@xxxxxxxxxxx>
- Subject: [HTCondor-users] starting htcondor interactive jobs?
Hi,
I have set up a small HTCondor v8.4.7 system on CentOS 6.8.
At this time it's just two machines: one running COLLECTOR, MASTER,
NEGOTIATOR and SCHEDD, and the other running MASTER and STARTD.
A vanilla batch job goes through quickly, but I'm having some problems
with interactive jobs.
Question 1) A "condor_submit -interactive" job starts OK, but the time
it takes to open a shell on the STARTD host varies from a few seconds
up to 20 seconds.
No error messages are printed in the shell:
$ condor_submit -interactive
Submitting job(s).
1 job(s) submitted to cluster 45.
Waiting for job to start...
Welcome to server-a.example.com!
You will be logged out after 7200 seconds of inactivity.
While the interactive job is trying to start, I can see this in SchedLog:
==> SchedLog <==
07/29/16 15:57:53 (pid:54416) Number of Active Workers 0
07/29/16 15:57:53 (pid:48154) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:57:58 (pid:54416) Number of Active Workers 0
07/29/16 15:57:58 (pid:48173) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:03 (pid:54416) Number of Active Workers 0
07/29/16 15:58:03 (pid:48201) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:08 (pid:54416) Number of Active Workers 0
07/29/16 15:58:08 (pid:48225) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:13 (pid:54416) Number of Active Workers 0
07/29/16 15:58:13 (pid:48246) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:18 (pid:54416) Number of Active Workers 0
07/29/16 15:58:18 (pid:48275) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:23 (pid:54416) Number of Active Workers 0
07/29/16 15:58:23 (pid:48295) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:28 (pid:54416) Number of Active Workers 0
07/29/16 15:58:28 (pid:48315) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
How could I debug this further to find out why it randomly takes a long
time to open an interactive job?
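One way to put numbers on the delay would be to measure, per job, how
long the GET_JOB_CONNECT_INFO retries last in SchedLog. The following is
only a minimal Python sketch, assuming the log lines have exactly the
format of the excerpt above (MM/DD/YY timestamps, a "(pid:N)" field). The
function name connect_wait_times and the SAMPLE text are made up for
illustration and are not part of HTCondor.

```python
import re
from datetime import datetime

# Assumed line format, taken from the SchedLog excerpt in this message:
# 07/29/16 15:57:53 (pid:48154) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
FAILED_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:\d+\) "
    r"GET_JOB_CONNECT_INFO failed: Job (\d+\.\d+)"
)


def connect_wait_times(log_text):
    """Map each job id to (failed attempts, seconds from first to last failure)."""
    stamps = {}
    for line in log_text.splitlines():
        match = FAILED_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            stamps.setdefault(match.group(2), []).append(when)
    return {
        job: (len(times), (max(times) - min(times)).total_seconds())
        for job, times in stamps.items()
    }


# Hypothetical sample: three of the failed attempts quoted above.
SAMPLE = """\
07/29/16 15:57:53 (pid:48154) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:57:58 (pid:48173) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
07/29/16 15:58:03 (pid:48201) GET_JOB_CONNECT_INFO failed: Job 45.0 is not running.
"""

for job, (attempts, seconds) in connect_wait_times(SAMPLE).items():
    print(f"job {job}: {attempts} failed attempts over {seconds:.0f} s")
```

The result is only a lower bound on the wait, since it covers the span
between the first and last failed attempt and not the time until the
shell actually opens.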
Question 2) Is there a way to start an X11 application directly through
a "condor_submit -interactive" command?
This would be very useful in our use case. For example: "condor_submit
-interactive editor"
Best regards,
Aki