Re: [HTCondor-users] starting htcondor interactive jobs?

From: Aki Ketolainen <akik@xxxxxxxxxxx>
Date: 07/29/2016 12:13 PM

> Question 1) A "condor_submit -interactive" job is started ok but the
> time delay that it takes
> to open a shell from the STARTD host varies from a few seconds up to 20
> seconds.
> There are no error messages output in the shell:

The thing to remember here is the negotiator interval setting
(NEGOTIATOR_INTERVAL). Because matchmaking carries processing overhead,
the central manager only runs the matching process between pending jobs
and available resources periodically: cycles are at least 20 seconds
apart, and 60 seconds apart by default.
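
To see what interval a given pool is actually using, the setting can be
queried with condor_config_val. A minimal sketch, assuming the command
is run on a host that shares the central manager's configuration; the
function name is only illustrative:

```shell
# Print the negotiation cycle period configured for this pool.
# NEGOTIATOR_INTERVAL is the HTCondor configuration macro for that period;
# show_negotiator_interval is a hypothetical helper name.
show_negotiator_interval() {
    interval=$(condor_config_val NEGOTIATOR_INTERVAL) || return 1
    echo "Negotiation cycle runs every ${interval} seconds"
}
```

Calling show_negotiator_interval gives a rough upper bound on how long
an interactive job may sit idle before it is first considered for a
match.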

The short answer is that this is normal. You're not merely launching
a remote login session to a specific machine, which would be immediate.
You're asking the system to match your session's requirements to the
available resources and only then start the session, and that matching
is what takes the extra time.

The SchedLog entries are probably from the -auto-retry command line
option used for condor_ssh_to_job while it waits for the job to start.

> Question 2) Is there a way to start a X11 application directly through a
> "condor_submit -interactive" command?
> This would be very useful in our use case. For example "condor_submit
> -interactive editor"

Almost. I wrote up a little script that prompts the user for
the memory and CPU estimate for a MATLAB session, and then fires it up
on a suitable server. The trick was to replicate the functionality of
the "-interactive" option manually within the script:

1. Set up a dummy submit description that carries the requirements
   and sets +InteractiveJob = True.

   I think you could also use the stock INTERACTIVE_SUBMIT_FILE
   which you'd obtain via condor_config_val and then do a set of
   "-append" options for it with condor_submit; I didn't think of
   that possibility when I wrote it three years ago.

2. Submit it and grab the cluster ID.

3. Run condor_ssh_to_job against the cluster ID with the desired command:
        condor_ssh_to_job 12345.0 editor

This also avoids a problem with the TMOUT set by the interactive
dummy submit. If the user connects and then throws their X editor or
MATLAB session into the background, the shell racks up idle time
until it finally times out and takes the X app with it.
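
The three steps above can be sketched as a small shell script. This is
not the original script, only an illustration under stated assumptions:
the function names, the /bin/sleep placeholder with its 3600-second
duration, and the resource arguments are all made up for the example.
It relies on condor_submit printing its usual "N job(s) submitted to
cluster ID." line, and on the -auto-retry and -X options of
condor_ssh_to_job.

```shell
#!/bin/sh
# Sketch only: names and values below are illustrative assumptions.

# Step 1: write a placeholder submit description marked as interactive.
write_submit_file() {
    # $1 = output path, $2 = CPUs, $3 = memory in MB
    cat > "$1" <<EOF
executable = /bin/sleep
arguments = 3600
request_cpus = $2
request_memory = $3
+InteractiveJob = True
queue
EOF
}

# Step 2: extract the cluster ID from condor_submit's
# "N job(s) submitted to cluster ID." output line.
parse_cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Step 3: submit the placeholder, then run the command inside the job.
launch_in_job() {
    # $1 = CPUs, $2 = memory in MB, remaining arguments = command to run
    cpus=$1; mem=$2; shift 2
    subfile=$(mktemp) || return 1
    write_submit_file "$subfile" "$cpus" "$mem"
    cluster=$(condor_submit "$subfile" | parse_cluster_id)
    rm -f "$subfile"
    [ -n "$cluster" ] || { echo "submit failed" >&2; return 1; }
    # -auto-retry waits for the job to start; -X enables X11 forwarding.
    condor_ssh_to_job -auto-retry -X "$cluster.0" "$@"
}
```

For example, "launch_in_job 2 4096 editor" would request 2 CPUs and
4096 MB and then start the editor in the matched slot. If the
placeholder job outlives the session, condor_rm on the cluster ID
cleans it up.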

        -Michael Pelletier.