
Re: [HTCondor-users] Remote condor_history with multiple Schedd hosts



The problem is that a remote history query contacts a schedd, which turns around and runs a local copy of condor_history, passing it the socket and other arguments, but NOT the path to the history files or the local name of the schedd.

So that copy of condor_history looks in its own configuration and finds the file pointed to by the HISTORY config knob, rather than the one pointed to by the configuration variable TROLLS20.HISTORY.

 

So a remote query to a schedd that has a localname override will read from the wrong history file.

 

You can work around this by writing a wrapper script for the condor_history that the secondary schedd runs. The program the schedd runs is controlled by the configuration variable HISTORY_HELPER, which is normally set like this:

 

HISTORY_HELPER=$(BIN)/condor_history

 

If you add this knob

 

TROLLS20.HISTORY_HELPER = $(BIN)/history_of_trolls

 

And write a small script called history_of_trolls that does something like this

 

_condor_HISTORY=/opt/condor/history/trolls2-0.history condor_history "$@"

 

Then when condor_history looks in its own copy of the configuration for the history file, it will find the correct file.
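
A slightly fuller sketch of such a wrapper, untested against a real pool. The commands below create history_of_trolls in the current directory purely for illustration; in practice it would live wherever TROLLS20.HISTORY_HELPER points. The shebang line, the export/exec form, and the use of "$@" (so that arguments containing spaces, such as a constraint expression, are passed through intact) are assumptions rather than part of the suggestion above. The history path is the TROLLS20.HISTORY value from the original question.

```shell
# Write the hypothetical wrapper script. The quoted 'EOF' keeps the shell
# from expanding "$@" while the file is being created.
cat > ./history_of_trolls <<'EOF'
#!/bin/sh
# Override the HISTORY knob through the environment so that the
# condor_history started by the TROLLS20 schedd reads that schedd's file.
_condor_HISTORY=/opt/condor/history/trolls2-0.history
export _condor_HISTORY

# Replace this shell with condor_history (found via PATH), forwarding
# every argument the schedd supplied, unchanged.
exec condor_history "$@"
EOF

# The helper has to be executable for the schedd to run it.
chmod +x ./history_of_trolls
```

After installing the script and setting TROLLS20.HISTORY_HELPER, the schedd will most likely need a condor_reconfig (or a restart) before it picks up the new helper.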

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Collin Mehring
Sent: Tuesday, December 17, 2019 2:51 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Remote condor_history with multiple Schedd hosts

 

Hello Experts,

 

I'm having an issue with condor_history (and the Python binding equivalent) returning blank results for some Schedds. (Version 8.8.5)

 

We have several Schedds in our pool split across two physical hosts. We consider one of these hosts the "primary" as it contains the default Schedd (i.e. DAEMON_LIST contains SCHEDD). We have specified the name and history file path for this Schedd:

 

SCHEDD_NAME = gld-default@

HISTORY = /opt/condor/history/gld-default.history

 

The additional schedds on both hosts follow a similar pattern, for example:

 

SCHEDD_TROLLS20 = $(SCHEDD)
SCHEDD_TROLLS20_ARGS = -f -local-name TROLLS20 -p 8510
TROLLS20.SCHEDD_NAME = trolls2-0@
TROLLS20.HISTORY = /opt/condor/history/trolls2-0.history

<...>

DAEMON_LIST = $(DAEMON_LIST), SCHEDD_TROLLS20

 

All config settings, other than the different schedds, are the same on both hosts.

 

Running 'condor_history -n gld-default@' from a remote host in the pool will return that Schedd's history correctly. Similarly, using -name for any Schedd on the primary host will work as expected. However, specifying the name of a Schedd on the secondary host will return just the header with no results. (e.g. condor_history -n trolls2-0@)

 

The command is reaching the correct Schedd, because it logs the following in response:

 

12/17/19 12:18:15 (pid:275016) invoking /usr/bin/condor_history condor_history -inherit -stream-results -match -1 -scanlimit 10000 -constraint true -attributes Args,Arguments,ClusterId,Cmd,CompletionDate,JobStatus,Owner,ProcId,QDate,RemoteUserCpu,RemoteWallClockTime
12/17/19 12:18:15 (pid:275016) Create_Process: using fast clone() to create child process.

 

The Schedd is also writing the history file correctly, because using condor_history -file instead works. (e.g. condor_history -file /opt/condor/history/trolls2-0.history)

 

Any help is appreciated.

 

Thanks,

Collin

--

Collin Mehring | PE-JoSE - Software Engineer
