
Re: [HTCondor-users] [CondorLIGO] Getting job PID from condor



Hi Duncan,

Besides Peter's idea, you can ask the condor_startd to publish a "JobPid" attribute into the slot ad by adding the following to the config of your execute machine:

  STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) JobPid

and then, of course, run condor_reconfig.

Note that by default the startd only advertises when the state of the slot changes, or every UPDATE_INTERVAL seconds (which defaults to 5 minutes). So it may take 5 minutes before you see the JobPid attribute appear in your slot ad; if that matters you could do a condor_status -direct <slot-name> to see the pid immediately, and/or lower UPDATE_INTERVAL (although that will result in more traffic going to your collector... maybe not a big deal but depends on the size of your pool...).
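
If you want to script that lookup, here is a rough Python sketch, assuming JobPid is being published as described above and that condor_status is on your PATH. The function names are made up for illustration, and I am assuming an idle slot prints "undefined" for JobPid in autoformat output:

```python
import subprocess


def build_status_command(slot_name):
    """Return an argv list asking the startd directly for Name and JobPid."""
    # -direct is the option mentioned above; -af (autoformat) prints the
    # listed attributes, one ad per line, separated by spaces.
    return ["condor_status", "-direct", slot_name, "-af", "Name", "JobPid"]


def parse_job_pids(output):
    """Map slot name -> pid, skipping slots whose JobPid is not an integer."""
    pids = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumption: a slot without a running job shows "undefined" here.
        if len(fields) == 2 and fields[1].isdigit():
            pids[fields[0]] = int(fields[1])
    return pids


def query_job_pids(slot_name):
    """Run condor_status and parse its output (needs a working HTCondor pool)."""
    result = subprocess.run(build_status_command(slot_name),
                            capture_output=True, text=True, check=True)
    return parse_job_pids(result.stdout)
```

Calling query_job_pids with a slot name should then return a small dict of slot name to pid; the parsing is kept separate so it can be checked without a pool.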

Maybe we should add JobPid (and several other useful bits of info that the startd receives from the starter) to SYSTEM_STARTD_JOB_ATTRS by default? Besides the pid, the starter is monitoring things like resident set size of the job, number of cores being utilized...

p.s. general interest HTCondor questions may benefit from going to the htcondor-users email list... :)

regards,
Todd

On 2/17/2017 8:30 AM, Peter Francis Couvares wrote:
Duncan,

I don't believe Condor transmits the pid of your job back to the submit
machine by default (and note that there are often many pids associated
with a job; but I assume you mean the "parent" job pid initially forked
by the condor_starter on the execute machine).

However, one good way to dramatically streamline your current technique
might be to use condor_ssh_to_job and print the _CONDOR_JOB_PIDS
environment variable.  I.e.:

% condor_ssh_to_job <jobid> 'echo $_CONDOR_JOB_PIDS'
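
To collect those pids from a script, a minimal Python sketch might look like the following. The helper names are hypothetical, and I am assuming the variable holds a whitespace-separated list of pids, so adjust the parsing if your version formats it differently:

```python
import subprocess


def build_ssh_command(job_id):
    """Return an argv list that echoes _CONDOR_JOB_PIDS from inside the job."""
    # The command is passed as a single argument with no local shell involved,
    # so the variable is expanded on the execute machine, not the submit machine.
    return ["condor_ssh_to_job", job_id, "echo $_CONDOR_JOB_PIDS"]


def parse_pid_list(text):
    """Turn whitespace-separated pid text into a list of ints (format assumed)."""
    return [int(token) for token in text.split() if token.isdigit()]


def job_pids(job_id):
    """Run condor_ssh_to_job and parse what it prints (needs a running job)."""
    result = subprocess.run(build_ssh_command(job_id),
                            capture_output=True, text=True, check=True)
    return parse_pid_list(result.stdout)
```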

-Peter



On Jan 25, 2017, at 9:29 PM, Duncan Meacher <duncan.meacher@xxxxxxx> wrote:

Hi,

I was just wondering if there is any way of easily obtaining the PID
of a job running on a compute node. I've looked through the
documentation as well as the output of condor_q -long but haven't been
able to see anything. At the moment I'm having to do a rather
complicated process of getting the condor ID from condor_q, searching
for the IP address within the DAG's nodes.log file, then ssh'ing into
that node to obtain it.

Thanks, Duncan

--
==========================

Duncan Meacher, PhD
Postdoctoral Researcher
Institute for Gravitation and the Cosmos
Department of Physics
Pennsylvania State University
104 Davey Lab #040
University Park, PA 16802
Tel: +1 814 865 3243
==========================
_______________________________________________
Condorligo mailing list
Condorligo@xxxxxxxxxx
http://lists.aei.mpg.de/cgi-bin/mailman/listinfo/condorligo

--
Peter F. Couvares
LIGO Laboratory / Caltech
peter.couvares@xxxxxxxx






--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685