
Re: [Condor-users] Get remote queue id



Hi Steve,

Steven Timm wrote:
On Wed, 17 Oct 2007, Michael Thomas wrote:

I have a condor-g host that is used heavily by users.  Every now and
then the users complain about some problem, and send me the condor-g
log, which gives the local condor-g queue id.  But I need to map this to
the job id in the remote condor queue (also managed by me) so that I can
poke through the remote system logs to find out what may have gone wrong.

Is there a way to map the local condor-g queue id to the remote condor
queue id?

There are two ways.
The hard way: Look in the Userlog of the condor-g user on the client.
That will have something like

027 (176900.089.000) 09/27 09:39:46 Job submitted to grid resource
     GridResource: gt2 fgitb-gk.fnal.gov/jobmanager-condor
     GridJobId: gt2 fgitb-gk.fnal.gov/jobmanager-condor https://fgitb-gk.fnal.gov:49036/27985/1190903977/

(in the case of a grid/gt2 resource).
The second number in the URL, in this case 27985, is the process id of the
globus-job-manager process on the remote host, and 1190903977 is
the timestamp.
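
If the user log does contain these events, pulling the mapping out can be scripted. Below is a minimal Python sketch, assuming the event layout shown in the excerpt above and that each GridJobId URL sits on a single line in the real log. The function name remote_ids and both regular expressions are made up for illustration; they are not part of Condor.

```python
import re

# Event header as in the excerpt: "027 (176900.089.000) 09/27 09:39:46 ..."
EVENT_HEADER = re.compile(r"^\d{3} \((?P<cluster>\d+)\.(?P<proc>\d+)\.\d+\)")

# gt2 job contact as in the excerpt: https://<host>:<port>/<pid>/<timestamp>/
GRID_JOB_URL = re.compile(
    r"https://(?P<host>[^:/\s]+):(?P<port>\d+)/(?P<pid>\d+)/(?P<timestamp>\d+)/"
)


def remote_ids(log_lines):
    """Map (cluster, proc) of the local job to (host, pid, timestamp)."""
    mapping = {}
    current = None  # job id taken from the most recent event header
    for line in log_lines:
        header = EVENT_HEADER.match(line)
        if header:
            current = (int(header["cluster"]), int(header["proc"]))
            continue
        if current is not None and "GridJobId:" in line:
            url = GRID_JOB_URL.search(line)
            if url:
                mapping[current] = (
                    url["host"],
                    int(url["pid"]),
                    int(url["timestamp"]),
                )
    return mapping
```

Feeding it the lines of the user log (for example, open(path).read().splitlines()) gives a dictionary keyed by the local cluster and process numbers.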

Bummer. It looks like our condor 6.7.18 submit host doesn't print out the GridResource or GridJobId messages. All that I find is:

000 (339290.000.000) 10/26 14:21:34 Job submitted from host: <198.32.44.97:32799>
...
012 (339290.000.000) 10/27 05:04:43 Job was held.
        Unspecified gridmanager error
        Code 0 Subcode 0
...

By looking in the /var/log/messages of the grid CE you should
be able to match the condor job id with the pid of the globus-job-manager.

If you are using a VDT-based grid installation there is a
utility called vdt-get-job-info which will match the condor job id
on the server end to the globus-job-manager pid--but not to the condor-G job
id on the client, which is what you really need.

The easier way:
We wrap condor_submit on the client end, adding things to the classad like this:

GlobusRSL = "(condorsubmit=('+SubmitHost' 'clienthostname.clientdomain')('+SubmitClusterID' $(Cluster)))"

The punctuation may not be quite right but the effect is to
change the client so that it always sends two extra fields across the
grid, namely the originating cluster id and the originating hostname.
Of course this only works if you have condor on both ends
but we have modified our few non-condor installations to throw
the condorsubmit RSL attribute on the ground so that it doesn't cause
an exception.
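
One possible shape for such a wrapper, as a rough Python sketch and not necessarily how the wrapper described above is written: it builds a condor_submit command line that uses the -append option to add the GlobusRSL line before the queue statement. The function name build_submit_command is hypothetical, the GlobusRSL text is copied as-is from the line above (so the same caveat about punctuation applies), and whether $(Cluster) expands as intended this way is an assumption to check on your version.

```python
import socket


def build_submit_command(user_args, submit_host=None):
    """Return the argv for condor_submit with the extra RSL line appended.

    user_args   -- whatever the user passed to the wrapper (submit file, etc.)
    submit_host -- name to send across the grid; defaults to this host's FQDN
    """
    host = submit_host or socket.getfqdn()
    # Copied from the GlobusRSL line above; punctuation may need adjusting.
    rsl = (
        "GlobusRSL = \"(condorsubmit=('+SubmitHost' '{host}')"
        "('+SubmitClusterID' $(Cluster)))\""
    ).format(host=host)
    # -append adds one submit command, treated as coming just before "queue".
    return ["condor_submit", "-append", rsl] + list(user_args)
```

A wrapper script would pass build_submit_command(sys.argv[1:]) to subprocess.call. Note that a line appended this way would presumably override any GlobusRSL setting already in the user's submit file.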

Can you give me more info on how you add things to the classad on the client side with such a wrapper? I'm familiar with adding things to the submit script on the server side by modifying the job manager, but I'm not sure how to accomplish the equivalent on the client side.

Thanks,

--Mike