
Re: [Condor-users] Get remote queue id



On Sun, 28 Oct 2007, Michael Thomas wrote:

Hi Steve,

Steven Timm wrote:
On Wed, 17 Oct 2007, Michael Thomas wrote:

I have a condor-g host that is used heavily by users.  Every now and
then the users complain about some problem, and send me the condor-g
log, which gives the local condor-g queue id.  But I need to map this to
the job id in the remote condor queue (also managed by me) so that I can
poke through the remote system logs to find out what may have gone wrong.

Is there a way to map the local condor-g queue id to the remote condor
queue id?

There are two ways.
The hard way: Look in the Userlog of the condor-g user on the client.
That will have something like

027 (176900.089.000) 09/27 09:39:46 Job submitted to grid resource
     GridResource: gt2 fgitb-gk.fnal.gov/jobmanager-condor
     GridJobId: gt2 fgitb-gk.fnal.gov/jobmanager-condor https://fgitb-gk.fnal.gov:49036/27985/1190903977/

(in the case of a grid/gt2 resource).
The 2nd number in the URL (after the port), in this case 27985, is the
process id of the globus-job-manager process on the remote host and
1190903977 is the timestamp.
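
A minimal sketch of pulling those two numbers out of a user log, assuming the GridJobId entry sits on one line and ends with a URL of the form https://host:port/PID/TIMESTAMP/ as in the example above. The helper names gridjobid_urls and parse_gridjobid are made up for illustration and are not part of Condor.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Condor) for reading GridJobId entries.

# Print the URL at the end of every GridJobId line in a user log.
gridjobid_urls() {
    awk '/GridJobId:/ { print $NF }' "$1"
}

# Split a URL of the assumed form https://host:port/PID/TIMESTAMP/
# into the globus-job-manager pid and the timestamp.
parse_gridjobid() {
    local url="$1"
    local path="${url#*://}"    # drop the scheme
    path="${path#*/}"           # drop host:port, leaving PID/TIMESTAMP/
    local pid="${path%%/*}"
    local rest="${path#*/}"
    local ts="${rest%%/*}"
    echo "pid=$pid timestamp=$ts"
}

# Example with the URL from the log excerpt above.
parse_gridjobid "https://fgitb-gk.fnal.gov:49036/27985/1190903977/"
```

To scan a whole log, something like `gridjobid_urls userlog | while read -r u; do parse_gridjobid "$u"; done` should work, where userlog stands for whatever file the submit description names as its log.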

Bummer.  It looks like our condor 6.7.18 submit host doesn't print out
the GridResource or GridJobId messages.  All that I find is:

000 (339290.000.000) 10/26 14:21:34 Job submitted from host: <198.32.44.97:32799>
...
012 (339290.000.000) 10/27 05:04:43 Job was held.
        Unspecified gridmanager error
        Code 0 Subcode 0
...

I'd strongly suggest upgrading the submit host from 6.7.18.
Lots of bugs in grid_monitor.sh have been fixed since then.
Also, the "Unspecified gridmanager error" suggests that the job
may never have got as far as batch system submission
on the other end.

In the case above it is still worth looking at /var/log/messages
on the server side and also $GLOBUS_LOCATION/var/globus-gatekeeper.log
at that timestamp to see what you can find.  Maybe with the
combination of IP and timestamp you can still find something.



By looking in the /var/log/messages of the grid CE you should
be able to match the condor job id with the pid of the globus-job-manager.

If you are using a VDT-based grid installation there is a
utility called vdt-get-job-info which will match the condor job id
on the server end to the globus-job-manager pid--but not to the condor-G job
id on the client, which is what you really need.

The easier way:
We wrap condor_submit on the client end, adding things to the classad like
this:

GlobusRSL = "(condorsubmit=('+SubmitHost'
'clienthostname.clientdomain')('+SubmitClusterID' $(Cluster)))"

The punctuation may not be quite right but the effect is to
change the client so that it always sends two extra fields across the
grid, namely the originating cluster id and the originating hostname.
Of course this only works if you have condor on both ends
but we have modified our few non-condor installations to throw
the condorsubmit RSL attribute on the ground so that it doesn't cause
an exception.
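
A rough sketch of how a wrapper might build that extra line and hand it to the real condor_submit through -append. The RSL punctuation is copied from the example above and is just as unverified; the submit command name globusrsl, the helper name build_rsl_append, and the host client.example.org are all assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print a submit-file line that carries the submit
# host and cluster id in the RSL. The command name "globusrsl" and the
# RSL punctuation are assumptions; check them against your Condor version.
build_rsl_append() {
    local submit_host="$1"
    # \$(Cluster) is left literal so that condor_submit expands it.
    printf '%s\n' "globusrsl = (condorsubmit=('+SubmitHost' '${submit_host}')('+SubmitClusterID' \$(Cluster)))"
}

# Show the line that would be appended for an example host.
build_rsl_append "client.example.org"

# Inside a wrapper the call would look roughly like this:
#   condor_submit_real -a "$(build_rsl_append "$(hostname -f)")" "$@"
```

According to the condor_submit help text, a line given with -append overrides the submit file, so this would replace any RSL the user set there.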

Can you give me more info on how you add things to the classad on the
client side with such a wrapper?  I'm familiar with adding things to the
submit script on the server side by modifying the job manager, but I'm
not sure how to accomplish the equivalent on the client side.

Thanks,

--Mike


As for client-side wrappers, I am attaching one that we
use here at Fermilab.  This is for vanilla universe jobs though,
not grid jobs.  All you do is move the condor_submit binary
to something like condor_submit_real and replace condor_submit
with a shell script.  Of course you have to remember to do this
every time you upgrade condor.
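
The swap itself can be sketched as below. The helper names and the "#!" check are illustrative assumptions, not something shipped with Condor: the idea is to treat condor_submit as the real binary whenever it does not start with "#!", so that re-running the helper after an upgrade picks up the new binary.

```shell
#!/bin/bash
# Hypothetical helpers for installing a condor_submit wrapper.

# True if the file starts with "#!", i.e. looks like a script rather
# than the real binary.
looks_like_script() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "#!" ]
}

# install_submit_wrapper BINDIR WRAPPER
# Moves BINDIR/condor_submit to BINDIR/condor_submit_real (unless the
# wrapper is already in place) and copies WRAPPER in under the old name.
install_submit_wrapper() {
    local bindir="$1" wrapper="$2"
    if ! looks_like_script "$bindir/condor_submit"; then
        mv -f "$bindir/condor_submit" "$bindir/condor_submit_real" || return 1
    fi
    cp "$wrapper" "$bindir/condor_submit" || return 1
    chmod 755 "$bindir/condor_submit"
}

# Example call (both paths are placeholders):
#   install_submit_wrapper /opt/condor/bin /root/condor_submit_wrapper.sh
```

The wrapper calls condor_submit_real by name, so the directory holding it has to be on the PATH of whoever runs condor_submit.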

Steve Timm





--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

#!/bin/bash
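# Goal is to add the group and user names as part of the AccountingGroup
# flag in all jobs, local and grid. As written, only submit files that
# mention "vanilla" are tagged; everything else is passed through unchanged.
# Get the user name and primary group name.
COUSERNAME=$(/usr/bin/id -un)
COGROUPNAME=$(/usr/bin/id -gn)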
#echo "`/bin/date` $COUSERNAME $COGROUPNAME $$" >> /tmp/condorsubmit.out
#Parse the arguments of the condor_submit, look 
#for the file name.
# From Condor submit help, these are the args:
#Usage: condor_submit_real [options] [cmdfile]
#       Valid options:
#       -verbose                verbose output
#       -name <name>            submit to the specified schedd
#       -remote <name>          submit to the specified remote schedd
#                               (implies -spool)
#       -append <line>          add line to submit file before processing
#                               (overrides submit file; multiple -a lines ok)
#       -disable                disable file permission checks
#       -spool                  spool all files to the schedd
#       -password <password>    specify password to MyProxy server
#       -pool <host>            Use host as the central manager to query
#
#       If [cmdfile] is omitted, input is read from stdin
#
#echo $* >> /tmp/condorsubmit.out
SUBARGS=$*
NEXTARG=0
VANILLA=""
#echo $# >> /tmp/condorsubmit.out
#echo "$@" >> /tmp/condorsubmit.out
if [ $# -gt 0 ]
then
    for subarg in "$@"
    do
        case $subarg in
            -v*|-d*|-s*|-h*)
                # Options that take no value: nothing to do.
                ;;
            -p*|-a*|-r*|-n*)
                # Options that take a value: skip the next argument.
                NEXTARG=1
                ;;
            *)
                if [ $NEXTARG -ne 1 ]
                then
                    # A bare argument that is not an option value is the submit file.
                    SUBFILE="$subarg"
#                   echo "$SUBFILE" >> /tmp/condorsubmit.out
                    VANILLA=$(grep -i vanilla "$SUBFILE")
                else
#                   echo "this argument skipped $subarg" >> /tmp/condorsubmit.out
                    NEXTARG=0
                fi
                ;;
        esac
    done
    if [ "$VANILLA" != "" ]
    then
        # Submit file mentions vanilla: append the accounting attributes.
        condor_submit_real -a "+Agroup = \"group_$COGROUPNAME\"" -a "+AccountingGroup = \"group_$COGROUPNAME.$COUSERNAME\"" "$@"
    else
        condor_submit_real "$@"
    fi
else
    # No arguments: the submit description comes from stdin, pass through.
    condor_submit_real "$@"
fi
#echo "$VANILLA" >> /tmp/condorsubmit.out