[Condor-users] condor-G question




I have a setup where my head node both submits grid/gt2 universe
jobs and receives and runs them.  I can trace the whole chain, from
submission of the grid/gt2 universe job, to its running on my cluster,
to the results being sent back.

Lately I have been seeing a lot of jobs where the grid/gt2 universe
job sitting in the submitter's queue never shows as finished.  While
it is running on the grid resource, it shows as idle to the submitter.
Once it has finished running on the grid resource, it still shows as
running to the submitter.

My question:  what is the Condor-G grid monitor / GAHP server
looking for so that it can report the successful completion
of the job back to the original submitting host?

Could a premature deletion of the $GLOBUS_LOCATION/tmp/gram_job_state/gram_condor_log
file (the UserLog of the vanilla universe job that Globus puts
into my Condor pool) be causing Condor-G to get confused?

Also, where can I find out what the status values mean in

/tmp/condor_g_scratch.0x985e890.20145/grid-monitor-job-status.fngp-osg.fnal.gov:2119.23793.3

For example, status 2 is probably a running job, but what does status 8 mean?
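
For anyone decoding the same file: if these numbers are the GT2 GRAM
protocol job state codes (that is my assumption, I have not confirmed it
against the grid monitor itself), a lookup could be sketched as below.
The code values are the GRAM job state constants as I understand them;
the names GRAM_JOB_STATES and describe_status are hypothetical and only
for illustration.

```python
# GT2 GRAM protocol job state codes.
# Assumption: the grid monitor status file reports these same codes.
GRAM_JOB_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}


def describe_status(code):
    """Return the GRAM state name for a numeric code, or a marker if unknown."""
    return GRAM_JOB_STATES.get(code, "UNKNOWN(%d)" % code)


if __name__ == "__main__":
    # Print the two codes asked about above.
    for code in (2, 8):
        print(code, describe_status(code))
```

Under that assumption, status 2 would be an active (running) job and
status 8 a job that GRAM considers done.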


Steve Timm


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team