
Re: [Condor-users] [gram-user] Running Grid Monitor in debug?



On Mon, 28 Aug 2006, Jens-Soenke Voeckler wrote:

Hi Steve,

I had a similar effect (Condor-G never realizing a remote pre-WS job was done) when I tried to pair Globus 4.0.1-cvshead-20051220 with Condor 6.7.14 or so. Switching back to an earlier version of Globus (cvshead-20051110) helped at the time. I suspect that something in GRAM changed between Nov '05 and Dec '05, perhaps the way it records job state changes on the gatekeeper, without folks telling Jaime about it. Are you using Condor 6.8.0 with Globus 4.0.2+bugfix?

I have used Condor 6.7.13, 6.7.18, and 6.7.20, all against a Globus 4.0.1 equivalent (VDT 1.3.10/OSG 0.4.1), with the same effect for all.
I haven't used 6.8.0 yet because I know it is buggy; I am waiting for 6.8.1.
Backdating Globus isn't an option.  And just turning off
the Condor grid monitor doesn't seem to be an option anymore either.

Steve Timm




Jens.

On Aug 28, 2006, at 7:52 , Steven Timm wrote:


I am currently trying to use Condor-G to submit
jobs to a gt2 jobmanager-pbs resource.  My test job runs fine
on some remote jobmanager-pbs resources that are known to be good.
But on the new one I am configuring, the symptom is the following:

Condor-G submits the job, and in my desktop queue it shows as "idle".
On the remote gt2/pbs site, the jobmanager-pbs exits immediately
as it should, the job is submitted, I can see it running in PBS, and
the stdout and stderr get put in the directory where they should.
But Condor-G never detects that the job is running, nor that it
has completed, and the stderr never gets put back where it should.
globus-job-status reports that the job is complete as soon as it is submitted.

I suspect that there's something wrong in the polling interface
where the condor grid monitor (which is submitted via a fork to
the gt2/pbs host) doesn't correctly talk to the jobmanager-pbs to
poll the job, because it never sees any jobs running on the remote host.
But the only way to confirm this is to debug it somehow.
The grid monitor has got a debug flag but the job that is
submitting it is submitted from deep within the guts of condor
and I don't see any way to enable the debug flag.  Does anyone
else know how to enable the debug flag, and/or to run
the grid monitor interactively?   Instructions on how to do
so were at one point posted to this list but I can't find them.
strace on the existing grid monitor doesn't tell me anything.
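
One way to cross-check what the jobmanager reports, independently of the grid monitor, might be to poll globus-job-status by hand and log every state change. The following is only a rough sketch, not part of Condor or Globus: the function names (query_status, watch) are made up for illustration, and it assumes that globus-job-status prints a single state word such as PENDING, ACTIVE, DONE or FAILED for a given job contact.

```python
import subprocess
import time


def query_status(contact):
    """Run globus-job-status on a GRAM job contact and return what it prints.

    Assumption: the command prints one state word (e.g. ACTIVE, DONE).
    Raises FileNotFoundError if globus-job-status is not on the PATH.
    """
    result = subprocess.run(
        ["globus-job-status", contact],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() or "UNKNOWN"


def watch(contact, status_fn=query_status, interval=10.0, max_polls=30,
          sleep=time.sleep):
    """Poll the job state and record each change.

    Returns a list of (poll_index, state) pairs, one per observed change.
    Stops early once the state is DONE or FAILED.
    """
    transitions = []
    last = None
    for i in range(max_polls):
        state = status_fn(contact)
        if state != last:
            transitions.append((i, state))
            print(f"poll {i}: {last} -> {state}")
            last = state
        if state in ("DONE", "FAILED"):
            break
        sleep(interval)
    return transitions

# Hypothetical usage (the job contact URL below is a placeholder):
#   watch("https://gatekeeper.example.org:40001/12345/1156777777/")
```

If the first poll already shows DONE while the job is still visible in PBS, that would point at the jobmanager's state reporting rather than at Condor-G itself.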

Steve Timm


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team

Aloha,
Dipl.-Ing. Jens-S. Vöckler   voeckler at isi dot edu
University of Southern California Viterbi School of Engineering
Information Sciences Institute; 4676 Admiralty Way Ste 1001
Marina Del Rey, CA 90292-6611; USA; +1 310 448 8427
* You can rely on any shared filesystems for only one thing - don't! *



