Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] killing globus-job-managers

Date: Mon, 24 Jul 2006 12:31:39 -0700
From: Michael Thomas <thomas@xxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] killing globus-job-managers

Michael Thomas wrote:
> Once again I started seeing high loads on my gatekeeper due to a large
> number of globus-job-manager processes.
> 
[...]

After moving all of the user home directories from an NFS mount to a
local disk, this no longer seems to be a problem.

However, I'm seeing some other odd behaviour that doesn't make sense to
me.  I have a number of jobs coming through the OSG managed fork queue
that seem to get disconnected from the actual process.  If I look up the
PID for the condor queue id, I notice that the process isn't running
anymore.  When I look at the condor_q -l output for the job, none of the
files named by RemoteSpoolDir, UserLog, Out, or Err exist.
Yet condor_q says that the job is still in the Running state.
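
The manual check described above could be scripted roughly as follows. This
is only a sketch under a few assumptions: the long-format job ad is available
as text in "Name = value" lines, the PID has already been looked up by hand,
and the helper names (parse_classad, missing_paths, pid_alive) are
hypothetical, not part of Condor. The liveness test relies on POSIX signal 0.

```python
import os

# Attributes named in the message above; each value is expected to be a path.
PATH_ATTRS = ("RemoteSpoolDir", "UserLog", "Out", "Err")

def parse_classad(text):
    """Parse 'Name = value' lines into a dict (assumed long-format layout)."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines or anything that is not an assignment
        value = value.strip()
        # Drop surrounding double quotes from string values only.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        ad[name.strip()] = value
    return ad

def missing_paths(ad):
    """Return the path attributes present in the ad whose files do not exist."""
    return [a for a in PATH_ATTRS if a in ad and not os.path.exists(ad[a])]

def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A job that condor_q reports as Running while pid_alive() returns False and
missing_paths() lists all of its files would match the symptom described here.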

I also see the same symptoms from the occasional grid-monitor job that
doesn't exit after an hour (still running after 24 hours).

Why would condor think the job is still running when the process is dead?

--Mike


