
Re: [Condor-users] killing globus-job-managers



Michael Thomas wrote:
> Once again I started seeing high loads on my gatekeeper due to a large
> number of globus-job-manager processes.
> 
[...]

After moving all of the user home directories from an NFS mount to a
local disk, this no longer seems to be a problem.

However, I'm seeing some other odd behaviour that doesn't make sense to
me.  I have a number of jobs coming through the OSG managed fork queue
that seem to get disconnected from the actual process.  If I look up the
PID for the condor queue id, I notice that the process isn't running
anymore.  When I look at the condor_q -l output for the job, I notice
that none of the files for RemoteSpoolDir, UserLog, Out, or Err exist.
Yet condor_q says that the job is still in the Running state.
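
For reference, the manual check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it expects the text of `condor_q -l` for a single job to be passed in as a string, it resolves relative paths against the job's Iwd attribute, and the PID is supplied by the caller because the message does not say how it was looked up. The helper names (parse_classad, missing_paths, pid_alive) are hypothetical and not part of Condor.

```python
import os
import re

# Attributes named in the message above; relative values are resolved
# against Iwd (an assumption about how the job was submitted).
PATH_ATTRS = ("RemoteSpoolDir", "UserLog", "Out", "Err")


def parse_classad(text):
    """Parse 'Name = value' lines from condor_q -l output into a dict."""
    ad = {}
    for line in text.splitlines():
        m = re.match(r'^\s*(\w+)\s*=\s*(.*?)\s*$', line)
        if m:
            # Strip the surrounding quotes from string-valued attributes.
            ad[m.group(1)] = m.group(2).strip('"')
    return ad


def missing_paths(ad):
    """Return the attributes in PATH_ATTRS whose paths do not exist."""
    base = ad.get("Iwd", "")
    missing = []
    for attr in PATH_ATTRS:
        value = ad.get(attr)
        if not value:
            continue  # attribute not present in this job ad
        path = value if os.path.isabs(value) else os.path.join(base, value)
        if not os.path.exists(path):
            missing.append(attr)
    return missing


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A job whose ad still reports the Running state while pid_alive() returns False and missing_paths() lists every attribute would match the symptom described here.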

I also see the same symptoms from the occasional grid-monitor job that
doesn't exit after an hour (still running after 24 hours).

Why would condor think the job is still running when the process is dead?

--Mike
