
Re: [Condor-users] condor-g jobs remain idle for ages



--On 05 September 2006 14:41 +0100 "Dr Ian C. Smith" <i.c.smith@xxxxxxxxxxxxxxx> wrote:

Hi,

I'm having problems with getting condor-g jobs to
run as they seem to remain idle for ages (days at
least in some cases). Doing a condor_q -analyze gives
the dreaded:

1 match, but reject the job for unknown reasons

so presumably at least the matchmaking seems to work OK. Having a delve
into the SchedLog reveals this error:

9/5 14:28:47 warning: setting UserUid to 41269, was 286 previosly
9/5 14:28:47 Create_Process: child failed with errno 13 (Permission denied) before exec()
9/5 14:28:47 StartOrFindGManager: Create_Process problems!

(UID is that of the condor-g submitting user).

I take this to mean that a shadow process could not be spawned. If I
submit an ordinary (vanilla universe) job then things are fine: it
runs almost immediately. Similarly, if I try to submit the job on
the central manager (which also has a schedd running), it works fine.
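
Errno 13 is EACCES, which exec() typically returns when the calling user
lacks search permission on a directory in the path or execute permission on
the file itself. A minimal sketch of how that could be checked, assuming a
POSIX system and that it is run as the condor-g submitting user; the helper
names explain_errno and first_blocked are made up for illustration and are
not part of Condor:

```python
import errno
import os


def explain_errno(code):
    """Return (symbolic name, message) for an errno value, e.g. 13."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def first_blocked(path):
    """Return (component, reason) for the first part of `path` that the
    calling user cannot search or execute, or None if nothing blocks.

    Run it as the submitting user: os.access() tests the real UID/GID.
    """
    current = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        if not os.path.lexists(current):
            return current, "missing"
        # Directories need search (x) permission; the final file needs
        # execute (x) permission.
        if not os.access(current, os.X_OK):
            return current, "not executable/searchable"
    return None
```

Calling first_blocked("/path/to/binary") (a placeholder path, to be replaced
by whatever binary the schedd is trying to start) returns the first component
that would make exec() fail with "Permission denied", or None if the path
looks fine for that user.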

I'm wondering if this could be a system load problem. There are around
100 vanilla jobs running with around 2,600 in the queue *BUT* the jobs
only seem to run for around a minute, so new shadows are getting spawned
every few seconds. The schedd is taking about 80% CPU with the long-term
load average around 1 (uniprocessor machine).

Any thoughts on this would be most welcome. I guess the answer is to
move the condor-g submission to a different schedd but I could do without
the extra work.

Postscript:

The queue is now empty apart from the condor-g job, but it stays idle and I'm
getting the same "Create_Process problems!" error. Vanilla universe jobs seem fine.
-ian.