
[Condor-users] condor-g jobs remain idle for ages



Hi,

I'm having problems getting condor-g jobs to run: they
remain idle for ages (days, in some cases). Running
condor_q -analyze gives the dreaded:

1 match, match, but reject the job for unknown reasons

so at least the matchmaking seems to be working. Delving into the
SchedLog reveals this error:

9/5 14:28:47 warning: setting UserUid to 41269, was 286 previosly
9/5 14:28:47 Create_Process: child failed with errno 13 (Permission denied) before exec()
9/5 14:28:47 StartOrFindGManager: Create_Process problems!

(41269 is the UID of the user submitting the condor-g jobs.)

which I take to mean that a shadow process could not be spawned. If I
submit an ordinary (vanilla universe) job, things are fine: it runs
almost immediately. Similarly, if I submit the same condor-g job on
the central manager (which also runs a schedd), it works fine.
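For anyone unfamiliar with the "child failed with errno N before exec()" wording: a common Unix pattern for reporting setup failures in a forked child is a close-on-exec pipe. The child does its pre-exec setup (switching UID, changing directory, opening files) and, if anything fails, writes the errno down the pipe before exiting; a successful exec closes the pipe, so the parent simply sees EOF. The sketch below assumes (I haven't checked the source) that Create_Process does something along these lines; the function name and the chdir setup step are hypothetical. Errno 13 is EACCES, so the failing step was denied by a permission check while running as the switched-to user.

```python
import errno
import os
import struct
import sys


def spawn_reporting_pre_exec_errno(path, argv, cwd):
    """Hypothetical helper: fork, do setup in the child, then exec `path`.

    Returns None if exec succeeded, otherwise the errno the child hit
    during setup or exec (e.g. 13 == errno.EACCES, "Permission denied").
    """
    # Since Python 3.4 os.pipe() descriptors are non-inheritable, i.e.
    # close-on-exec: a successful exec closes the write end automatically.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(r)
        try:
            os.chdir(cwd)         # stand-in for pre-exec setup (uid switch, etc.)
            os.execv(path, argv)  # only returns by raising OSError
        except OSError as e:
            # Report the failure to the parent before exiting.
            os.write(w, struct.pack("i", e.errno))
        finally:
            os._exit(127)
    # parent
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 64)
        if not chunk:  # EOF: child exec'd or exited
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    data = b"".join(chunks)
    return struct.unpack("i", data)[0] if data else None


if __name__ == "__main__":
    ok = spawn_reporting_pre_exec_errno(
        sys.executable, [sys.executable, "-c", "pass"], "/")
    print(ok)  # None: exec succeeded
    print(errno.EACCES, os.strerror(errno.EACCES))
```

Trying to exec a file with no execute bits, or to chdir into a directory the user can't search, yields errno 13 the same way, which is the kind of check worth repeating by hand as the submitting user.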

I'm wondering whether this could be a system load problem. There are around
100 vanilla jobs running with around 2,600 in the queue, *BUT* the jobs
only run for around a minute each, so new shadows are being spawned
every few seconds. The schedd is using ~80% CPU, with a long-term
load average of around 1 (uniprocessor machine).
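A rough back-of-the-envelope check of the shadow spawn rate, assuming (my assumption, not measured) that all ~100 slots stay busy and each is refilled as soon as its ~60 s job finishes. The function names are just illustrative.

```python
def starts_per_second(running_jobs, runtime_s):
    """Steady-state job starts (and hence shadow spawns) per second."""
    return running_jobs / runtime_s


def queue_drain_seconds(queued_jobs, running_jobs, runtime_s):
    """Time to work through the idle queue at that steady-state rate."""
    return queued_jobs / starts_per_second(running_jobs, runtime_s)


if __name__ == "__main__":
    rate = starts_per_second(100, 60)
    print(f"{rate:.2f} starts/s, one every {1 / rate:.1f} s")
    print(f"queue drains in ~{queue_drain_seconds(2600, 100, 60) / 60:.0f} min")
```

Under those assumptions the schedd would be forking a shadow roughly every 0.6 s, which would fit with it sitting at high CPU on a single processor.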

Any thoughts on this would be most welcome. I guess the answer is to
move the condor-g submission to a different schedd, but I could do without
the extra work.

cheers,

-ian.


-----------------------------------
Dr Ian C. Smith,
e-Science team,
University of Liverpool
Computing Services Department