[HTCondor-users] Not running Parallel-universe jobs?
- Date: Mon, 24 Aug 2015 23:34:49 +0000
- From: "Seering, Adam" <aseering@xxxxxx>
- Subject: [HTCondor-users] Not running Parallel-universe jobs?
I have inherited, and am trying to maintain, a condor cluster. It was
working nicely on its own for a while, but we recently had some power
outages that corrupted some client machines. Since then, we've had a
recurring problem where Parallel-universe jobs just won't run. I have an
example right now (hostnames and job criteria omitted; let me know if
they are relevant):
$ condor_q -analyze 9800
-- Submitter: <IPs, hostnames, etc>
9800.000: Run analysis summary. Of 308 machines,
    260 are rejected by your job's requirements
    1 reject your job because of their own requirements
    41 match but are serving users with a better priority in the pool
    0 match but reject the job for unknown reasons
    0 match but will not currently preempt their existing job
    0 match but are currently offline
    6 are available to run your job
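In this summary the category counts add up to the pool size (260 + 1 + 41 + 0 + 0 + 0 + 6 = 308), so every machine is accounted for and 6 of them are reported as available. Below is a minimal Python sketch that parses the summary exactly as pasted here and checks that tally. It assumes the line layout shown above (an "Of N machines" header followed by "count description" lines); `parse_summary` is a hypothetical helper, and other HTCondor versions may format this output differently.

```python
import re

# Summary lines copied from the condor_q -analyze output for job 9800.000.
# Assumption: the layout matches this paste; other versions may differ.
SUMMARY = """\
9800.000: Run analysis summary. Of 308 machines,
260 are rejected by your job's requirements
1 reject your job because of their own requirements
41 match but are serving users with a better priority in the pool
0 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 match but are currently offline
6 are available to run your job
"""


def parse_summary(text):
    """Hypothetical helper: return (total machines, {category: count})."""
    lines = text.strip().splitlines()
    header = re.search(r"Of (\d+) machines", lines[0])
    if header is None:
        raise ValueError("missing 'Of N machines' header")
    total = int(header.group(1))
    counts = {}
    for line in lines[1:]:
        # Each line is "<count> <category description>".
        count, _, category = line.strip().partition(" ")
        counts[category] = int(count)
    return total, counts


total, counts = parse_summary(SUMMARY)
accounted = sum(counts.values())
print(f"{accounted} of {total} machines accounted for")
print(f"available: {counts['are available to run your job']}")
```

Run as-is, it reports that all 308 machines are accounted for and that 6 are available, which matches what condor_q says.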
condor_q says that 6 machines are available to run this job, but it has
been sitting in the "Queued" state for over 20 minutes. Other jobs
have been sitting there for almost a day.
If I submit a Vanilla-universe job, it will run right away.
I have machines; I'd like these jobs to run on them. What am I
In case it's relevant, the condor server itself is somewhat older; it
self-reports as running condor 7.6.6. Most clients are running condor