[Condor-users] jobs stop running when lots of people submit jobs
- Date: Tue, 20 May 2008 12:15:59 +0000 (GMT)
- From: Ben Clifford <benc@xxxxxxxxxxxxx>
I have a condor installation which I use for training.
Sometimes when in use, it stops running jobs, with those jobs appearing
2 match but reject the job for unknown reasons
When I put load on a fresh installation, using both Condor and non-Condor
jobs, from my own account and from several accounts at once, I cannot
reproduce the problem. But as soon as students start using it, the
problems begin: my test load scripts will run happily in a loop for hours
and then stop around the time the students start.
So the only mechanism I have for recreating it at the moment is to point a
room full of students at it (which is not an easily repeatable action).
This has happened a few times, but I now have an install that is in this
state and still online rather than being taken down right after a
This is using condor-6.8.4. Condor-G works OK submitting to Globus on
other machines, but local execution through the vanilla universe does not
work, whichever submission mechanism I use (GRAM2, condor_run,
condor_submit, DAGMan).
I don't see anything in the logs that indicates what is causing this
problem - does anyone have any advice about what I can look for?