Thanks, but I already knew those tips. The problem is the need for contiguous memory, which means that the theoretical limits often can't be reached (like I said, 200 is about our current limit). Our pool is plenty big enough now (~600 slots) and this does cause issues, especially if two users use the same schedd by accident.
I am new to both administering and using Condor, so maybe I do not understand how Condor handles submitting jobs. What I have read is that on any Windows submit machine, the maximum number of jobs that can be submitted is dictated by the heap size. It should not make any difference whether it is a VM or not, but we do not have any VM submit machines. Although the Condor manual does not discuss it, it may also be necessary to change the heap size for non-interactive window stations, but this only affects the jobs running on the execute machine. I have not had the latter problem yet.

The most jobs that I have run concurrently is about 90. This is a small number and does not require changing the default heap size for Windows. Our pool is small, so for the time being we will not exceed this amount.
This is what I have in my notes (not necessarily correct):

To increase the maximum number of jobs that can run in the schedd, MAX_JOBS_RUNNING should be set, and the heap size on the submit machine (Windows only) must be increased.

- negotiator requirements: ~10 KB RAM per batch slot.
- schedd requirements: ~10 KB RAM per job in the queue; additional memory is required when large ClassAd attributes are used for jobs.
- shadow requirements: ~500 KB RAM for each running job (this can be doubled, depending on how jobs are submitted). A 32-bit machine cannot run more than 10,000 simultaneous jobs due to kernel limits.
- The default desktop heap size is 512 KB. Sixty condor_shadow daemons use 256 KB, and therefore 120 shadows will consume the default. To run 300 jobs, set the heap size to 1280 KB.
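As a sanity check, here is the arithmetic those notes imply. This is just a sketch: the per-shadow cost is derived from the "60 shadows use 256 KB" figure in the notes above, not from Condor documentation, and the function names are my own.

```python
# Desktop-heap arithmetic from the notes above: 60 condor_shadow daemons
# use 256 KB of desktop heap, and the Windows default heap is 512 KB.
# The 60-per-256KB ratio comes from the notes, not from measurement.

def max_shadows(heap_kb):
    """How many shadows fit in a desktop heap of `heap_kb` kilobytes."""
    return heap_kb * 60 // 256

def heap_needed_kb(jobs):
    """Desktop heap (KB) needed to run `jobs` simultaneous jobs."""
    return jobs * 256 / 60

print(max_shadows(512))     # 120 shadows consume the 512 KB default
print(heap_needed_kb(300))  # 1280.0 KB for 300 running jobs
```

Both figures in the notes (120 shadows at the default, 1280 KB for 300 jobs) are consistent with that ratio.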
I hope this helps.
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 07 October 2010 14:48
To: Condor-Users Mail List
Subject: Re: [Condor-users] Windows schedd job limits (Was: RE: Hooks in the Scheduler Universe?)
> On bare metal, with some exceptionally nice hardware, I've gotten it up
> to 150 running jobs/schedd with 4 schedds on the box. And even that feels
> meta-stable at times. The lightest of breezes pushes it off balance.
> This is Win2k3 Server 64-bit. AFAIK there are no registry tweaks to
> the image. What tweaks are you making?
Apparently we don't apply the desktop heap tweaks on our servers; perhaps that only applies to XP or 32-bit machines. What was the most you managed to achieve on one schedd?
>> Job hooks:
>> In the hive mind's opinion, should I not consider even testing job
>> hooks (as a replacement for the schedd/negotiator) on Windows right now?

> Well, Todd closed that ticket. I swear it's never worked in 7.4.x for
> me, but I have to retest now and confirm this. It can't hurt to try
> it. But you lose so much with hooks for pulling jobs.
> You have to do your own job selection, or if you let the startd reject
> jobs, you have to have a way to pull and then put back jobs that have
> been rejected, which is inefficient and difficult to architect such that
> it works well when you've got a *lot* of slots trying to find their next
> job to run.
> I'll admit this is exactly what I had working at Altera, but it was a
> good year plus of work to get it functioning.
Yeah, I figured that was tricky. Though I would have it so the machines would never need to reject (except as a failure mode).
If I did it I'd likely do it as:

- A fixed set of queues (Q => max queues), where a job could be in multiple queues at once, albeit with different ranking. These queues define a strict ordering of relative ranking for any individual slot.
- Each queue is actually broken down into per-user (U => max users) queues within it. This is entirely transparent to the below except where noted.
- Every slot would map to a (stable) ordering over the queues.
- In a steady state, a freshly available slot would take the job at the top of the first queue that had a job for it.
- On start-up, or in sudden surges of multiple machines becoming available, I would do the same as in steady state but order the process by the relative speed of the machines (we are happy for this to be a simple assigned strict ordering).
Evaluation for any slot would then be O(QU), where Q is the number of distinct queues. But in almost all cases the first queue being non-empty => the job goes to that slot and the process stops. A job being taken by a slot would eliminate it from the other queues. At the point a queue is selected I can do some simple user-level balancing (likely simply attempting to ensure that the user with the fewest active jobs gets to go next; I can always add in a user halflife if needed).
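The selection pass above can be sketched roughly as follows. This is purely illustrative: the class and method names are mine, the ranking is implicit in list order, and the least-active-jobs tie-break stands in for the user balancing described above; none of it is Condor code.

```python
from collections import defaultdict

# Sketch of the slot-matching pass: each slot carries a strict ordering
# over queues, each queue is split into per-user sub-queues, and the
# user with the fewest active jobs goes next within the chosen queue.

class Scheduler:
    def __init__(self):
        # queue name -> user -> list of job ids (best job first)
        self.queues = defaultdict(lambda: defaultdict(list))
        self.active = defaultdict(int)  # user -> running-job count

    def submit(self, job_id, user, queue_names):
        # A job may sit in several queues at once (with different ranking).
        for q in queue_names:
            self.queues[q][user].append(job_id)

    def match(self, slot_queue_order):
        # O(Q*U) scan: the first non-empty queue in the slot's order wins.
        for q in slot_queue_order:
            users = [u for u, jobs in self.queues[q].items() if jobs]
            if not users:
                continue
            user = min(users, key=lambda u: self.active[u])
            job_id = self.queues[q][user].pop(0)
            self._remove_from_other_queues(job_id, q)
            self.active[user] += 1
            return job_id
        return None  # no work for this slot

    def _remove_from_other_queues(self, job_id, taken_from):
        # Taking a job eliminates it from every other queue it sat in.
        for q, per_user in self.queues.items():
            if q == taken_from:
                continue
            for jobs in per_user.values():
                if job_id in jobs:
                    jobs.remove(job_id)

s = Scheduler()
s.submit("j1", "alice", ["fast", "slow"])
s.submit("j2", "bob", ["fast"])
print(s.match(["fast", "slow"]))  # -> 'j1' (tie on active jobs; alice first)
```

In almost all cases the first queue is non-empty and the scan stops immediately, which is where the claimed steady-state speed comes from.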
I would have a central server constantly maintaining the top N of each queue/user pair (where N is some number a bit bigger than the max slots ever available), so I would only need to do serious sorting effort on a queue when its top N is drained. Any event on a job (including insertion of a new one) could only cause a scan of the top N (O(QUN) in the worst case) to see if it goes in there; if not, it is dropped into the 'to place' bucket.
On needing to repopulate the top N, I can assume that the ordering of the bottom is still good; I just need to merge in the 'to place' ones, which can be done on a separate thread if need be, and in most cases the ordering will be almost correct already.
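A minimal sketch of that top-N-plus-bucket structure, under my own naming (jobs are (rank, id) pairs with lower rank better; `TopN`, `insert`, `pop_best` are all illustrative, not anything from Condor):

```python
import bisect

# Only the best <= n jobs stay sorted; everything else lands in the
# unsorted 'to place' bucket, which is merged back in only when the
# sorted top drains (the expensive sort happens rarely, as described).

class TopN:
    def __init__(self, n):
        self.n = n
        self.top = []       # best <= n jobs, kept sorted
        self.to_place = []  # unsorted overflow ('to place' bucket)

    def insert(self, job):
        if len(self.top) < self.n or job < self.top[-1]:
            bisect.insort(self.top, job)
            if len(self.top) > self.n:
                # Worst of the top N spills into the bucket.
                self.to_place.append(self.top.pop())
        else:
            self.to_place.append(job)

    def pop_best(self):
        best = self.top.pop(0)
        if not self.top:
            self._repopulate()
        return best

    def _repopulate(self):
        # The bucket is assumed 'almost sorted' already; one sort and a
        # slice restore the invariant (could run on a separate thread).
        self.to_place.sort()
        self.top = self.to_place[: self.n]
        self.to_place = self.to_place[self.n :]

t = TopN(2)
for job in [(3, "a"), (1, "b"), (2, "c"), (5, "d")]:
    t.insert(job)
print(t.pop_best())  # -> (1, 'b')
```

Each insert touches only the sorted top (an O(n) operation at worst), which matches the claim that job events never force a full-queue sort.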
I doubt I would need terribly complex lock-free structures. All lock operations would last only as long as was required to insert/update/acquire a single job/slot, unless the 'to place' buckets start to overfill, in which case a brief period of allowing only read-only access to clear them out may be required. If need be, by taking a per-job lock for assignment (locking protocol: queue -> job) I can deal with a job being taken by one slot whilst involved in an operation in another, by discarding and recursing on encountering an orphan. I doubt I would bother with this to start with; accepting that incoming events will have to buffer till after the current assignment pass is not that big a deal.
The in-memory structures need not be replicated to disk. On a restart they can be re-read and reconstructed, as there will be a natural ordering in most cases that can be used when querying the backing database for state to place most queues into 'almost sorted' if not actually perfectly sorted order. This simplifies things and should give it relatively rapid recovery, not to mention a very simple failover mode.
Submission becomes as simple as inserting into the database, followed by a 'hint ping' to the server process to take a look for fresh jobs.
Pre-emption becomes the only tricky part. But since that is not likely to be an issue in the steady-state phase, we need only run a separate thread to sniff for possible pre-emption targets (only needed for slots running jobs not from their best queue), which triggers pre-emption (and remembers that it has done so) to allow the slot to make its request for a new job (which will then be fulfilled with its optimal job).
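That sniffer thread could look something like this. Everything here (`Slot`, `has_job_for`, `preempt`) is an illustrative stand-in for the hooks described above, not a Condor API:

```python
from dataclasses import dataclass

# Sketch of the pre-emption sniffer: a periodic pass looks only at slots
# running a job from other than their best (first) queue, fires
# pre-emption at most once per slot, and remembers having done so.

@dataclass
class Slot:
    name: str
    queue_order: list   # the slot's strict queue preference, best first
    running_queue: str  # queue its current job was taken from

def sniff_preemption(slots, has_job_for, preempt, already_preempted):
    for slot in slots:
        best = slot.queue_order[0]
        if slot.running_queue == best:
            continue  # already running its optimal job
        if slot.name in already_preempted:
            continue  # pre-emption already triggered for this slot
        if has_job_for(best, slot):
            preempt(slot)  # the slot will then request its optimal job
            already_preempted.add(slot.name)

# Demo: only s1 is off its best queue, so only s1 is pre-empted,
# and a second pass does nothing.
slots = [Slot("s1", ["fast", "slow"], "slow"), Slot("s2", ["fast"], "fast")]
hit, done = [], set()
sniff_preemption(slots, lambda q, s: True, lambda s: hit.append(s.name), done)
sniff_preemption(slots, lambda q, s: True, lambda s: hit.append(s.name), done)
print(hit)  # -> ['s1']
```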
This all only works within the sort of constrained, transitively ordered system we have. But since that is how our system has actually been for the past 5 years, I see no reason to think it will change. Exploiting the steady-state characteristics should allow me to do an extremely fast setup with very little delay, with absolutely no need to query the server for anything but the actual 'get me a new job' phase and to trigger a recheck of the database. All submission/modification/querying of the jobs goes directly to the database.
I would need to add security to the db insert/modification so that only user X could insert/update rows 'owned' by user X. Obviously super users could delete anyone's.
Many of the things Condor does to deal with streaming/lack of a shared filesystem could be utterly ignored for my purposes; submission would therefore be much simpler (and transactional to boot).

In all probability I wouldn't be doing this, someone else would. But I can get a rough idea of the complexity involved from the above.
>> Multiple per-user daemons per box
>> I doubt this would actually improve things

> It does, assuming you've got enough CPU so the shadows don't starve.
> That's one area where I notice Windows falling down: if you've got a
> high rate of shadows spawning, it seems to have a lot of trouble getting
> shadow<->startd comms set up and the job started.
> Lately I've been running with a 2:1 ratio of...

That's worth knowing, thanks.
>> Also not clear if anyone uses this heavily on Windows

> All the time.

Right-ho, still an idea then.
>> Remote submit to Linux-based machines:
>> submission is ultimately a bit of a hack, and requires the client side
>> to do a lot more state checking

> I have mixed feelings about remote submission. It gets tricky when
> you're mixing OSes.
> I've had better luck with password-free ssh to submit jobs to
> centralized Linux machines.
> And the SOAP interface to Condor is another approach I've had better
> experience with than remote submission from Windows to Linux. Not to say
> it can't work, just that I've found it tricky.
I hadn't considered using SOAP; it seemed like it didn't match the performance of raw condor_submit access when I last tried it, but that was long ago. Also, I'm not sure how well (if at all) it supports what we need, which would be a deal breaker.
> Another option is to run a light-weight "submitter" on the scheduler
> that you have a custom submit command talk to over REST or SOAP, and it,
> in turn, is running as root so it has the ability to su to any user and
> submit as them, on that machine.
> Might be easier than ssh.
We already run such a daemon per user (it basically reads from and maintains a database and does the submit/queue maintenance). The problem is that it is all in C# (with considerable backing libraries I don't fancy trying to port to Mono). But that may well be the best option: migrate the daemon to Mono, use it to submit jobs, and give everyone their own virtual submit server (a bit cheaper per VM at least).

Thanks for the ideas.