Re: [Condor-users] Can you call condor_reschedule too frequently?
- Date: Wed, 15 Dec 2004 17:25:59 +0000
- From: matthew hope <matthew.hope@xxxxxxxxx>
- Subject: Re: [Condor-users] Can you call condor_reschedule too frequently?
On Wed, 15 Dec 2004 11:28:01 -0500, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> after submitting the jobs and before entering
> the Condor monitor loop. Should I worry about stress on the master? Can
> anyone comment on why jobs are taking a long time to match when
> submitted from windows?
I'd be more worried about what the 'monitor loop' is doing...
If the submitter machine is overloaded, either generally or from too
many demands on the schedd's (single-threaded) time, the other daemons
in the farm will time out their requests (such as those from the
What is your monitor loop doing - using condor_wait? calling condor_q
every so often? condor_history? scanning the job log?
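
For comparison, here is a minimal sketch of a monitor step that blocks on
condor_wait rather than polling condor_q. It assumes the submit file names a
user log via `log = ...`; the function names, the `run` parameter and the
timeout handling are illustrative, not part of Condor. condor_wait reads only
the job log, so it should put no query load on the schedd.

```python
import subprocess


def build_condor_wait_cmd(log_path, timeout_s=None):
    """Build the argv for condor_wait on a job user log (hypothetical helper)."""
    cmd = ["condor_wait"]
    if timeout_s is not None:
        # -wait bounds how long condor_wait blocks, in seconds.
        cmd += ["-wait", str(int(timeout_s))]
    cmd.append(log_path)
    return cmd


def wait_for_jobs(log_path, timeout_s=None, run=subprocess.run):
    """Block until the jobs in log_path finish.

    Returns True if condor_wait exits with status 0, False otherwise
    (for example on timeout). `run` is injectable so the function can be
    exercised without a Condor installation.
    """
    result = run(build_condor_wait_cmd(log_path, timeout_s))
    return result.returncode == 0
```

A call such as `wait_for_jobs("jobs.log", timeout_s=600)` would then replace a
condor_q polling loop, at the cost of only seeing what the log records.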
The submission of new jobs normally seems to trigger a reschedule
(anecdotal evidence); however, the *release* of a job doesn't. Are you
submitting on hold and then releasing as part of your scripted solution? (I
noticed this when I wrote a C# wrapper round the command line.)
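
If the script does submit on hold and release later, a wrapper along these
lines would pair the release with an explicit reschedule. This is only a
sketch: the function name and the injectable `run` parameter are made up for
illustration, and whether the extra condor_reschedule actually helps rests on
the anecdotal observation above.

```python
import subprocess


def release_then_reschedule(job_ids, run=subprocess.run):
    """Release held jobs, then ask the schedd to request a negotiation.

    job_ids are cluster or cluster.proc identifiers, e.g. "42" or "42.0".
    Returns False (and runs nothing) if there is nothing to release.
    """
    if not job_ids:
        return False
    # check=True raises CalledProcessError if either command fails.
    run(["condor_release"] + [str(j) for j in job_ids], check=True)
    run(["condor_reschedule"], check=True)
    return True
```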
Is the negotiation machine overloaded, or taking too long going through
processes which can't run anyway?
Note that there appears to be a hard-coded 15 second lower bound on
the interval between negotiations.
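
Given that apparent lower bound, calling condor_reschedule more often than
once every 15 seconds is unlikely to buy anything. A small throttle like the
one below keeps a script from doing so. The class name, the default interval
and the injectable `clock` and `run` parameters are assumptions for the sake
of the sketch; the 15 second figure is the observed value above, not a
documented setting.

```python
import subprocess
import time


class RescheduleThrottle:
    """Run condor_reschedule at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s=15.0, clock=time.monotonic,
                 run=subprocess.run):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._run = run
        self._last = None  # time of the last reschedule we issued

    def maybe_reschedule(self):
        """Issue condor_reschedule unless one was sent too recently.

        Returns True if the command was run, False if it was skipped.
        """
        now = self._clock()
        if self._last is not None and now - self._last < self.min_interval_s:
            return False
        self._run(["condor_reschedule"], check=True)
        self._last = now
        return True
```

The script would create one `RescheduleThrottle()` and call
`maybe_reschedule()` after each batch of submissions or releases.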
Take a look in your negotiation logs and you should get some clues as
to why it takes so long.
I find that, with a significant number of submitters, the farm is never
going to spring into life, since the overhead of the negotiator going
round all the schedds will always add a little (and sometimes a lot) of
latency, on the order of a few minutes. If this bugs you, you may as
well get used to it, radically shrink the number of submitters, or use
something different. I didn't write the system, but given its
operational goals/history (big farms, non heterogeneous, cycle
stealing roots - hours / days / months worth of jobs) and my
perception of the architecture it uses*, I see no way for it to avoid
this latency on initiating a match/claim...
You could layer your own schedulers on top and thus permanently
maintain the match and manage your own submission process - I don't
think this will gain you much for the (massive) hassles it will cause.
* This isn't being bitchy: it trades off some latencies for
potentially significant throughput increases in a complex, less well
controlled environment, which is a reasonable decision for its target.
I think it might benefit from a more streamlined mode too.