[condor-users] Speeding up condor_submit (was Speeding up DAGman submits)
- Date: Mon, 10 May 2004 16:30:29 -0500
- From: Alain Roy <roy@xxxxxxxxxxx>
> We are submitting a small DAG with very few jobs to DAGman (test jobs such
> as ls) and noticed that it takes about 20-25 seconds for the CONDOR job
> that DAGman submits to the queue to go from the idle to the running state.
I changed the subject of the message because the issue is with
condor_submit, not with DAGMan. (By the way, it's Condor, not CONDOR--it's
not an acronym.)
> I've tried to figure out which condor_config file options affect this
> time (by reading the comments in condor_config and the manual section
> defining all the parameters), but haven't had much luck.
There is no magical "speed up Condor" option. :)
First, let me recommend reading a short post made to condor-users a while
ago by Doug Thain:
In part, he says:
> Please keep in mind that Condor is a high *throughput* system designed to
> execute large workloads over long time periods. It is *not* designed to be
> a low latency system that executes a single job quickly. Condor performs a
> large number of expensive operations in order to maximize scalability and
> reliability at the expense of latency.
Take this to heart. Condor is targeted at high throughput, not high
performance. Condor is not tuned to start up jobs in seconds. If you need
reliability and scalability, Condor is a good match.
> 1) Using the 'test job feature' for fast turnaround time. Can this be
> applied to DAGman jobs?
What test job feature are you referring to?
> 2) Computing on Demand.
You're right--this is not a good match with DAGMan.
Here are some factors that affect the speed of starting up a job:
1) Are there lots of jobs or computers? Deciding where a job should run
requires matchmaking. The more jobs or computers you have, the slower this
process is. It is possible to speed up this process in some cases when you
have lots of jobs in your queue. Holler if this is related.
2) The matchmaking cycle runs every five minutes, except when jobs are
submitted. When you submit a job, Condor will start a new matchmaking cycle
as soon as it can (perhaps it's already in the middle of matchmaking) unless
it started a matchmaking cycle within the last 20 (25?) seconds. This
number is tunable, but the point is that matchmaking doesn't happen
immediately.
3) The time to actually start up your job. This can be affected by all the
usual suspects: your network, the computer, how much data needs to be
transferred to the computer (do you transfer files?), the speed of shared
file systems like NFS, etc.
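To make items 2 and 3 concrete, here is a toy Python model of the timing described above. It is not Condor's negotiator code. The 300-second interval and 20-second minimum gap come from the text; everything else is an assumption: a submit that arrives inside the minimum gap is assumed to be deferred until the gap has passed (not ignored), a cycle already in progress is assumed to finish first, and start-up (item 3) is reduced to a fixed overhead plus file-transfer time. The function names and all example numbers are hypothetical.

```python
# Hypothetical defaults taken from the post: a periodic cycle every five
# minutes and a minimum gap of "20 (25?)" seconds between cycles.
PERIODIC_INTERVAL_S = 300.0
MIN_GAP_S = 20.0


def next_cycle_start(submit_t: float, last_start: float, last_duration: float,
                     interval: float = PERIODIC_INTERVAL_S,
                     min_gap: float = MIN_GAP_S) -> float:
    """Earliest start of a matchmaking cycle after a submit at submit_t.

    Assumes last_start is the most recent cycle start at or before submit_t,
    and that a submit inside the minimum gap is deferred, not ignored.
    """
    busy_until = last_start + last_duration             # finish any cycle in progress
    triggered = max(submit_t, busy_until, last_start + min_gap)
    periodic = max(last_start + interval, busy_until)    # the five-minute timer
    return min(triggered, periodic)


def startup_seconds(input_bytes: int, bytes_per_s: float,
                    fixed_overhead_s: float) -> float:
    """Item 3, crudely: fixed start-up cost plus time to transfer input files."""
    return fixed_overhead_s + input_bytes / bytes_per_s


def idle_to_running(submit_t: float, last_start: float, last_duration: float,
                    match_s: float, input_bytes: int, bytes_per_s: float,
                    fixed_overhead_s: float) -> float:
    """Wait for the next cycle, plus matchmaking, plus job start-up."""
    start = next_cycle_start(submit_t, last_start, last_duration)
    return ((start - submit_t) + match_s
            + startup_seconds(input_bytes, bytes_per_s, fixed_overhead_s))
```

For example, if a cycle started at t=0 and took 2 s, a job submitted at t=5 waits until t=20 for the next cycle. Adding 1 s of matchmaking, 3 s of fixed start-up cost and 10 MB transferred at 10 MB/s gives `idle_to_running(5, 0, 2, 1.0, 10_000_000, 10_000_000, 3.0)` of 20 s. These are illustrative inputs, not measurements.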
In general, we recommend running a few large jobs rather than many small
jobs. You will get better throughput, and the bumps in performance (like
30-second startup times) won't matter so much.
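The advice to prefer a few large jobs can be checked with back-of-the-envelope arithmetic. This sketch assumes every job pays the same hypothetical fixed overhead (matchmaking wait plus start-up) that does not overlap with useful work, and that jobs run in even "waves" across a fixed number of machines. Real pools overlap these costs in more complicated ways, so treat it as an illustration only. The function names are hypothetical.

```python
import math


def useful_fraction(total_work_s: float, n_jobs: int, overhead_s: float) -> float:
    """Fraction of each job's wall-clock time spent on real work."""
    per_job = total_work_s / n_jobs
    return per_job / (per_job + overhead_s)


def makespan_s(total_work_s: float, n_jobs: int, overhead_s: float,
               n_machines: int) -> float:
    """Wall-clock time if jobs run in waves of n_machines, each paying overhead_s."""
    waves = math.ceil(n_jobs / n_machines)
    return waves * (overhead_s + total_work_s / n_jobs)


if __name__ == "__main__":
    # One hour of total work, a hypothetical 30 s per-job overhead, 4 machines.
    for n in (360, 4):
        print(n, useful_fraction(3600, n, 30), makespan_s(3600, n, 30, 4))
```

Splitting one hour of work into 360 ten-second jobs spends only 25% of each job's time on real work and takes 3600 s on four machines. Four 900-second jobs are about 97% useful and finish in 930 s.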
If you really need interactive startup, then COD is the way to go, but COD
doesn't mix with DAGMan well.
Some collaborators at Technion in Israel have been working on low latency
invocation in Condor. I'm not sure of the status of their work, but you
might want to talk to them if it's important enough to you.
I hope this helps.