Re: [HTCondor-users] slow submission rate



On 8/2/2013 9:09 AM, Dan Bradley wrote:

Be aware that turning off fsync in the condor_schedd can lead to loss of
job state in the event of power loss or other sudden death of the
schedd.  This could result in jobs that were submitted shortly before
the outage disappearing from the queue without being run.  It could also
result in jobs being run twice.

If that is acceptable for your purposes, then your problem is solved. If
it is not acceptable, then focus on improving the performance of the
filesystem containing $(SPOOL).


FWIW, on our busy submit nodes (dozens of users with typically thousands of running jobs), we put the schedd's job queue log on a solid-state drive (SSD). Specifically, we mount the SSD on /ssd and then put in condor_config:
   JOB_QUEUE_LOG = /ssd/condor_spool/job_queue.log
The above puts job_queue.log on the SSD. This file is the schedd's job queue, and it is the file that gets fsynced on every transaction boundary. By using JOB_QUEUE_LOG rather than relocating all of $(SPOOL), we can use a small/cheap SSD that does not have to be large enough to hold the entire contents of the $(SPOOL) directory.

Performance is greatly improved and the risks Dan outlines above are avoided.
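
If you want a rough idea of whether a given filesystem is the limiting factor before moving job_queue.log, something like the following Python sketch may help. It is not how the schedd itself writes its log; it just appends small records to a scratch file and calls fsync after each one, which approximates a commit per transaction. The function name, the record size and the "fsync_probe_" prefix are made up for illustration.

```python
import os
import sys
import tempfile
import time


def fsync_appends_per_second(directory, count=200, record=b"x" * 128):
    """Append `count` small records to a scratch file in `directory`,
    calling fsync after each one, and return the achieved rate."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync_probe_")
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, record)
            os.fsync(fd)  # force the record to stable storage, like a commit
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.remove(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Directory to probe; defaults to the system temp directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    rate = fsync_appends_per_second(target)
    print(f"{target}: {rate:.0f} fsync'd appends/sec")
```

Run it once against the directory currently holding job_queue.log and once against the candidate SSD mount (e.g. /ssd) and compare the two numbers. If the first is in the same single-digit range as the observed submission rate, the disk is a likely suspect.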

-Todd


--Dan

On 8/2/13 8:14 AM, Pek Daniel wrote:
Thanks, the FSYNC trick solved the issue! :)


2013/8/1 Dan Bradley <dan@xxxxxxxxxxxx>


    Are you timing just condor_submit, or are you also timing job
    run/completion rates?

    Job submissions cause the schedd to commit a transaction to
    $(SPOOL)/job_queue.log.  If the disk containing that is slow,
    submissions will be slow.  One way to verify if this is the
    limiting factor is to add the following to your configuration:

    CONDOR_FSYNC = FALSE

    Another thing to keep in mind is that if you can batch submissions
    of many jobs into a single submit file, there will be fewer
    transactions.
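
As a rough illustration of the batching point, here is a small Python sketch that writes one submit description queueing many dummy jobs, instead of running condor_submit once per job. The helper name, /bin/true as the executable and the log file name are assumptions for the example; only the submit commands themselves (universe, executable, log, queue) are standard.

```python
import sys


def make_submit_description(executable, count, log="dummy.log"):
    """Return a submit description that queues `count` identical
    vanilla-universe jobs in a single cluster."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"log        = {log}",
        f"queue {count}",  # one queue statement for all jobs
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: 1000 dummy jobs in one submit file.
    sys.stdout.write(make_submit_description("/bin/true", 1000))
```

Redirect the output to a file (say dummy.sub) and run condor_submit on it once. All 1000 jobs then arrive in a single condor_submit invocation rather than 1000 separate ones.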

    --Dan


    On 8/1/13 10:17 AM, Pek Daniel wrote:

        Hi!

        I'm experimenting with condor: I'm trying to submit a lot of
        dummy jobs with condor_submit from multiple submission hosts
        simultaneously.
        I have only a single schedd. I'm trying to stress-test this
        schedd.
        These jobs are in the vanilla universe.

        The problem is that I couldn't get a better result than 4-6
        submissions/sec, which seems a little low. I can't see any real
        bottleneck on the machine, so I suspect that it's because of some
        default value of a configuration option which throttles down the
        submission requests.

        Any idea how to solve this?

        Thanks,
        Daniel


    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to
    htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/







--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685