Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] Slow Performance

Date: Sun, 27 Apr 2014 12:52:26 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Slow Performance

On 4/27/2014 10:22 AM, Dennis Zheleznyak wrote:

My question may not be directly related to Condor, but I'd like to know if
anyone has encountered the same issue.

I bought a Dell 720xd server with two Intel E5-2660 v2 CPUs, 256GB of DDR3
and 40TB of storage in a RAID 6 array. With Hyper-Threading it has 40
logical cores. It runs Windows Server 2012.

My program isn't built with MPI capabilities. It reads an input file,
performs its calculations, and writes the results to an output file when it
is done. The program was compiled with MATLAB.

Normally I have 150 sets of data to be calculated. When I submit them to
Condor, 40 jobs start, which is great. The problem is that it takes forever
to finish even one simple little job! The CPU is constantly at 100%, memory
usage barely reaches 10%, and there is no significant disk I/O that I can
point to.

Before I bought the server, I had 4 computers with Haswell i7-4770K CPUs and
16GB of memory, and the jobs literally flew when I sent them to my Condor pool!

I don't know what to check or do; if anyone has any ideas, I would
appreciate it.

Thank you,
Dennis.


A few random first-thought suggestions:

1. Are you compiling and running with the -singleCompThread command-line argument to MATLAB? From how you describe your setup above, you will want -singleCompThread so that MATLAB uses only a single core; otherwise each of your 40 jobs will start MATLAB and try to use all 40 cores, so the machine ends up running many times more compute threads than it has cores! Even if this was also happening on your old machines, the issue will become much more pronounced on a machine with more cores. See

  https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToRunMatlab
for this and other tips.

2. I would suggest a quick experiment: try running without hyperthreading and see if that improves things. Even if it doesn't, at least you will have eliminated one possible cause. To do so, set the following in the condor_config.local file for that machine:

  COUNT_HYPERTHREAD_CPUS = False

and then restart HTCondor. Specifically, you only need to restart the condor_startd, so you could run

  condor_restart -startd <machine-name>

from your central manager. When HTCondor restarts, you will see fewer slots (20 rather than 40 on this machine), because HTCondor will count only physical cores, not hyperthreads. Resubmit your jobs and see what happens.

3. Another suggestion: if it is easy, what happens if you start 40 runs of your job simultaneously outside of HTCondor? We expect things will be equally slow outside of HTCondor, but it would be a nice data point to confirm this.
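The experiment in point 3 can be scripted. Below is a minimal sketch in Python, assuming Python 3 is installed on the server. The script name time_copies.py, the executable myjob.exe, and the "{i}" placeholder for per-copy file names are hypothetical, chosen for illustration only; this is not an HTCondor or MATLAB tool. It times one copy on its own, then N copies started at the same moment, so the slowdown can be compared with what is seen under HTCondor.

```python
"""Time N simultaneous copies of a job run outside HTCondor.

Usage (hypothetical names):
    python time_copies.py 40 myjob.exe input_{i}.dat
Each "{i}" in the arguments is replaced with the copy number 0..N-1,
so every copy can read and write its own files.
"""
import subprocess
import sys
import time


def build_commands(template, copies):
    """Return one argument list per copy, with '{i}' replaced by the copy index."""
    return [[arg.replace("{i}", str(i)) for arg in template] for i in range(copies)]


def run_concurrently(commands):
    """Start all commands at once, then wait for every one to exit.

    Returns (wall-clock seconds until the last copy exits, list of exit codes).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]  # Popen returns immediately
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes


def main(argv):
    if len(argv) < 3:
        print(__doc__)
        return
    copies, template = int(argv[1]), argv[2:]
    # Baseline: a single copy with the machine otherwise idle.
    single, _ = run_concurrently(build_commands(template, 1))
    # The experiment: all copies at once, as HTCondor would start them.
    many, codes = run_concurrently(build_commands(template, copies))
    print(f"1 copy: {single:.1f} s; {copies} copies at once: {many:.1f} s")
    print(f"exit codes: {sorted(set(codes))}")


if __name__ == "__main__":
    main(sys.argv)
```

If 40 simultaneous copies are far slower than the single run even here, that points to how the program itself uses the CPUs (for example, MATLAB multithreading as in point 1) rather than to HTCondor.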


regards,
Todd
