
Re: [HTCondor-users] Slow Performance



Todd, thank you for the great response!

1. Since we are using MATLAB R2012a, the maxNumCompThreads option was still available to us (it is deprecated in newer versions). We compiled three programs (1 core, 4 cores and 10 cores) and tested them; our conclusion was that using 4 cores gave us the best performance of the three.
We then disabled HyperThreading in the BIOS, and this alone gave us a 50% boost in performance. In addition to disabling HT, I followed Dell's tuning guide, and it helped us a bit too.
Finally, we ran all the recompiled programs with Condor, this time limiting Condor to 10 or 20 cores by setting the NUM_CPUS parameter in condor_config.local.
The conclusion was that the program compiled for 4 cores, combined with 10 cores in Condor, was the best combination. We are running a big test as we speak; I'll keep you updated on Sunday when I see the results.

2. Since I disabled HT in the BIOS after noticing that it gave me a performance boost, I didn't need this parameter; however, it is good to know ;)

3. Same behavior; it was the first thing I tried, to rule out Condor.

Thank you Thank you Thank you,
Dennis.



On Sun, Apr 27, 2014 at 8:52 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 4/27/2014 10:22 AM, Dennis Zheleznyak wrote:
My question may not be connected directly to Condor but I'd like to know if
anyone encountered the same issue as me.

I bought a Dell 720xd server with two Intel E5-2660 v2 CPUs, 256GB of DDR3
memory and 40TB of storage in a RAID6 array. With HyperThreading it has 40
cores. It runs Windows Server 2012.
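The core counts above can be checked with a little arithmetic: two 10-core E5-2660 v2 CPUs with two hardware threads per core give 40 logical CPUs but only 20 physical cores. The sketch below is a simplified model, assuming the one-static-slot-per-counted-CPU behaviour this thread describes (40 jobs start with HyperThreading counted; fewer slots once COUNT_HYPERTHREAD_CPUS = False); it also treats NUM_CPUS, used later in the thread, as replacing the detected count. The function name is hypothetical, not an HTCondor API. With HyperThreading disabled in the BIOS, threads_per_core is effectively 1 and the count_hyperthreads flag stops mattering.

```python
from typing import Optional


def expected_static_slots(sockets: int, cores_per_socket: int,
                          threads_per_core: int, count_hyperthreads: bool,
                          num_cpus_override: Optional[int] = None) -> int:
    """Simplified model: one static slot per CPU that HTCondor counts.

    Hypothetical helper for illustration only.
    """
    if num_cpus_override is not None:
        # Assumption: NUM_CPUS in condor_config.local replaces the detected count.
        return num_cpus_override
    physical = sockets * cores_per_socket
    logical = physical * threads_per_core
    # COUNT_HYPERTHREAD_CPUS = False means only physical cores are counted.
    return logical if count_hyperthreads else physical


# Two 10-core CPUs, 2 hardware threads per core.
print(expected_static_slots(2, 10, 2, count_hyperthreads=True))    # 40
print(expected_static_slots(2, 10, 2, count_hyperthreads=False))   # 20
print(expected_static_slots(2, 10, 2, False, num_cpus_override=10))  # 10
```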

My program isn't built with MPI capabilities; it calculates data from an
input file and outputs to a file once it is done - the program was compiled
with MatLab.

Normally I have 150 sets of data to be calculated. When I send them to Condor,
40 jobs start and that's great; the problem is that it takes forever to
finish even one simple little job! The CPU is constantly working at 100%,
the memory barely gets to 10%, and there is no disk I/O worth
mentioning.
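One way to reason about "every CPU at 100% but no job finishes" is to compare the number of compute threads with the number of CPUs. This is a back-of-the-envelope model, not a measurement: it assumes every thread is CPU-bound all the time, and the 40-threads-per-job case takes the reply's warning (that each MATLAB job may try to use all 40 cores) at face value. The function name is illustrative only.

```python
def oversubscription(jobs: int, threads_per_job: int, cpus: int) -> float:
    """Runnable compute threads per CPU.

    Above 1.0, CPU-bound threads must time-share cores, so each job
    progresses more slowly even though every CPU shows 100% busy.
    """
    return jobs * threads_per_job / cpus


# 40 slots on 40 logical CPUs, each MATLAB job assumed to start 40 threads.
print(oversubscription(jobs=40, threads_per_job=40, cpus=40))  # 40.0

# Same 40 slots with -singleCompThread: one computational thread per job.
print(oversubscription(jobs=40, threads_per_job=1, cpus=40))   # 1.0
```

Under this model a ratio of 40 gives each thread roughly 1/40 of a CPU, which is consistent with jobs that seem to take forever. The same function can be applied to the NUM_CPUS and thread-count combinations tried in the follow-up at the top of this message.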

Before I bought the server, I had four computers with Haswell i7-4770K CPUs and
16GB of memory each; the jobs literally flew when I sent them to my Condor pool!

I don't know what to check or do; if anyone has any ideas, I would
appreciate it.

Thank you,
Dennis.


A few random first-thought suggestions:

1. Are you compiling and running with the -singleCompThread command-line argument to MATLAB? From how you have things set up above, you will want -singleCompThread so that MATLAB uses only a single core; otherwise each of your 40 jobs will start MATLAB and try to use all 40 cores! Even if this was happening with your old servers, the issue becomes much more pronounced on a machine with more cores. See
 https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToRunMatlab
for this and other tips.


2. I would suggest a quick experiment: try running without hyperthreading and see if that improves things. Even if it doesn't, at least you will have eliminated a possible issue. To do so, in the condor_config.local for that machine set
 COUNT_HYPERTHREAD_CPUS = False
and then restart HTCondor. Specifically, you only need to restart the condor_startd, so you could run
 condor_restart -startd <machine-name>
from your central manager. When HTCondor restarts, you will see fewer slots, as HTCondor will count only physical cores, not hyperthreads. Resubmit your jobs and see what happens.

3. Another suggestion: if it is easy, what happens if you start 40 runs of your job simultaneously outside of HTCondor? We expect things will be equally slow outside of HTCondor, but it would be a nice data point to confirm this.
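The third experiment is easy to script. A minimal sketch, assuming Python is available on the Windows server: it uses the standard subprocess module to launch N copies of one command at once and reports the wall-clock time until the last copy exits. The function name is hypothetical.

```python
import subprocess
import time
from typing import List


def run_concurrently(cmd: List[str], copies: int) -> float:
    """Start `copies` instances of `cmd` at once, wait for all of them,
    and return the wall-clock seconds from first launch to last exit."""
    start = time.perf_counter()
    # Launch every copy before waiting on any, so they all run together.
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    failed = sum(1 for code in codes if code != 0)
    if failed:
        raise RuntimeError(f"{failed} of {copies} runs exited non-zero")
    return elapsed
```

For example, compare `run_concurrently(["my_matlab_job.exe", "input_001.dat"], 1)` with the same call using 40 copies (the executable and input names are placeholders, and in practice each copy should get its own input and output files). If the 40-copy run takes far longer than a single run on a machine with 40 CPUs, that suggests the contention comes from the job itself rather than from HTCondor.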

regards,
Todd

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/