
Re: [Condor-users] Maximum jobs on submit machine



Try some Google searches. I didn't find anything about a maximum in the
first hit, but I did find this additional tip:

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r1/index.jsp?topic=/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/topics/wsisinst_config_winreg.html

I suggest trying 2048, 4096, etc. to see if this helps get you more jobs.
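
A rough sketch of what changing that value involves, for anyone following along. It assumes the SharedSection entry Eric mentions below has the usual three comma-separated fields and that the third field (the one he raised from 512 to 1280) is the one to change; the thread does not spell this out. The sample string is abbreviated, and the function name is made up for illustration. It only edits a string; it does not touch the registry.

```python
def set_third_shared_section_field(windows_value: str, new_kb: int) -> str:
    """Return windows_value with the third SharedSection field set to new_kb.

    Assumption: the entry looks like 'SharedSection=a,b,c' and is separated
    from its neighbours by single spaces.
    """
    prefix = "SharedSection="
    tokens = windows_value.split(" ")
    for i, token in enumerate(tokens):
        if token.startswith(prefix):
            fields = token[len(prefix):].split(",")
            if len(fields) < 3:
                raise ValueError("SharedSection has no third field")
            fields[2] = str(new_kb)
            tokens[i] = prefix + ",".join(fields)
            return " ".join(tokens)
    raise ValueError("no SharedSection entry found")


# Abbreviated, illustrative value; a real one contains more entries.
sample = "SharedSection=1024,3072,512 Windows=On SubSystemType=Windows"

for size_kb in (1280, 2048, 4096):
    print(set_third_shared_section_field(sample, size_kb))
```

Whatever value is tried, the new setting normally needs a reboot before it takes effect, so test one value at a time.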

~B

On 1/18/12 5:12 PM, "Eric Abel" <Eric.Abel@xxxxxxxxxx> wrote:

>Thanks for the tip.  I changed the SharedSection value from 512 to 1280
>following the instructions on the link you provided, and now the number
>of jobs seems to peak at about 110.  However, I am not able to go much
>higher...is there a maximum to the value SharedSection can have?
>
>Eric
>
>-----Original Message-----
>From: condor-users-bounces@xxxxxxxxxxx
>[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Gore, Brooklin
>Sent: Friday, January 13, 2012 11:49 AM
>To: Condor-Users Mail List
>Subject: Re: [Condor-users] Maximum jobs on submit machine
>
>Eric,
>
>While your maximum jobs running (50-85) is a bit lower than the 120
>usually associated with the Windows HEAP size issue, it could be related.
>
>Check the last article here:
>http://research.cs.wisc.edu/condor/manual/v6.8/7_4Condor_on.html
>
>A silly question: There are more than 50-85 machines available to actually
>run these jobs, right?
>
>Best, ~Brooklin
>
>On 1/13/12 10:17 AM, "Eric Abel" <Eric.Abel@xxxxxxxxxx> wrote:
>
>>Lukas, Micheal, Matthew, and Mark,
>>
>>Thank you for your responses.  I will respond to all of you in a single
>>email if possible.
>>
>>First, this is a Windows pool.  The problem I am having is a cap on the
>>number of jobs running concurrently from a submit machine.  All of the
>>execute machines are capped at the number of available CPUs, and they
>>are working fine.  Like most places, each machine is set up with
>>anti-virus software, in this case Symantec.  The anti-virus utility is
>>set up to handle the firewall, so Windows Firewall is disabled.  I have
>>had to get IT to enable exceptions for all Condor processes.  I have been
>>running the pool for about 8-9 months now, but only recently have I
>>recruited enough CPUs for this problem to surface.
>>
>>I have validated that the MaxJobsRunning value is not the limiter by
>>setting its value first to 30, which definitely capped the number of
>>running jobs at 30, then setting it to 2000, in which case the number of
>>running jobs simply floated up to the maximums of 85 and 50 that I
>>initially reported.
>>
>>Mark, if I were to temporarily disable Symantec, then this would test
>>whether or not it's a firewall issue, correct?
>>
>>Thank you all for your ideas.  Hopefully we can find a resolution here.
>>
>>Eric
>>
>>
>>-----Original Message-----
>>From: condor-users-bounces@xxxxxxxxxxx
>>[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Lukas Slebodnik
>>Sent: Friday, January 13, 2012 8:03 AM
>>To: Condor-Users Mail List
>>Subject: Re: [Condor-users] Maximum jobs on submit machine
>>
>>On Fri, Jan 13, 2012 at 10:49:32AM -0500, Matthew Farrellee wrote:
>>> On 01/13/2012 10:22 AM, Eric Abel wrote:
>>> >Fellow condor users,
>>> >
>>> >I am finding that there is a limit to the number of jobs that will run
>>> >on a given submit machine, and that number is different depending on
>>> >the machine. I have already verified that this limit is well below the
>>> >default MaxJobsRunning value. For example on one machine the maximum
>>> >seems to be about 85, and on another it's about 50. Any ideas on this?
>>> >
>>> >Thanks,
>>> >
>>> >Eric
>>> 
>>> [MAX_JOBS_RUNNING]
>>> default=ceiling(ifThenElse( $(DETECTED_MEMORY)*0.8*1024/800 < 10000,
>>> $(DETECTED_MEMORY)*0.8*1024/800, 10000 ))
>>> 
>>> So the MaxJobsRunning is a function of RAM in the box. If you're on
>>> Windows it is more complicated. Generally, I recommend using a
>>> non-Windows machine for hosting the condor_schedd.
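
To make the quoted default concrete, here is a small sketch that evaluates the same expression outside of Condor. It assumes DETECTED_MEMORY is expressed in megabytes and rewrites the ifThenElse as a min(); the function name is made up for illustration. It does not model the Windows-specific behaviour discussed elsewhere in this thread.

```python
import math


def default_max_jobs_running(detected_memory_mb: int) -> int:
    """Evaluate ceiling(min(DETECTED_MEMORY * 0.8 * 1024 / 800, 10000)).

    Assumption: detected_memory_mb is the machine's RAM in megabytes.
    """
    estimate = detected_memory_mb * 0.8 * 1024 / 800
    return math.ceil(min(estimate, 10000))


# Show how the default scales with RAM until it hits the 10000 ceiling.
for mem_mb in (2048, 4096, 16384):
    print(mem_mb, default_max_jobs_running(mem_mb))
```

Even a machine with 2 GB of RAM gets a default in the thousands from this expression, so a ceiling of 50-85 running jobs has to come from somewhere else.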
>>
>>You can view the value for every schedd daemon by executing the command:
>>condor_status -schedd -f "%s " Name -f "%s\n" MaxJobsRunning
>>
>>On Windows platforms, the number of running jobs is capped at 200.
>>A 64-bit version of Windows is recommended in order to raise the value
>>above the default.
>>
>>Details:
>>http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#18253
>>
>>Regards,
>>Lukas
>>
>>> 
>>> Best,
>>> 
>>> 
>>> matt
>>_______________________________________________
>>Condor-users mailing list
>>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>subject: Unsubscribe
>>You can also unsubscribe by visiting
>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>>The archives can be found at:
>>https://lists.cs.wisc.edu/archive/condor-users/