
Re: [Condor-users] Delay on submit, and other newbie issues



On Mon, Jun 23, 2008 at 10:43 AM, Ira Abramov <condor@xxxxxxxxxxxxxxx> wrote:
> Quoting Matthew Farrellee, from the post of Mon, 23 Jun:
>> > 1. when I submit a job it only starts running in a slot some 10-20
>> > seconds later. is that the 30 second interval for matching? can I set
>>
>> Yes, the NEGOTIATOR_INTERVAL. You can make it shorter, but when you have
>> hundreds of thousands of jobs in your system later I'd not recommend it.
>
> It's not that kind of workload, we'll be running 8-12 parallel jobs at
> the most on our 16 current slots, we just want them to always run on the
> most available and non-disruptive slot.
>
> I've shortened it to 3 sec, and it still seems like it's 10-15 seconds
> until a job starts. I'll clock it more closely and see what I find.

NEGOTIATOR_CYCLE_DELAY is the culprit here. If you shorten it from its
default of 20 seconds, you should be able to tighten this further. The
only thing to watch for is matches made in one negotiation cycle that
get broken in the next (look at the NegotiatorLog). That means you've
set the values too low and need to raise them again.
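A minimal sketch of the two central manager settings discussed above,
assuming you put them in the central manager's local config file. The
values are illustrative only, not recommendations:

```
# Central manager local config (example values only)
# How often, in seconds, the negotiator starts a negotiation cycle
NEGOTIATOR_INTERVAL = 3
# Minimum seconds that must pass before a new cycle may start (default 20)
NEGOTIATOR_CYCLE_DELAY = 5
```

After editing, run condor_reconfig on the central manager so the
negotiator picks up the new values, then keep an eye on the
NegotiatorLog for matches broken in the following cycle.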

>> > 2. when I use "condor_q" I can see the job is running but not which slot
>> > was allocated for it, and could not find a switch to add such a
>> > column. what have I missed?
>>
>> condor_q -run
>
> ahh.. nice.. however, I see that by default I can only see jobs I
> submitted on the same machine...
> wait, found it... condor_q -global. Can I make it the default somehow,
> or do I need to wrap that in an alias?

An alias works. You can also wrap 'condor_q -global' together with
'-format "%s\t" ClusterID -format "%s\t" ProcID -format "%s\t" ...any
other classad attributes you want to expose... -format "\n" Name',
which lets you display whichever job attributes you want.
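A minimal sketch of such a wrapper as a shell function you could drop
into ~/.bashrc. The names cq and cqrun and the choice of attributes
(Owner, RemoteHost) are just examples. It ends with an attribute every
job ad defines (ClusterId) so the trailing newline is not skipped,
since condor_q silently skips a -format whose attribute is missing
from the ad:

```shell
# Example: make -global the default for interactive use (alias name is arbitrary)
alias cq='condor_q -global'

# cqrun: one line per job across all schedds: cluster.proc, owner, execute host.
# RemoteHost is only defined for running jobs, so idle jobs show an empty column.
# Any extra arguments (e.g. a user name or constraint) are passed to condor_q.
cqrun() {
  condor_q -global \
    -format "%d." ClusterId \
    -format "%d\t" ProcId \
    -format "%s\t" Owner \
    -format "%s" RemoteHost \
    -format "\n" ClusterId \
    "$@"
}
```

Usage: plain 'cqrun', or "cqrun -constraint 'JobStatus == 2'" to list
only running jobs.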

>> > up a single core anyway. I could not find anywhere to define jobs that
>> > may parallelize (like "make" forking two compilers or a JVM splitting into
>> > threads) and how to tell the manager about them.
>>
>> Check out the Parallel Universe
>
> ahh, yes, time to figure out those universe notions :) I'll look into
> it.

The Parallel universe has some consequences of its own, so make sure
your use case really requires it.

>> I'll let someone else field the FlexLM question in more detail.
>
> Thanks for all your help so far!

I have a tweak to Ian's e-mail that might be useful. Ian's idea of
having one scheduler clearly works and is simple to understand. If
load on that one scheduler becomes an issue, you can run one scheduler
for each application that has license restrictions, using the
MAX_JOBS_RUNNING setting to limit the number of running jobs that use
that application. For example, set up a schedd with SCHEDD_NAME =
RenderMan and submit to it with condor_submit -n
RenderMan@xxxxxxxxxxx
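A rough sketch of the per-application schedd settings, assuming a
RenderMan license with 4 seats (the number is hypothetical). Only the
two settings mentioned above are shown; if this schedd shares a machine
with another one, it also needs its own daemon list entry and spool and
local directories, which are not shown here:

```
# Local config for the schedd dedicated to RenderMan jobs (hypothetical)
SCHEDD_NAME = RenderMan
# Run at most as many jobs as there are licenses (4 is an example)
MAX_JOBS_RUNNING = 4
```

Jobs beyond that limit simply stay idle in that schedd's queue until a
running one finishes.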

Good luck,
Jason

-- 
===================================
Jason A. Stowe
cell: 607.227.9686
main: 888.292.5320

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com