Re: [Condor-users] Suspending other jobs when a privileged user submits a new job

> Hi Matt,
> 	Thanks for your reply.
> matthew.hope@xxxxxxxxx said:
> > I fear you have misunderstood the capabilities of the SUSPEND functionality.
> > It does not allow you to free up the slot for use by another job. Only to stop
> > a job doing anything for a while (the original idea presumably being that if
> > someone wants to use the machine interactively they can be largely unaffected
> > by the job without totally killing it).
> 	Hmmm. So you're saying that the job will be suspended, but it
> will continue to occupy a slot?
> 	I'll take a look at multiple slots and see what I can do.  Is
> there no other way to ask condor to overcommit a processor (i.e.,
> have two jobs -- in my case, one suspended -- assigned to the same 
> processor)?

There still seems to be a little confusion. Maybe this will help:

* The number of "slots" seen by Condor can be whatever you like.
  The defaults are:
  o typically 1 per processor core
  o doubled if hyperthreading is enabled
  You can change these slots:
  o tell Condor not to consider hyperthreading when allocating slots
  o for multi-core processors, set up a complicated group of overlapping
    slots (see previous posts on this),
    e.g. for a quad-core machine with 4GB RAM, pretend you have a single 4GB RAM
         processor, 2x 2GB processors and 4x 1GB processors, and then do some
         smart config setup so that certain combinations of these are disallowed.
  o declare (say) twice as many slots per processor so more jobs run concurrently
    (you might know that jobs spend a long time in I/O, for instance, and your
     tests have shown this gives better throughput).
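To make the slot arithmetic above concrete, here is a rough Python sketch. It is only a model of the bookkeeping, not Condor code: slot_count(), SLOT_TYPES, fits() and allowed_combinations() are hypothetical names, and the overlap check only compares total cores and memory (a real overlapping-slot setup would also tie each small slot to a particular part of the machine). In Condor itself this is done in the config file, with knobs such as NUM_CPUS and COUNT_HYPERTHREAD_CPUS; check the manual for your version.

```python
# Hypothetical model of the slot arithmetic described above; not Condor code.
from itertools import product
from typing import Dict


def slot_count(physical_cores: int, hyperthreading: bool = False,
               count_hyperthreads: bool = True, slots_per_cpu: int = 1) -> int:
    """Slots advertised: one per core, doubled by hyperthreading if counted,
    multiplied by an overcommit factor (slots_per_cpu)."""
    cpus = physical_cores * (2 if hyperthreading and count_hyperthreads else 1)
    return cpus * slots_per_cpu


# Overlapping slot layout for a quad-core, 4GB machine (illustrative values).
SLOT_TYPES = {
    "whole":   {"cores": 4, "mem_mb": 4096, "count": 1},
    "half":    {"cores": 2, "mem_mb": 2048, "count": 2},
    "quarter": {"cores": 1, "mem_mb": 1024, "count": 4},
}


def fits(busy: Dict[str, int], total_cores: int = 4,
         total_mem_mb: int = 4096) -> bool:
    """True if this mix of busy overlapping slots does not oversubscribe
    the real machine (capacity check only)."""
    if any(not 0 <= n <= SLOT_TYPES[t]["count"] for t, n in busy.items()):
        return False
    cores = sum(SLOT_TYPES[t]["cores"] * n for t, n in busy.items())
    mem = sum(SLOT_TYPES[t]["mem_mb"] * n for t, n in busy.items())
    return cores <= total_cores and mem <= total_mem_mb


def allowed_combinations():
    """Yield every (whole, half, quarter) busy-count triple that fits."""
    ranges = [range(SLOT_TYPES[t]["count"] + 1) for t in SLOT_TYPES]
    for w, h, q in product(*ranges):
        if fits({"whole": w, "half": h, "quarter": q}):
            yield (w, h, q)


if __name__ == "__main__":
    print(slot_count(4, hyperthreading=True))  # 8 slots
    print(slot_count(4, slots_per_cpu=2))      # 8 slots via overcommit
    print(list(allowed_combinations()))
```

The "smart config setup" in the post amounts to enforcing something like fits(): e.g. the single 4GB slot may only run when none of the smaller slots are busy, and two 1GB slots can run alongside one 2GB slot.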

* When non-Condor activity happens on a machine, you can configure Condor to
  behave in a variety of ways by tuning values such as SUSPEND and PREEMPT.
  These values allow you to specify things like:
  o don't give the user priority and continue regardless
  o suspend the job (I believe this will "swap" the job out of memory)
  o kill the job (allowing it to be restarted elsewhere)
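A toy Python model may make the difference between these options clearer, and it ties back to the original question: only the "kill" option frees the slot for another job, while a suspended job keeps holding it. The policy names (ignore_owner, suspend_on_activity, kill_on_activity) and the functions are made up for illustration; real Condor expresses this behaviour with ClassAd expressions such as SUSPEND, CONTINUE, PREEMPT and KILL in the startd configuration.

```python
# Hypothetical model of the owner-activity policies listed above; not Condor code.
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"  # ignore the owner and keep running
    SUSPEND = "suspend"    # stop the job; it keeps holding its slot
    PREEMPT = "preempt"    # evict the job; slot freed, job can rerun elsewhere


def decide(policy: str, owner_active: bool) -> Action:
    """Pick what happens to a running job under a named (made-up) policy."""
    if not owner_active or policy == "ignore_owner":
        return Action.CONTINUE
    if policy == "suspend_on_activity":
        return Action.SUSPEND
    if policy == "kill_on_activity":
        return Action.PREEMPT
    raise ValueError(f"unknown policy: {policy}")


def slot_is_free(action: Action) -> bool:
    """Only preemption releases the slot; a suspended job still occupies it."""
    return action is Action.PREEMPT


if __name__ == "__main__":
    for policy in ("ignore_owner", "suspend_on_activity", "kill_on_activity"):
        action = decide(policy, owner_active=True)
        print(policy, action.value, "slot free:", slot_is_free(action))
```

This is why suspending alone cannot make room for a privileged user's job: to run both, you need a spare slot (e.g. by declaring more slots per processor) or a policy that preempts.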

I hope this helps, although it doesn't directly answer any of your questions.