Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] Strange scheduling behavior in 6.8.0

Date: Wed, 16 Aug 2006 18:16:05 -0500
From: Erik Paulson <epaulson@xxxxxxxxxxx>
Subject: Re: [Condor-users] Strange scheduling behavior in 6.8.0

On Wed, Aug 16, 2006 at 10:40:45AM -0700, Michael S. Root wrote:
> 
> If I run "condor_reschedule -all", it will 
> send the "Reschedule" command to only those 10 or so machines that are 
> actually running jobs.
>

condor_reschedule doesn't talk to execute machines - it only talks to the
schedd. Furthermore, condor_reschedule -all is not useful.

condor_reschedule sends a command to the schedd that says "please start
a matchmaking cycle in the pool." The schedd in turn contacts the central
manager and says "please start a matchmaking cycle in the pool." So
'condor_reschedule -all' means "send a message to all the schedds in the pool
to ask them all to contact the central manager and ask for another matchmaking
cycle in the pool." Therefore, if you have N schedds and you use
'condor_reschedule -all', your central manager gets N-1 more requests than it
needs to start a new matchmaking cycle in the pool.
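
To make the N-1 arithmetic concrete, here is a toy model of the message flow
described above. It is only a sketch, not Condor code: CentralManager, Schedd,
reschedule and redundant_requests are made-up names for illustration, and the
model only counts requests arriving at the central manager.

```python
# Toy model of the reschedule message flow; not actual Condor code.
# All class and function names here are hypothetical.

class CentralManager:
    def __init__(self):
        # Number of "please start a matchmaking cycle" requests received.
        self.requests = 0

    def request_cycle(self):
        self.requests += 1


class Schedd:
    def __init__(self, manager):
        self.manager = manager

    def reschedule(self):
        # A schedd that receives the reschedule command forwards the
        # request to the central manager.
        self.manager.request_cycle()


def redundant_requests(num_schedds, use_all):
    """Return how many requests beyond the one needed reach the manager."""
    manager = CentralManager()
    schedds = [Schedd(manager) for _ in range(num_schedds)]
    # With use_all=True every schedd is contacted; otherwise only one is.
    targets = schedds if use_all else schedds[:1]
    for schedd in targets:
        schedd.reschedule()
    # One request is enough to ask for a cycle; the rest are redundant.
    return manager.requests - 1


print(redundant_requests(10, use_all=True))   # 9
print(redundant_requests(10, use_all=False))  # 0
```

With 10 schedds, the "-all" case produces 9 requests the central manager did
not need, while contacting a single schedd produces none.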

How long are you waiting while jobs are "stuck"? If, after 5 minutes, you
issue a single 'condor_reschedule' (without the -all), do they get unstuck?

-Erik
