
Re: [Condor-users] Strange scheduling behavior in 6.8.0



On Wed, Aug 16, 2006 at 10:40:45AM -0700, Michael S. Root wrote:
> 
> If I run "condor reschedule -all", it will 
> send the "Reschedule" command to only those 10 or so machines that are 
> actually running jobs.
>

condor_reschedule doesn't talk to execute machines - it only talks to the
schedd. Furthermore, condor_reschedule -all is not useful.

condor_reschedule sends a command to the schedd that says "please start
a matchmaking cycle in the pool." The schedd in turn contacts the central
manager and passes on the same request. So 'condor_reschedule -all' means
"send a message to every schedd in the pool asking each of them to contact
the central manager and request another matchmaking cycle." Therefore, if
you have N schedds and you use 'condor_reschedule -all', your central
manager gets N-1 more requests than it needs to start a new matchmaking
cycle in the pool.
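
To make the N-1 arithmetic concrete, here is a toy Python model. It is not
Condor code and does not talk to a real pool; the function name and the
schedd names are made up for illustration. It only counts how many
"start a matchmaking cycle" requests would reach the central manager in
each case.

```python
def negotiation_requests(schedds, use_all):
    """Count requests reaching the central manager in this toy model."""
    # Without -all, only the single schedd that was contacted forwards a
    # request. With -all, every schedd in the pool forwards its own.
    targets = schedds if use_all else schedds[:1]
    return len(targets)


# Hypothetical pool with four schedds.
schedds = ["schedd-a", "schedd-b", "schedd-c", "schedd-d"]

single = negotiation_requests(schedds, use_all=False)
everyone = negotiation_requests(schedds, use_all=True)

# One request is enough to trigger a cycle, so the rest are redundant.
print("plain:", single, "-all:", everyone, "redundant:", everyone - single)
```

With four schedds, the -all case produces three requests beyond the one
that was needed.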

How long are you waiting while jobs are "stuck"? If you wait 5 minutes and
then run a single 'condor_reschedule' (without the -all), do they get unstuck?

-Erik