
[Condor-users] Lazy scheduler?

Hello Condor friends,

My pool of Condor 6.6.8 Linux (SuSE) machines is not doing well:
with the default value for NEGOTIATOR_TIMEOUT (30 sec), only two machines are claimed and busy.
I increased the timeout to 300 sec. Now all available nodes are claimed (16 minus one pair, perhaps a config problem), but after some jobs complete successfully, the queue seems to freeze.
It is as if the scheduler succeeds once and then gets stuck.

condor_q shows the correct number of running jobs (out of a queue of 5400 jobs).
condor_status reports the various nodes as 'claimed' but 'idle'.
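To track the "claimed but idle" symptom across the pool over time, a small script can tally machines by State and Activity and list the ones that are Claimed/Idle. This is a minimal Python sketch, assuming the status has already been dumped one machine per line as `name State Activity` (for example with `condor_status` and its `-format` options; check the exact invocation against your version's manual). The function names and node names below are hypothetical.

```python
from collections import Counter


def summarize_slots(lines):
    """Count machines by (State, Activity) pair.

    Assumes each line holds three whitespace-separated fields:
    machine name, State, Activity (a hypothetical dump format).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _name, state, activity = fields
        counts[(state, activity)] += 1
    return counts


def claimed_but_idle(lines):
    """Return the names of machines that are Claimed but Idle."""
    names = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[1] == "Claimed" and fields[2] == "Idle":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    # Hypothetical sample dump; real node names and states will differ.
    sample = [
        "node01 Claimed Busy",
        "node02 Claimed Idle",
        "node03 Unclaimed Idle",
        "node04 Claimed Idle",
    ]
    for (state, activity), n in sorted(summarize_slots(sample).items()):
        print(f"{state}/{activity}: {n}")
    print("Claimed but idle:", ", ".join(claimed_but_idle(sample)))
```

Running this periodically (e.g. from cron) would show whether the number of Claimed/Idle machines grows after the first successful negotiation cycle.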

Only a few jobs were executed last night.
The network and NFS seem to be in good shape.

I have never been sure that all my nodes are claimed and busy, even before upgrading to 6.6.8.

Do you have any suggestions?
Dr Alain EMPAIN  <alain.empain@xxxxxxxxx> <alain@xxxxxxxxxx>
      Bioinformatics, Molecular Genetics,
      Fac. Med. Vet., University of LIEGE, Belgium
      Bd de Colonster, B43   B-4000 LIEGE (Sart-Tilman)
WORK: +32 4 366 4159         FAX: +32 4 366 4122
HOME: rue des Martyrs,7      B- 4550 Nandrin
      +32 85 51 2341         GSM: +32 497 70 1764
-- If you have problems in Windows: REBOOT
-- If you have problems in Linux:   BE ROOT