
[HTCondor-users] COLLECTOR_PERSISTENT_AD_LOG huge file and collector restart



Hi,

 

On our cluster, COLLECTOR_PERSISTENT_AD_LOG was configured to point at /var/log/condor/AbsentLog.

 

After seeing that some machines were unable to contact the collector, I decided to restart it, and things started failing: Condor stopped responding on the collector VM, and it was impossible to restart Condor.

A kill -9 on the collector was required to really stop it.

 

I finally found the following:

 

# du -h /var/log/condor/AbsentLog

17G     /var/log/condor/AbsentLog
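
For anyone who wants to catch this before a restart, here is a minimal sketch of the same size check as a small script. The 1 GiB threshold and the function names are my own assumptions for illustration, not anything HTCondor defines. Note that it reads the apparent file size, which can differ slightly from what du reports.

```python
import os

# Hypothetical threshold chosen for illustration only; adjust for your pool.
DEFAULT_LIMIT_BYTES = 1 * 1024**3  # 1 GiB


def log_size_bytes(path):
    """Return the apparent size of the file in bytes, or 0 if it is missing."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def is_oversized(path, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return True if the file at path is larger than limit_bytes."""
    return log_size_bytes(path) > limit_bytes


if __name__ == "__main__":
    # Path taken from our COLLECTOR_PERSISTENT_AD_LOG setting.
    path = "/var/log/condor/AbsentLog"
    size = log_size_bytes(path)
    print(f"{path}: {size / 1024**3:.2f} GiB")
    if is_oversized(path):
        print("warning: persistent ad log exceeds the chosen threshold")
```

Something like this could be run from cron on the collector host to flag the file before it gets anywhere near 17G.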

 

=> I stopped condor, killed the collector with kill -9, restarted condor, and voilà: the collector was back up and running in a few seconds.

 

Our cluster has 287 machines. Is it expected for this file to grow so large that it apparently severely impacts the collector restart?

Or is this an ever-growing file that must be cleaned up from time to time?

 

Thanks