
Re: [Condor-users] Collector using a lot of CPU



Replying to my own e-mail.  We discovered today that the /etc/hosts file
on the Condor head node was missing most of the needed entries (i.e. no
entries for the worker nodes on the local IP network).  This meant the
named service on our cluster head node was being pummeled with UDP packet
routing requests and was running at >50% CPU!
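A quick way to spot this kind of gap is to compare the names defined in
/etc/hosts against the list of nodes you expect to find there.  The
following is only a minimal sketch: the sample file contents and the
host names (head, node01, node02) are made up for illustration and are
not taken from our cluster.

```python
def parse_hosts(text):
    """Return the set of hostnames and aliases defined in hosts-file text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the IP address
    return names


def missing_entries(hosts_text, expected_names):
    """Return the expected names that have no entry in the hosts text."""
    known = parse_hosts(hosts_text)
    return [name for name in expected_names if name not in known]


# Hypothetical sample data, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.0.1    head.cluster head
10.0.0.11   node01.cluster node01   # worker
"""
EXPECTED = ["head", "node01", "node02"]

print(missing_entries(SAMPLE_HOSTS, EXPECTED))
```

To check a real machine, pass the contents of /etc/hosts (for example
open("/etc/hosts").read()) and your own node list instead of the sample
data.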

The /etc/hosts file was installed and the system rebooted; the named
service is now much happier, and Condor has not lost a VM yet!  However,
I still see the Condor Collector process running at close to 100% CPU,
with only D_MATCH turned on.  Is this expected?

Thanks
Leslie


On Thu, 28 Apr 2005, Leslie Groer wrote:

> I am running 11 dedicated worker nodes (dual CPU, Scientific Linux 3.0.3,
> Condor 6.7.3) with 4 VMs, two separate schedulers and another scheduler on
> the CM node (dual 2.4 GHz Xeon, 2 GB RAM, 1GbE interface).  The
> condor_collector process always seems to be at about 77% CPU.
> 
>   PID   PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
> 10762    25   0  3832 3832  2120 R    76.3  0.1  63:58   2 condor_collecto
> 
> Is this normal?  I believe the CM may be dropping UDP packets and hence is 
> removing VMs from the system.  I upgraded to Condor 6.7.3, which is 
> supposed to help with this issue, but I still see VMs being dropped.  
> I also doubled the ClassAd lifetime and the timeouts in the collector and 
> negotiator:
>  CLASSAD_LIFETIME       = 1800
>  CLIENT_TIMEOUT         = 60
>  NEGOTIATOR_TIMEOUT     = 60
> but I still see stale ads being removed.
> 
> My concern is that I am running only 5% of our worker nodes in the system 
> so far.  What happens when I scale up to 220 worker nodes?  The next step 
> is to go to TCP, but I was wondering if there is some misconfiguration 
> causing the collector to be too busy.  Relevant debugging and other 
> parameters are set at:
>   ALL_DEBUG               = D_PROTOCOL D_MATCH 
>   COLLECTOR_CLASS_HISTORY_SIZE = 1024
>   COLLECTOR_DAEMON_HISTORY_SIZE = 128
>   COLLECTOR_DAEMON_STATS = True
>   COLLECTOR_DEBUG         = 
>   MAX_COLLECTOR_LOG       = 640000000
> 
> Thanks
> Leslie Groer
> 
> 
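For a rough sense of the scaling question in the quoted message, the
arithmetic can be sketched as below.  The 300-second update interval is
an assumption (it is not stated in the message), "4 VMs" is read as 4
VMs per worker node, and scheduler and other daemon ads are ignored, so
treat the numbers as a back-of-the-envelope estimate only.

```python
def ads_per_second(nodes, vms_per_node, update_interval_s):
    """Average VM ad updates per second arriving at the collector."""
    return nodes * vms_per_node / update_interval_s


def consecutive_losses_to_expire(lifetime_s, update_interval_s):
    """Approximate number of back-to-back lost updates before an ad ages out."""
    return lifetime_s // update_interval_s


VMS_PER_NODE = 4           # from the message, read as 4 VMs per node (assumption)
UPDATE_INTERVAL_S = 300    # assumed value; not stated in the message
CLASSAD_LIFETIME_S = 1800  # CLASSAD_LIFETIME from the message

for nodes in (11, 220):
    rate = ads_per_second(nodes, VMS_PER_NODE, UPDATE_INTERVAL_S)
    print(f"{nodes:3d} nodes: {rate:.2f} VM ad updates/s")

print("lost updates needed to expire an ad:",
      consecutive_losses_to_expire(CLASSAD_LIFETIME_S, UPDATE_INTERVAL_S))
```

Under these assumptions an ad is only removed after several consecutive
updates from the same VM go missing, which is why doubling the lifetime
was worth trying before switching to TCP.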

-- 
   ,-~~-.___.       ________________________________________________
  / |  '     \      groer@xxxxxxxxxxxxxxxxxxx  Department of Physics
 (  )        0           Tel: +1-416-978-2959  University of Toronto
  \_/-, ,----'           Fax: +1-416-978-8221  60 St. George Street
     ====           //                         Toronto, ON M5S 1A7
    /  \-'~;    /~~~(O)                        Canada
   /  __/~|   /       |  Office: McLennan Physics Lab Room 911
 =(  _____| (_________|  http://home.fnal.gov/~groer
     Leslie S. Groer