
Re: [Condor-users] collector memory leak / history truncation



The master host runs Red Hat Enterprise Linux (RHEL) 3.

Total machines:

101:condor@lnxgen7/home/condor> condor_status -t

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX   204    24      88        92       0          0        0
       INTEL/WINNT51  2347   204     477      1658       8          0        0
       INTEL/WINNT52    12     5       0         7       0          0        0
     SUN4u/SOLARIS28    15    11       0         4       0          0        0
     SUN4u/SOLARIS29     6     2       0         4       0          0        0

               Total  2584   246     565      1765       8          0        0 

105:condor@lnxgen7/home/condor> ps auxw | grep condor_
condor    3606  0.1  0.4 21084 17332 ?       S    Mar12 166:59 /home/condor/6.8.6/Linux-2.4-i386/sbin/condor_master
condor    3674  0.0  0.1  7992 3912 ?        S    Mar12  19:16 condor_startd -f
condor    3675  0.0  0.0  8168 3572 ?        S    Mar12   1:36 condor_schedd -f
condor    3676 43.8  3.6 143232 139488 ?     S    Mar12 69754:49 condor_negotiator -f
condor   25070 16.0  2.5 102648 98528 ?      S    Jun30 181:47 condor_collector -f
condor    2537  0.0  0.0  1616  472 pts/1    S    01:52   0:00 grep condor_

109:condor@lnxgen7/home/condor> top -bn1 | grep condor_
 3676 condor    25   0  136M 136M  2512 R    22.5  3.6 69755m   2 condor_negotiat
25070 condor    15   0 98560  96M  2460 S     3.1  2.5 182:11   0 condor_collecto
 3606 condor    15   0 17332  16M  2596 S     0.0  0.4 166:59   0 condor_master
 3674 condor    15   0  3912 3912  2840 S     0.0  0.1  19:16   0 condor_startd
 3675 condor    15   0  3572 3572  2792 S     0.0  0.0   1:36   3 condor_schedd
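
To tell a real leak from a large but stable footprint, the collector's RSS can be sampled over time instead of read once. A minimal sketch, assuming a Linux procps `ps` (for `-C` and `-o rss=`) and a `date` that supports `+%s`; the function names and the log format are made up for illustration:

```shell
#!/bin/sh
# Format one log line: epoch seconds, process name, RSS in KB.
format_sample() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

# Read sample lines on stdin and print the RSS growth in KB
# between the first and the last line (field 3 is the RSS).
rss_growth_kb() {
    awk 'NR == 1 { first = $3 } { last = $3 } END { print last - first }'
}

# Emit one sample for condor_collector; prints 0 KB if it is not running.
sample_collector() {
    rss=$(ps -C condor_collector -o rss= | awk '{ s += $1 } END { print s + 0 }')
    format_sample "$(date +%s)" condor_collector "$rss"
}

# Only take a sample when called with "sample" as the first argument,
# so the functions can be sourced without side effects.
if [ "${1:-}" = "sample" ]; then
    sample_collector
fi
```

Appending the output to a file from cron every few minutes and piping that file through `rss_growth_kb` gives the growth since the first sample.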

The collector is restarted by cron twice a week:
110:condor@lnxgen7/home/condor> crontab -l
# Cron entries for Micron 'is' Condor pool
# Activate on the pool controller using: crontab /home/condor/cron/collector_crontab
#
# Restart the Collector at 7:00 AM on Mondays and Thursdays

0 7 * * mon,thu /home/condor/6.8.6/Linux-2.4-i386/sbin/condor_restart -subsystem Collector >/tmp/collector_restart.log 2>&1
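
An alternative to a fixed schedule would be to restart only when the collector has actually grown past a limit. This is a rough sketch, not something we run: the 200000 KB threshold and the function names are hypothetical, `ps -C ... -o rss=` assumes Linux procps, and `condor_restart` is assumed to be on the PATH (otherwise use the full path from the crontab entry above).

```shell
#!/bin/sh
# Hypothetical limit in KB; tune it for the size of your pool.
LIMIT_KB=${LIMIT_KB:-200000}

# Sum a column of RSS values (KB) read from stdin; prints 0 for no input.
sum_kb() {
    awk '{ sum += $1 } END { print sum + 0 }'
}

# Succeeds when the first argument (KB) is greater than the second (KB).
over_limit() {
    [ "$1" -gt "$2" ]
}

# Restart the collector only if its resident memory exceeds LIMIT_KB.
check_collector() {
    rss=$(ps -C condor_collector -o rss= | sum_kb)
    if over_limit "$rss" "$LIMIT_KB"; then
        echo "collector RSS ${rss} KB exceeds ${LIMIT_KB} KB, restarting"
        condor_restart -subsystem Collector
    fi
}

# Only act when called with "run" as the first argument,
# so the functions can be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    check_collector
fi
```

Saved under a name of your choice and called hourly from cron with the `run` argument, this would replace the Monday/Thursday entry.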

Regards,
	Umberto

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steffen Grunewald
Sent: 01 July 2008 09:47
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] collector memory leak / history truncation

On Tue, Jul 01, 2008 at 09:37:11AM +0200, ucarlino@xxxxxxxxxx wrote:
> We've been experiencing the same problem for a long time now,
> with 6.8.6, 7.0.1, 7.0.2, 7.1.0 and 7.0.3. They all have the same
> problem.
> And I agree that it seems proportional to the number
> of machines in the pool.

How many are there? We're running a 600+ node cluster, and after more than 1 million hours of accumulated usage:

# ps auxw | grep condor
condor     491  0.0  0.1  17020  3260 ?        Ss   Jun02  28:43 /usr/sbin/condor_master
condor     492 14.2  2.8  71480 57960 ?        Ss   Jun02 5930:59 condor_collector -f
condor     495  1.7  3.0  74520 61300 ?        Ss   Jun02 730:49 condor_negotiator -f
condor     496  0.0  0.1  18152  3916 ?        Ss   Jun02   0:24 condor_schedd -f
root       497  0.0  0.1  11812  2492 ?        S    Jun02   9:03 condor_procd -A /usr/share/condor/local/log/procd_pipe.SCHEDD -C 666
root      9903  0.0  0.0   2748   604 pts/1    R+   09:44   0:00 grep condor
# top -bn1 | grep condor
  492 condor    20   0 71480  56m 2564 R   34  2.8   5931:00 condor_collecto
  491 condor    20   0 17020 3260 2168 S    0  0.2  28:43.55 condor_master
  495 condor    20   0 74520  59m 2764 S    0  3.0 730:49.44 condor_negotiat
  496 condor    20   0 18152 3916 3144 S    0  0.2   0:24.44 condor_schedd
  497 root      20   0 11812 2492 1064 S    0  0.1   9:03.47 condor_procd

Do you define any extra ClassAd attributes? (We don't have any...)

Steffen

--
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html