Re: [Condor-users] collector memory leak / history truncation
- Date: Tue, 1 Jul 2008 09:41:03 -0400
- From: "Douglas Thain" <dthain@xxxxxx>
- Subject: Re: [Condor-users] collector memory leak / history truncation
All -
Thanks for all the follow-ups. I am working with the Condor team to
debug this, and we will post what we learn.
Doug
On 7/1/08, ucarlino@xxxxxxxxxx <ucarlino@xxxxxxxxxx> wrote:
> Master host is Red Hat Enterprise Linux (RHEL) 3.
>
> Total machines:
>
> 101:condor@lnxgen7/home/condor> condor_status -t
>
>                  Total Owner Claimed Unclaimed Matched Preempting Backfill
>
>     INTEL/LINUX    204    24      88        92       0          0        0
>   INTEL/WINNT51   2347   204     477      1658       8          0        0
>   INTEL/WINNT52     12     5       0         7       0          0        0
> SUN4u/SOLARIS28     15    11       0         4       0          0        0
> SUN4u/SOLARIS29      6     2       0         4       0          0        0
>
>           Total   2584   246     565      1765       8          0        0
>
> 105:condor@lnxgen7/home/condor> ps auxw | grep condor_
> condor    3606  0.1  0.4  21084  17332 ?     S Mar12   166:59 /home/condor/6.8.6/Linux-2.4-i386/sbin/condor_master
> condor    3674  0.0  0.1   7992   3912 ?     S Mar12    19:16 condor_startd -f
> condor    3675  0.0  0.0   8168   3572 ?     S Mar12     1:36 condor_schedd -f
> condor    3676 43.8  3.6 143232 139488 ?     S Mar12 69754:49 condor_negotiator -f
> condor   25070 16.0  2.5 102648  98528 ?     S Jun30   181:47 condor_collector -f
> condor    2537  0.0  0.0   1616    472 pts/1 S 01:52     0:00 grep condor_
>
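A minimal sketch, in Python, of how RSS figures like the ones in the `ps auxw` listing above could be pulled out for comparison over time. The function name `parse_ps_auxw`, the `DaemonMemory` record and the `condor_` prefix filter are illustrative assumptions, not part of Condor. It also assumes the Linux procps column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) with each process on a single line.

```python
from typing import List, NamedTuple


class DaemonMemory(NamedTuple):
    """Memory figures for one process, as reported by `ps auxw` (hypothetical record)."""
    pid: int
    vsz_kb: int
    rss_kb: int
    command: str


def parse_ps_auxw(text: str, name_prefix: str = "condor_") -> List[DaemonMemory]:
    """Extract VSZ and RSS (both in KB) for processes whose executable
    name starts with `name_prefix`."""
    rows = []
    for line in text.splitlines():
        # Split into at most 11 fields so the command keeps its arguments.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        command = fields[10]
        # Strip any directory part, e.g. /usr/sbin/condor_master.
        executable = command.split()[0].rsplit("/", 1)[-1]
        if not executable.startswith(name_prefix):
            continue  # also skips the `grep condor_` process itself
        rows.append(
            DaemonMemory(int(fields[1]), int(fields[4]), int(fields[5]), command)
        )
    return rows


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["ps", "auxw"], capture_output=True, text=True, check=True
    ).stdout
    for row in parse_ps_auxw(output):
        print(row.pid, row.rss_kb, row.command)
```

Running something like this from cron and logging the collector's `rss_kb` with a timestamp would show how quickly memory grows between restarts.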
> 109:condor@lnxgen7/home/condor> top -bn1 | grep condor_
>  3676 condor 25 0  136M  136M 2512 R 22.5 3.6 69755m 2 condor_negotiat
> 25070 condor 15 0 98560   96M 2460 S  3.1 2.5 182:11 0 condor_collecto
>  3606 condor 15 0 17332   16M 2596 S  0.0 0.4 166:59 0 condor_master
>  3674 condor 15 0  3912  3912 2840 S  0.0 0.1  19:16 0 condor_startd
>  3675 condor 15 0  3572  3572 2792 S  0.0 0.0   1:36 3 condor_schedd
>
> Collector is restarted by cron twice a week:
> 110:condor@lnxgen7/home/condor> crontab -l
> # Cron entries for Micron 'is' Condor pool
> # Activate on the pool controller using: crontab /home/condor/cron/collector_crontab
> #
> # Restart the Collector at 7:00 AM on Mondays and Thursdays
>
> 0 7 * * mon,thu /home/condor/6.8.6/Linux-2.4-i386/sbin/condor_restart -subsystem Collector >/tmp/collector_restart.log 2>&1
>
> Regards,
> Umberto
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steffen Grunewald
> Sent: 01 July 2008 09:47
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [Condor-users] collector memory leak / history truncation
>
> On Tue, Jul 01, 2008 at 09:37:11AM +0200, ucarlino@xxxxxxxxxx wrote:
>> We've been experiencing the same problem for a long time now,
>> with 6.8.6, 7.0.1, 7.0.2, 7.1.0 and 7.0.3. They all have the same
>> problem.
>> I also agree that it seems proportional to the number
>> of machines in the pool.
>
> How many are there? We're running a 600+ node cluster, and after more than 1
> million hours of accumulated usage:
>
> # ps auxw | grep condor
> condor     491  0.0  0.1 17020  3260 ?     Ss Jun02   28:43 /usr/sbin/condor_master
> condor     492 14.2  2.8 71480 57960 ?     Ss Jun02 5930:59 condor_collector -f
> condor     495  1.7  3.0 74520 61300 ?     Ss Jun02  730:49 condor_negotiator -f
> condor     496  0.0  0.1 18152  3916 ?     Ss Jun02    0:24 condor_schedd -f
> root       497  0.0  0.1 11812  2492 ?     S  Jun02    9:03 condor_procd -A /usr/share/condor/local/log/procd_pipe.SCHEDD -C 666
> root      9903  0.0  0.0  2748   604 pts/1 R+ 09:44    0:00 grep condor
> # top -bn1 | grep condor
> 492 condor 20 0 71480 56m 2564 R 34 2.8 5931:00 condor_collecto
> 491 condor 20 0 17020 3260 2168 S 0 0.2 28:43.55 condor_master
> 495 condor 20 0 74520 59m 2764 S 0 3.0 730:49.44 condor_negotiat
> 496 condor 20 0 18152 3916 3144 S 0 0.2 0:24.44 condor_schedd
> 497 root 20 0 11812 2492 1064 S 0 0.1 9:03.47 condor_procd
>
> Extra ClassAd attributes? (We don't have any...)
>
> Steffen
>
> --
> Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
> Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
> * e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
> No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/