
Re: [HTCondor-users] POOL_HISTORY_MAX_STORAGE



Bump.

 

Is it possible to get this looked at/changed, i.e. have the maximum storage size refer to individual file sizes rather than the folder/directory size?

 

We’re now at the stage of having to set each of our 5 central managers to keep viewhist data, query them independently and collate the data, rather than just using the 1 condorview server that they all report to. Even then, our largest pool (4400 slots) still loses data less than 1 month old. The other pools (2380, 1650, 1440 and 1200 slots) are OK for the moment, in that they retain viewhist data for at least 1 month before rolling over. Obviously we could split the largest pool into 2, but that seems a bit too kludgy and would create deployment problems for us.

 

Thanks

 

Cheers

 

Greg

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Wednesday, 4 July 2012 11:30 AM
To: condor-users@xxxxxxxxxxx
Subject: [ExternalEmail] Re: [Condor-users] POOL_HISTORY_MAX_STORAGE

 

Hi Steve

 

It’s probably worth noting that we have exacerbated the issue by changing the interval at which the startds send updates to the collector from the default of 300 secs (5 mins) to 30 secs (in fact we’ve probably changed all 300-sec intervals in the config files to 30 secs).

 

We currently have the size set to 2,000,000,000 (2 GB), but larger values give errors due to integer overflow (the default size is 10,000,000, i.e. 10 MB).

 

Cheers

 

Greg

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steven C Timm
Sent: Wednesday, 4 July 2012 10:35 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] POOL_HISTORY_MAX_STORAGE

 

Hi—

In earlier condor versions the viewhist files could get much bigger than that (I’ve seen one grow to 2 GB). But Greg is right: condor 7.6 and later does apply the limit across all of the viewhist files, even though some of them never grow to the full size.

It seems that someone has changed the definition of POOL_HISTORY_MAX_STORAGE: it used to be a size in kilobytes, and now it is in bytes, as with most other condor variables.

 

I have 6000 cores in my pool and my condor_stats still goes back a full year, with pool_history_max_storage currently set to 500,000,000. I would think Greg should be able to boost the value further and get more data.

 

If I had actually read the release notes correctly, would I have seen these changes mentioned?

 

Steve

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Tuesday, July 03, 2012 9:10 PM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] POOL_HISTORY_MAX_STORAGE

 

Any chance this can get looked at so that we can store stats for more than 1 month?

 

It appears that POOL_HISTORY_MAX_STORAGE applies to the whole directory and assumes there will be 27 viewhist* files, so that if every file reaches its maximum size, no one file can be larger than 66.7 MB?
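The 66.7 MB figure fits a divisor of 30 better than 27: 2,000,000,000 / 30 ≈ 66,666,667 bytes, while 2,000,000,000 / 27 ≈ 74,074,074. Thirty would correspond to 5 groups (viewhist0 to viewhist4) × 3 levels (.0 to .2) × 2 generations (.new/.old), counting the three viewhist4 .old files that do not exist yet. A minimal Python sketch checking that hypothesis against the full .old files in the directory listing of the April message quoted below; the divisor and the helper name `per_file_cap` are inferences for illustration, not documented HTCondor behaviour.

```python
# Hypothesis (inferred from the file listing, not from HTCondor source):
# the storage limit is split evenly over every possible viewhist file,
# existing or not: 5 groups x 3 levels x 2 generations (.new/.old).
GROUPS, LEVELS, GENERATIONS = 5, 3, 2
N_FILES = GROUPS * LEVELS * GENERATIONS  # 30


def per_file_cap(max_storage_bytes: int, n_files: int = N_FILES) -> float:
    """Assumed share of POOL_HISTORY_MAX_STORAGE given to each file."""
    return max_storage_bytes / n_files


# Sizes of full .old files from the listing, keyed by the setting that was
# presumably in force when they were written (2006 files: the default).
observed = {
    10_000_000: [333_359, 333_380, 333_437, 333_372, 333_371],
    2_000_000_000: [66_666_695, 66_666_869, 66_666_889, 66_666_970],
}

for setting, sizes in observed.items():
    print(f"{setting:>13,}: cap/30={per_file_cap(setting):,.0f} "
          f"cap/27={per_file_cap(setting, 27):,.0f} "
          f"smallest full file={min(sizes):,}")
```

Every full .old file sits just above MAX_STORAGE / 30 and well below MAX_STORAGE / 27, which suggests a file is rotated once it passes a 1/30 share of the limit.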

 

Thanks

 

Cheers

 

Greg

 

P.S. The silence after my previous post (below) was deafening :) so should I be sending this to condor-admin instead?

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Wednesday, 4 April 2012 4:09 PM
To: condor-users@xxxxxxxxxxx
Subject: [ExternalEmail] [Condor-users] POOL_HISTORY_MAX_STORAGE

 

We keep pool history info in our Condor setup on our Condor ViewServer machine. This is a standalone machine that collects info from 5 separate Central Managers.

 

For historical reasons we had forced condor to represent each machine as one “resource”, i.e. NUM_CPUS=1.

 

We have recently enabled core detection and now have a total of ~10,000 cores across all 5 pools. Having recently used condor_stats to produce some monthly stats, it’s become obvious that information is being lost, i.e. it appears we don’t have info going back far enough (more than 1 month).

 

Just increasing POOL_HISTORY_MAX_STORAGE doesn’t work (it is currently set at 2,000,000,000 = 2 GB; we increased it to 20,000,000,000 = 20 GB), as we get the following error message in CollectorLog:

 

04/04/12 13:21:05 ERROR "POOL_HISTORY_MAX_STORAGE in the condor configuration is
out of bounds for an integer (20000000000).  Please set it to an integer in the
range -2147483648 to 2147483647 (default 10000000)." at line 1693 in file
/home/condor/execute/dir_30458/userdir/src/condor_utils/condor_config.cpp
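The range in that message is exactly that of a signed 32-bit integer (-2^31 to 2^31 - 1), so the largest value the setting will accept is 2,147,483,647 bytes, only about 7.4% more than the 2,000,000,000 already configured. A small Python check of that arithmetic using the numbers from this thread; `fits_in_int32` is an illustrative helper, not part of HTCondor.

```python
# Bounds quoted in the CollectorLog error: a signed 32-bit integer.
INT32_MIN = -(2**31)   # -2147483648
INT32_MAX = 2**31 - 1  #  2147483647


def fits_in_int32(value: int) -> bool:
    """True if a config value lies inside the range the error message allows."""
    return INT32_MIN <= value <= INT32_MAX


current = 2_000_000_000     # value already in use
attempted = 20_000_000_000  # value that triggered the error

print(fits_in_int32(current))    # True
print(fits_in_int32(attempted))  # False
# Headroom left above the current setting
print(f"{(INT32_MAX - current) / current:.1%}")  # 7.4%
```

In other words, while the value is parsed this way, raising it cannot buy much more than the history already kept.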

 

From what I can see, our viewhistory directory is only 737 MB in size (see below). There does seem to be some forced file rotation, though, at individual file sizes of ~66.7 MB. Can anyone confirm what’s meant to happen with these storage limits, file sizes and rotations?
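One reading of that rotation, consistent with the listing in this message (each full .old file is just over its share of the limit, each .new file is below it), is a two-generation scheme per file: records are appended to the .new file, and once it passes its cap it replaces the .old file, whose contents are then lost. The class below is a hypothetical Python model of that behaviour, inferred from the file sizes and not taken from HTCondor's source; `RotatingPair` and its fields are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class RotatingPair:
    """Hypothetical model of one viewhistN.M.new/.old pair (not HTCondor code).

    Records are appended to `new`; once `new` grows past `cap_bytes` it
    replaces `old`, and a fresh, empty `new` is started.
    """
    cap_bytes: int
    new: list = field(default_factory=list)
    old: list = field(default_factory=list)
    new_size: int = 0

    def append(self, record: str) -> None:
        self.new.append(record)
        self.new_size += len(record)
        if self.new_size > self.cap_bytes:
            # Rotation: the previous .old contents are discarded here.
            self.old, self.new, self.new_size = self.new, [], 0

    def retained(self) -> list:
        """Everything still on disk, oldest first."""
        return self.old + self.new


pair = RotatingPair(cap_bytes=100)
for i in range(250):
    pair.append(f"{i:09d}\n")  # 10-byte records
kept = pair.retained()
print(len(kept), kept[0].strip())
```

Under this model a pair only ever holds between one and two caps' worth of records, so the time span retained scales directly with the per-file cap.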

 

Thanks

 

Cheers

 

Greg

 

>ll
total 736780
-rw-r--r-- 1 condor condor 31434648 Apr  4 15:49 viewhist0.0.new
-rw-r--r-- 1 condor condor 66666695 Aug  9  2011 viewhist0.0.old
-rw-r--r-- 1 condor condor 41337118 Apr  4 15:37 viewhist0.1.new
-rw-r--r-- 1 condor condor   333359 Apr  6  2006 viewhist0.1.old
-rw-r--r-- 1 condor condor 10537682 Apr  4 15:21 viewhist0.2.new
-rw-r--r-- 1 condor condor   333380 Jan 19  2006 viewhist0.2.old
-rw-r--r-- 1 condor condor 42825820 Apr  4 15:49 viewhist1.0.new
-rw-r--r-- 1 condor condor 67153008 Apr  4 09:36 viewhist1.0.old
-rw-r--r-- 1 condor condor 27489661 Apr  4 15:37 viewhist1.1.new
-rw-r--r-- 1 condor condor 66981236 Apr  3 23:24 viewhist1.1.old
-rw-r--r-- 1 condor condor 35099274 Apr  4 15:21 viewhist1.2.new
-rw-r--r-- 1 condor condor 66884552 Mar 31 17:43 viewhist1.2.old
-rw-r--r-- 1 condor condor  1208195 Apr  4 15:49 viewhist2.0.new
-rw-r--r-- 1 condor condor 66666869 Mar 27 18:51 viewhist2.0.old
-rw-r--r-- 1 condor condor  1889444 Apr  4 15:37 viewhist2.1.new
-rw-r--r-- 1 condor condor 66666889 Feb 15 05:28 viewhist2.1.old
-rw-r--r-- 1 condor condor 17508889 Apr  4 15:21 viewhist2.2.new
-rw-r--r-- 1 condor condor   333437 Mar  7  2006 viewhist2.2.old
-rw-r--r-- 1 condor condor 41038505 Apr  4 15:49 viewhist3.0.new
-rw-r--r-- 1 condor condor 66666970 Nov  4  2010 viewhist3.0.old
-rw-r--r-- 1 condor condor 27010376 Apr  4 15:37 viewhist3.1.new
-rw-r--r-- 1 condor condor   333372 Mar 15  2006 viewhist3.1.old
-rw-r--r-- 1 condor condor  6818397 Apr  4 15:21 viewhist3.2.new
-rw-r--r-- 1 condor condor   333371 Mar 13  2006 viewhist3.2.old
-rw-r--r-- 1 condor condor        0 Sep 12  2005 viewhist4.0.new
-rw-r--r-- 1 condor condor        0 Sep 12  2005 viewhist4.1.new
-rw-r--r-- 1 condor condor        0 Sep 12  2005 viewhist4.2.new
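To check a live history directory against the same numbers, here is a hedged Python sketch that lists each viewhist* file with its size as a fraction of an assumed per-file cap of POOL_HISTORY_MAX_STORAGE / 30. The divisor of 30 (5 groups × 3 levels × 2 generations) is an inference from the sizes above: every full .old file sits just over 2,000,000,000 / 30 or 10,000,000 / 30 bytes. The script name `viewhist_report.py` and the function `report` are hypothetical; the directory is whatever POOL_HISTORY_DIR points to on the view server.

```python
import os
import sys


def report(history_dir: str, max_storage: int, n_files: int = 30) -> list:
    """Return (name, size, fraction of assumed per-file cap) for each
    viewhist* file. n_files=30 is inferred from the listing, not documented."""
    cap = max_storage / n_files
    rows = []
    for entry in sorted(os.scandir(history_dir), key=lambda e: e.name):
        if entry.is_file() and entry.name.startswith("viewhist"):
            size = entry.stat().st_size
            rows.append((entry.name, size, size / cap))
    return rows


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python viewhist_report.py <POOL_HISTORY_DIR> <MAX_STORAGE bytes>
    directory, limit = sys.argv[1], int(sys.argv[2])
    for name, size, frac in report(directory, limit):
        print(f"{name:<18} {size:>12,} {frac:6.0%}")
```

Run it as `python viewhist_report.py <history dir> 2000000000`; under the same assumption, .new files close to 100% are close to their next rotation.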