Is it possible to get this looked at/changed? i.e. have the storage max size refer to individual
file sizes rather than the total folder/directory size?
We're now at the stage of having to set each of our 5 central managers to keep viewhist data,
then query them independently and collate the data, rather than just using the 1 condorview server
that they all report to. Even then, the largest pool (4400 slots) still loses data less than 1 month old.
Other pools (2380, 1650, 1440, 1200 slots) are OK for the moment, in terms of retaining viewhist data
for at least 1 month before rolling over. Obviously we could split the largest pool into 2, but that seems
a bit too kludgy and would create deployment problems for us.
It's probably worth noting that we have exacerbated the issue by changing the interval at which
the startds send updates to the collector from the default of 300 secs (5 mins) to 30 secs
(in fact we've probably changed all 300-sec intervals in the config files to 30 secs).
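For reference, the knob in question is the startd's UPDATE_INTERVAL; a minimal condor_config sketch of the change described above (the 30-second value is ours, not a recommendation):

```
# condor_config: how often (in seconds) the startd sends ClassAd updates
# to the collector. The stock default is 300; we reduced it to 30, which
# multiplies the rate at which viewhist data accumulates by roughly 10x.
UPDATE_INTERVAL = 30
```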
We currently have the size set to 2,000,000,000 (2 GB), but larger values give errors due to
integer overflow (the default size is 10,000,000 (10 MB)).
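The 2 GB ceiling is consistent with the value being parsed into a signed 32-bit integer (an assumption based on the reported overflow errors, not something I've verified in the source). A quick sanity check of that hypothesis:

```python
# If POOL_HISTORY_MAX_STORAGE lands in a signed 32-bit int, anything above
# 2,147,483,647 overflows -- which would explain why 2,000,000,000 works
# but larger values error out. (Assumption: the limit is a 32-bit signed int.)
INT32_MAX = 2**31 - 1  # 2,147,483,647

def fits_in_int32(value):
    """Return True if value can be stored in a signed 32-bit integer."""
    return -2**31 <= value <= INT32_MAX

print(fits_in_int32(2_000_000_000))  # True  -- the working setting
print(fits_in_int32(3_000_000_000))  # False -- overflows, matching the errors seen
```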
In earlier condor versions the viewhist files used to get much bigger than that (I've seen one grow to 2 GB). But Greg is right: condor 7.6 and greater does limit the size of all of the viewhist files, even though some of them never grow to the full size.
It seems that someone has changed the definition of POOL_HISTORY_MAX_STORAGE: it used to be in kilobytes,
and now it is in bytes, as with most other condor variables.
I have 6000 cores in my pool and my condor_stats still goes back a full year, with POOL_HISTORY_MAX_STORAGE
currently set to 500,000,000. I would think that Greg should be able to boost the value further and get more data.
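The unit change is a real trap when upgrading: the same numeric value now buys 1000x (or 1024x, if the old unit was binary kilobytes, which I'm assuming here) less storage. A quick illustration:

```python
# The same config value, interpreted under the old and new unit semantics.
# Assumption: the old "kilobytes" were binary KB (1024 bytes).
value = 500_000_000

old_bytes = value * 1024  # pre-change: value was in kilobytes
new_bytes = value         # post-change: value is in bytes

# An unchanged config line silently shrinks the storage budget by 1024x.
print(old_bytes // new_bytes)  # 1024
```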
If I had actually read the release notes correctly, would I have seen these changes mentioned?