
[HTCondor-users] /var/ViewHist directory default for pool history



Hi all,

I just stumbled over an issue with a freshly spawned test cluster, where
the collector died regularly with exit status 4.
The cause seems to have been a missing '/var/ViewHist' directory. After
creating it (and chowning it to the condor user...), the collector seems
to be stable.
I guess this directory is the pool history directory (the viewhist files
look like they keep some kind of pool statistics).
While
  KEEP_POOL_HISTORY
is enabled,
  POOL_HISTORY_DIR
is not set. From the documentation I would expect the history to go to
/var/spool/ by default, or am I wrong?
Long story short: is it maybe a bug that the pool history default
starts at /var rather than /var/spool, or have I screwed up my config? ;)
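
For anyone who wants to check for the same condition before the collector trips over it, here is a minimal sketch. The function name check_history_dir is made up for illustration and is not part of HTCondor. The only path taken from this report is /var/ViewHist. Note that os.access answers for the user running the script, so it would have to run as the condor user to say anything useful about the collector.

```python
import os
import tempfile


def check_history_dir(path):
    """Return a list of problems that would stop a process from appending files in `path`.

    Hypothetical helper for illustration only; it is not an HTCondor API.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append("missing")
        return problems
    # Creating or appending to files in a directory needs both write and
    # search (execute) permission on that directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append("not writable")
    return problems


if __name__ == "__main__":
    # Demonstrate on a throwaway directory; on a real host one would pass
    # the directory from the log instead, e.g. "/var/ViewHist".
    with tempfile.TemporaryDirectory() as scratch:
        print(check_history_dir(scratch))
        print(check_history_dir(os.path.join(scratch, "does-not-exist")))
```

An empty list means the calling user can create files in the directory. It does not say whether that directory is the one the collector is configured to use.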

Cheers,
  Thomas






[MasterLog]
12/12/18 04:58:07 attempt to connect to <131.169.168.39:9618> failed:
Connection refused (connect errno = 111).
12/12/18 04:58:07 ERROR: SECMAN:2003:TCP connection to collector
dcache-dot1.desy.de:9618 failed.
12/12/18 04:58:07 Failed to start non-blocking update to
<131.169.168.39:9618>.
12/12/18 04:58:18 Started DaemonCore process
"/usr/sbin/condor_collector", pid and pgroup = 114569
12/12/18 05:06:18 DefaultReaper unexpectedly called on pid 114569,
status 1024.
12/12/18 05:06:18 The COLLECTOR (pid 114569) exited with status 4
12/12/18 05:06:18 Sending obituary for "/usr/sbin/condor_collector"
12/12/18 05:06:18 restarting /usr/sbin/condor_collector in 10 seconds
12/12/18 05:06:18 condor_write(): Socket closed when trying to write
1904 bytes to collector dcache-dot1.desy.de:9618, fd is 12
12/12/18 05:06:18 Buf::write(): condor_write() failed
12/12/18 05:06:18 attempt to connect to <131.169.168.39:9618> failed:
Connection refused (connect errno = 111).
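
As a side note on the two numbers in the MasterLog: "status 1024" and "exited with status 4" look like the same value, once as the raw wait status and once decoded. The sketch below assumes the usual Linux layout of a wait status (exit code in bits 8 to 15, signal number in the low 7 bits). The function name decode_wait_status is hypothetical.

```python
def decode_wait_status(status):
    """Split a raw wait status into (kind, value).

    Assumes the conventional Linux encoding; other platforms may differ.
    """
    low_bits = status & 0x7F
    if low_bits == 0:
        # Normal termination: the exit code sits in the next byte up.
        return ("exited", (status >> 8) & 0xFF)
    if low_bits == 0x7F:
        # Stopped (not terminated): the stopping signal sits in the next byte up.
        return ("stopped", (status >> 8) & 0xFF)
    # Terminated by a signal: the low 7 bits hold the signal number.
    return ("signaled", low_bits)


if __name__ == "__main__":
    print(decode_wait_status(1024))  # ('exited', 4)
```

With this reading, 1024 is 4 shifted left by 8 bits, which matches the "exited with status 4" line that follows it.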

[CollectorLog]
12/12/18 04:57:51 StartdAd     : Inserting ** "<
slot1@xxxxxxxxxxxxxxxxxxxx , 131.169.98.92 >"
12/12/18 04:57:51 StartdPvtAd  : Inserting ** "<
slot1@xxxxxxxxxxxxxxxxxxxx , 131.169.98.92 >"
12/12/18 04:58:07 Accumulating data: Time=1544587087
12/12/18 04:58:07 Could not open data file /var/ViewHist/viewhist0.0.new
for appending!!! errno=13
12/12/18 04:58:07 ERROR "Could not open data file appending!!!" at line
739 in file
/slots/11/dir_3021763/userdir/.tmpyoMELi/BUILD/condor-8.7.10/src/condor_collector.V6/view_server.cpp
12/12/18 04:58:18 Setting maximum file descriptors to 10240.
12/12/18 04:58:18 ******************************************************
12/12/18 04:58:18 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
12/12/18 04:58:18 ** /usr/sbin/condor_collector
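
The errno=13 in the "Could not open data file" line can be turned into its symbolic name with the Python standard library. This is only a lookup sketch, the helper name describe_errno is made up, and the numeric values are platform-specific (the comment below assumes Linux).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux, 13 is EACCES ("Permission denied"); a path component that
    # does not exist would normally be reported as ENOENT (2) instead.
    print(describe_errno(13))
    print(describe_errno(2))
```

Read this way, the log entry points at a permission problem when opening the file under /var/ViewHist, which would fit with the directory having to be owned by the condor user.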


[ProcLog]
12/12/18 04:58:07 : PROC_FAMILY_KILL_FAMILY
12/12/18 04:58:07 : taking a snapshot...
12/12/18 04:58:07 : process 113437 (of family 113437) has exited
12/12/18 04:58:07 : ...snapshot complete
12/12/18 04:58:07 : sending signal 9 to family with root 113437
12/12/18 04:58:07 : PROC_FAMILY_UNREGISTER_FAMILY
12/12/18 04:58:07 : unregistering family with root pid 113437
