
Re: [HTCondor-users] Problem with negotiator daemon



Hi Andrea,

It looks like the node may have crashed and the Accountant log was corrupted with the contents of a Java crash report.  You can probably hand-edit it to remove the corruption.
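A minimal sketch of that hand edit, in case it helps. It is not an HTCondor tool. It assumes the Accountant log is a line-oriented text file (typically Accountantnew.log under the SPOOL directory, but check your installation) and that the damaged entries can be recognised by a fragment of the Java message seen in the negotiator log. The `clean_log` function and the `MARKER` value are hypothetical names chosen for illustration. The script writes a cleaned copy and never touches the original.

```python
# Assumption: a fragment of the Java crash text that appears in the
# corrupt entries quoted in the negotiator log.
MARKER = b"Java Runtime Environment"


def clean_log(src, dst, marker=MARKER):
    """Copy src to dst, skipping every line that contains marker.

    Returns a list of (line number, byte offset, line) for each skipped
    line, so the removals can be reviewed before the copy is used.
    Line numbers are 1-based; offsets are measured from the start of src.
    """
    dropped = []
    offset = 0
    # Binary mode keeps byte offsets exact and avoids decoding errors
    # on whatever bytes the crash left behind.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for lineno, line in enumerate(fin, start=1):
            if marker in line:
                dropped.append((lineno, offset, line))
            else:
                fout.write(line)
            offset += len(line)
    return dropped
```

With the daemons stopped, run it on a backup copy, for example `clean_log("Accountantnew.log.bak", "Accountantnew.log.cleaned")`, and read through the returned entries before putting the cleaned file in place. The record at the byte offset reported in the log may not contain the marker text, so it is worth inspecting that spot by hand as well.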

Brian


> On Dec 19, 2015, at 7:30 AM, Andrea Sartirana <sartiran@xxxxxxxxxxxx> wrote:
> 
> Hi,
> 
> we are currently experiencing a weird problem at GRIF on our CREAM+HTCondor cluster.
> The Negotiator service refuses to start. We see the messages below [1] in the log file and then the daemon crashes.
> 
> The farm was draining and it is almost empty so I do not see what can be wrong...
> But I'm really not a condor expert.
> 
> Any hint?
> 
> Thanks in advance.
> Cheers,
> Andrea
> 
> 
> 12/19/15 14:24:34 Using config source: /etc/condor/condor_config
> 12/19/15 14:24:34 Using local config sources:
> 12/19/15 14:24:34    /etc/condor/config.d/quattor.0.global.conf
> 12/19/15 14:24:34    /etc/condor/config.d/quattor.1.security.conf
> 12/19/15 14:24:34    /etc/condor/config.d/quattor.2.params.conf
> 12/19/15 14:24:34    /etc/condor/config.d/quattor.3.head.conf
> 12/19/15 14:24:34    /etc/condor/config.d/quattor.4.groups.conf
> 12/19/15 14:24:34    /etc/condor/condor_config.local
> 12/19/15 14:24:34 config Macros = 251, Sorted = 251, StringBytes = 13200, TablesBytes = 9124
> 12/19/15 14:24:34 CLASSAD_CACHING is ENABLED
> 12/19/15 14:24:34 Daemon Log is logging: D_ALWAYS D_ERROR D_MATCH
> 12/19/15 14:24:34 DaemonCore: command socket at <134.158.132.147:51957>
> 12/19/15 14:24:34 DaemonCore: private command socket at <134.158.132.147:51957>
> 12/19/15 14:24:34 WARNING: Encountered corrupt log record 198 (byte offset 14645)
> 12/19/15 14:24:34     999
> 12/19/15 14:24:34 Lines following corrupt log record 198 (up to 3):
> 12/19/15 14:24:34     103 Customer.group_# # There is insufficient memory for the Java Runtime Environment to continue_ # Cannot create GC thread_ Out of system resources_ # An error report file with more information is saved as: # /var/tmp/hs_err_pid2363_log.default.heslo098@grid AccumulatedUsage 0.0
> 12/19/15 14:24:34     103 Customer.group_# # There is insufficient memory for the Java Runtime Environment to continue_ # Cannot create GC thread_ Out of system resources_ # An error report file with more information is saved as: # /var/tmp/hs_err_pid2363_log.default.heslo098@grid MyType "*"
> 12/19/15 14:24:34     103 Customer.group_# # There is insufficient memory for the Java Runtime Environment to continue_ # Cannot create GC thread_ Out of system resources_ # An error report file with more information is saved as: # /var/tmp/hs_err_pid2363_log.default.heslo098@grid WeightedUnchargedTime 0.0
> 12/19/15 14:24:34 ERROR "Error: corrupt log record 198 (byte offset 14645) occurred inside closed transaction, recovery failed" at line 1293 in file /slots/02/dir_42284/userdir/src/condor_utils/classad_log.cpp
> 
> 