
Re: [HTCondor-users] Problem with negotiator daemon



On 19/12/2015 15:10, Brian Bockelman wrote:
Hi Andrea,

It looks like the node may have crashed and the Accountant log was corrupted with the contents of a Java crash report. You can probably hand-edit the file to remove the corrupt records.
Yep, that does indeed seem to be the problem. Things are working fine now.

Thanks!
a.

Brian

Sent from my iPhone
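Brian's suggestion to hand-edit the Accountant log can also be scripted. The Python sketch below is a hypothetical helper, not an HTCondor tool. It copies the log to a `.bak` file, then rewrites it without any line that contains a marker string you supply. For the log in this thread, `Java Runtime Environment` or `Customer.group_#` would match the corrupt entries. It assumes the negotiator is stopped while you edit and that the file is the Accountant log in the negotiator's SPOOL directory (often named `Accountantnew.log`; check your own installation). The names `filter_corrupt_lines` and `clean_log_file` are illustrative.

```python
import shutil
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def filter_corrupt_lines(
    lines: Iterable[str], markers: Iterable[str]
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Split log lines into (kept, dropped).

    A line is dropped when it contains any of the marker substrings;
    dropped entries are (1-based line number, original line) pairs.
    """
    marker_list = list(markers)
    kept: List[str] = []
    dropped: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if any(marker in line for marker in marker_list):
            dropped.append((lineno, line))
        else:
            kept.append(line)
    return kept, dropped


def clean_log_file(path: str, markers: Iterable[str]) -> List[Tuple[int, str]]:
    """Copy the log to <path>.bak, then rewrite it without the marked lines.

    surrogateescape and newline="" let odd bytes and line endings round-trip unchanged.
    """
    src = Path(path)
    shutil.copy2(src, src.with_name(src.name + ".bak"))
    with src.open("r", encoding="utf-8", errors="surrogateescape", newline="") as fh:
        lines = fh.readlines()
    kept, dropped = filter_corrupt_lines(lines, markers)
    with src.open("w", encoding="utf-8", errors="surrogateescape", newline="") as fh:
        fh.writelines(kept)
    return dropped


if __name__ == "__main__":
    # Hypothetical usage: python clean_accountant_log.py <logfile> <marker> [<marker> ...]
    if len(sys.argv) < 3:
        sys.exit("usage: clean_accountant_log.py <logfile> <marker> [<marker> ...]")
    for lineno, line in clean_log_file(sys.argv[1], sys.argv[2:]):
        print(f"dropped line {lineno}: {line.rstrip()}")
```

Review the reported lines before restarting the negotiator. If the crash report spilled across several lines, some fragments may not contain the marker, so also inspect the file around the byte offset given in the error message.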

On Dec 19, 2015, at 7:30 AM, Andrea Sartirana <sartiran@xxxxxxxxxxxx> wrote:

Hi,

we are currently experiencing a weird problem at GRIF on our CREAM+HTCondor cluster.
The Negotiator service refuses to start: the log file shows the messages below [1], and then the daemon crashes.

The farm was draining and is now almost empty, so I do not see what could be wrong...
But I'm really not a condor expert.

Any hint?

Thanks in advance.
Cheers,
Andrea

[1]
12/19/15 14:24:34 Using config source: /etc/condor/condor_config
12/19/15 14:24:34 Using local config sources:
12/19/15 14:24:34    /etc/condor/config.d/quattor.0.global.conf
12/19/15 14:24:34    /etc/condor/config.d/quattor.1.security.conf
12/19/15 14:24:34    /etc/condor/config.d/quattor.2.params.conf
12/19/15 14:24:34    /etc/condor/config.d/quattor.3.head.conf
12/19/15 14:24:34    /etc/condor/config.d/quattor.4.groups.conf
12/19/15 14:24:34    /etc/condor/condor_config.local
12/19/15 14:24:34 config Macros = 251, Sorted = 251, StringBytes = 13200, TablesBytes = 9124
12/19/15 14:24:34 CLASSAD_CACHING is ENABLED
12/19/15 14:24:34 Daemon Log is logging: D_ALWAYS D_ERROR D_MATCH
12/19/15 14:24:34 DaemonCore: command socket at <134.158.132.147:51957>
12/19/15 14:24:34 DaemonCore: private command socket at <134.158.132.147:51957>
12/19/15 14:24:34 WARNING: Encountered corrupt log record 198 (byte offset 14645)
12/19/15 14:24:34     999
12/19/15 14:24:34 Lines following corrupt log record 198 (up to 3):
12/19/15 14:24:34     103 Customer.group_# # There is insufficient memory for the Java Runtime Environment to continue_ # Cannot create GC thread_ Out of system resources_ # An error report file with more information is saved as: # /var/tmp/hs_err_pid2363_log.default.heslo098@grid AccumulatedUsage 0.0
12/19/15 14:24:34     103 Customer.group_# # There is insufficient memory for the Java Runtime Environment to continue_ # Cannot create GC thread_ Out of system resources_ # An error report file with more information is saved as: # /var/tmp/hs_err_pid2363_log.default.heslo098@grid MyType "*"
12/19/15 14:24:34     103 Customer.group_# # There is insufficient memory for the Java Runtime Environment to continue_ # Cannot create GC thread_ Out of system resources_ # An error report file with more information is saved as: # /var/tmp/hs_err_pid2363_log.default.heslo098@grid WeightedUnchargedTime 0.0
12/19/15 14:24:34 ERROR "Error: corrupt log record 198 (byte offset 14645) occurred inside closed transaction, recovery failed" at line 1293 in file /slots/02/dir_42284/userdir/src/condor_utils/classad_log.cpp


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/