
Re: [HTCondor-users] Condor_collector crashing in HTCondor-CE



Hi Brian,

I've updated condor to version 8.6.5 and restarted the condor and condor-ce daemons. The condor_collector daemon is now stable and no longer crashing, and everything seems to work fine. Since the update, I've restarted the condor-ce daemons several times and they have all come back up cleanly. So it looks like the issue is solved.

Thank you very much!

Cheers,

Carles

On 08/02/2017 10:48 PM, Brian Bockelman wrote:
Hi Carles,

This issue really, really looks like an IPv6-related bug, triggered when the host unexpectedly gets an IPv6 address, that was fixed in 8.6.5 (released 1 minute ago).

Would it be possible to try out the new version?

Brian

On Jul 26, 2017, at 8:15 AM, Carles Acosta <cacosta@xxxxxx> wrote:

Dear all,

I have news about our issue with the condor_collector. We updated HTCondor in our CE to the stable version 8.6.4. Unfortunately, the error persisted at first: the condor_collector crashed again after a restart with CAs 1.8.4, and we had to kill all running jobs to get the condor_collector daemon running again without crashes.

I've been looking at Yutaro Iiyama's issue, and it seems quite similar, but it affects the condor_schedd daemon instead (our error is attached). We are running a dual-stack pool with just one IPv6-only WN, so the general configuration of our condor pool is:

ENABLE_IPV4 = auto
ENABLE_IPV6 = auto
PREFER_IPV4 = true
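As a rough illustration of what the "auto" values depend on, here is a minimal Python sketch that reports which IP versions a host has a usable address for, treating loopback and link-local addresses as unusable. This is an assumption-laden approximation, not HTCondor's own address-selection logic; the helper names usable_families and local_addresses are hypothetical, and local_addresses only sees what the resolver returns for the host name.

```python
import ipaddress
import socket


def usable_families(addresses):
    """Return the set of IP versions (4 and/or 6) that have at least one
    address which is neither loopback nor link-local.

    Approximation only: this is not how HTCondor itself evaluates
    ENABLE_IPV4 / ENABLE_IPV6 = auto.
    """
    families = set()
    for text in addresses:
        # Drop a zone index such as "%eth0" that link-local IPv6 addresses may carry.
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        if addr.is_loopback or addr.is_link_local:
            continue
        families.add(addr.version)
    return families


def local_addresses(hostname=None):
    """Collect the distinct addresses the resolver returns for this host's name."""
    hostname = hostname or socket.gethostname()
    infos = socket.getaddrinfo(hostname, None)
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})
```

For example, on a CE that was IPv4-only, usable_families(local_addresses()) would be expected to return {4}; if it starts returning {4, 6}, the host has picked up an IPv6 address that the daemons may also start using.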

I've added the same lines to the condor-ce configuration. After a few condor-ce restarts it now seems stable, but I'm seeing several messages like the ones that appeared before the crash:

Failed to send DC_INVALIDATE_KEY to daemon at <IPV4:3196>: SECMAN:2003:TCP connection to daemon at <IPV4:31986> failed.

DC_AUTHENTICATE: attempt to open invalid session ce13:1272309:1501056815:3, failing; this session was requested by <IPV4:27682> with return address <IPV4?addrs=IPV4-21917+[IPV6]-21917&alias=name>
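The return address in the DC_AUTHENTICATE message is a dual-stack HTCondor address string: a primary address, then an addrs list whose entries are written as host-port and joined by "+", with IPv6 hosts in brackets, plus an alias. A minimal Python sketch for splitting such a string, assuming exactly that layout (parse_sinful is a hypothetical helper, and the addresses in the usage example are documentation placeholders, not real hosts):

```python
def parse_sinful(sinful):
    """Split an address string such as
    <host:port?addrs=a-port+[v6]-port&alias=name> into its parts.

    Sketch only: assumes the layout seen in the log line and does not
    handle every variant HTCondor can emit.
    """
    if not (sinful.startswith("<") and sinful.endswith(">")):
        raise ValueError(f"not an address string: {sinful!r}")
    body = sinful[1:-1]
    primary, _, query = body.partition("?")

    # Split the query by hand: urllib.parse.parse_qs would decode '+' as a space.
    params = {}
    for item in query.split("&") if query else []:
        key, _, value = item.partition("=")
        params[key] = value

    addrs = []
    raw = params.get("addrs", "")
    for entry in raw.split("+") if raw else []:
        # The port follows the last '-'; IPv6 hosts keep their brackets until here.
        host, _, port = entry.rpartition("-")
        addrs.append((host.strip("[]"), int(port)))

    return {"primary": primary, "addrs": addrs, "alias": params.get("alias")}
```

For instance, parse_sinful("<192.0.2.10:9618?addrs=192.0.2.10-9618+[2001:db8::10]-9618&alias=ce.example.org>") yields one IPv4 and one IPv6 entry in addrs, which is the kind of dual-stack return address reported in the message.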

I will report any other problem related to this issue in the future.

Thank you very much.

Cheers,

Carles

-- 
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es 
Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html
<crash.txt>
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




