Re: [HTCondor-users] Condor_collector crashing in HTCondor-CE



On 8/3/2017 3:47 AM, Carles Acosta wrote:
Hi Brian,

I've updated the condor version to 8.6.5 and restarted the condor and condor-ce daemons. The condor_collector daemon is stable and not crashing, and everything seems to work fine. After the update, I've restarted the condor-ce daemons several times and all of them are running fine. So, it looks like the issue is solved.

Thank you very much!

Cheers,

Carles


Glad to hear the upgrade to HTCondor v8.6.5 solved your IPv6 issue!

regards,
Todd






On 08/02/2017 10:48 PM, Brian Bockelman wrote:
Hi Carles,

This issue really, really looks like an IPv6-related bug, triggered when the host unexpectedly gets an IPv6 address, that was fixed in 8.6.5 (released 1 minute ago).

Would it be possible to try out the new version?

Brian

On Jul 26, 2017, at 8:15 AM, Carles Acosta <cacosta@xxxxxx> wrote:

Dear all,

I have news about our issue with the condor_collector. We updated HTCondor to the stable version 8.6.4 in our CE. Unfortunately, the error persisted at the beginning: the condor_collector was crashing again after a restart with CAs 1.8.4 and we had to kill all running jobs to ensure that the condor_collector daemon was running again without crashes.

I've been looking at Yutaro Iiyama's issue, and it seems quite similar, but affecting the condor_schedd daemon (our error is attached). We are running a dual-stack pool with just one IPv6-only WN, so the general configuration of our condor pool is:

ENABLE_IPV4 = auto
ENABLE_IPV6 = auto
PREFER_IPV4 = true
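
Because these three settings are meant to be identical in the pool and the CE configuration, a quick comparison can rule out a mismatch. The following is only a minimal sketch: it assumes the settings appear as plain `NAME = value` lines and does not handle macros, includes, or conditionals from real HTCondor configuration files. The function names and the sample CE text are hypothetical.

```python
def parse_settings(text):
    """Collect NAME = value pairs from simple configuration text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Normalise the name so the comparison ignores case.
        settings[name.strip().upper()] = value.strip()
    return settings


def differing_settings(pool_text, ce_text,
                       names=("ENABLE_IPV4", "ENABLE_IPV6", "PREFER_IPV4")):
    """Return {name: (pool_value, ce_value)} for settings that differ."""
    pool = parse_settings(pool_text)
    ce = parse_settings(ce_text)
    return {n: (pool.get(n), ce.get(n))
            for n in names if pool.get(n) != ce.get(n)}


# The pool settings quoted above.
POOL = """
ENABLE_IPV4 = auto
ENABLE_IPV6 = auto
PREFER_IPV4 = true
"""

# Hypothetical CE fragment with one line missing, for illustration only.
CE = """
ENABLE_IPV4 = auto
PREFER_IPV4 = true
"""

print(differing_settings(POOL, CE))
```

An empty result means the three settings agree in both fragments.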

I've added the same lines to the condor-ce configuration. I've tried a few condor-ce restarts and it now seems stable, but I'm seeing several messages like the ones before the crash:

Failed to send DC_INVALIDATE_KEY to daemon at <IPV4:3196>: SECMAN:2003:TCP connection to daemon at <IPV4:31986> failed.

DC_AUTHENTICATE: attempt to open invalid session ce13:1272309:1501056815:3, failing; this session was requested by <IPV4:27682> with return address <IPV4?addrs=IPV4-21917+[IPV6]-21917&alias=name>
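
The return address in the second message carries an `addrs=` field listing one IPv4 and one IPv6 endpoint, which shows the daemon advertising both protocols. Below is a rough sketch for pulling those endpoints out of such a string. The layout it assumes (entries joined by `+`, host and port joined by `-`, IPv6 hosts in brackets) is inferred from the log line above only, not from a format specification. The sample addresses and alias are placeholders from documentation ranges, since the real ones are redacted, and the function name is hypothetical.

```python
import ipaddress


def advertised_addresses(return_address):
    """Return (family, host, port) tuples from the addrs= field."""
    inner = return_address.strip().lstrip("<").rstrip(">")
    _, _, query = inner.partition("?")
    # Split by hand: a URL query parser would turn "+" into a space.
    params = dict(p.split("=", 1) for p in query.split("&") if "=" in p)
    result = []
    for entry in params.get("addrs", "").split("+"):
        if not entry:
            continue
        # The port follows the last "-"; IPv6 hosts are bracketed.
        host, _, port = entry.rpartition("-")
        host = host.strip("[]")
        version = ipaddress.ip_address(host).version
        result.append(("IPv6" if version == 6 else "IPv4", host, int(port)))
    return result


# Placeholder values; the addresses in the original log are redacted.
SAMPLE = ("<192.0.2.10:21917?addrs=192.0.2.10-21917"
          "+[2001:db8::1]-21917&alias=ce13.example.org>")

for family, host, port in advertised_addresses(SAMPLE):
    print(family, host, port)
```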

I will report any other problem related to this issue in the future.

Thank you very much.

Cheers,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html
<crash.txt>
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685