
Re: [Condor-users] schedd keeps dying



Sarah, do you use the HIGHPORT and LOWPORT settings
in the Condor config file?
You need 2 ports available for every condor_shadow process
that is running. Also, are you using UDP or TCP to update
the collector? The error in question is consistent
with running out of ports. The ShadowLog should show this too
if your debug level is high enough.
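The port arithmetic in the reply can be checked with a quick calculation. What follows is a minimal sketch, not an HTCondor tool. It takes the "2 ports per running condor_shadow" rule of thumb from the reply and the LOWPORT/HIGHPORT range it mentions. The specific port values (9600 to 12600) are hypothetical and were chosen only to illustrate how a range that suffices at ~1000 running jobs can run out at ~4000. The sketch ignores ports used by other daemons or by the collector updates.

```python
# Hypothetical sketch: does a LOWPORT..HIGHPORT range have enough ports
# for the running condor_shadow processes? Assumes the "2 ports per
# shadow" rule of thumb from the mailing-list reply; ignores ports used
# by other daemons or processes on the submit host.

PORTS_PER_SHADOW = 2  # rule of thumb from the reply, not a documented constant


def ports_available(lowport: int, highport: int) -> int:
    """Number of ports in the inclusive range [lowport, highport]."""
    if highport < lowport:
        raise ValueError("HIGHPORT must be >= LOWPORT")
    return highport - lowport + 1


def ports_needed(running_shadows: int) -> int:
    """Ports required by the given number of running shadows."""
    return PORTS_PER_SHADOW * running_shadows


def range_is_sufficient(lowport: int, highport: int, running_shadows: int) -> bool:
    """True if the configured range covers the shadows' port needs."""
    return ports_available(lowport, highport) >= ports_needed(running_shadows)


if __name__ == "__main__":
    low, high = 9600, 12600  # hypothetical LOWPORT / HIGHPORT values
    for shadows in (1000, 4000):
        print(shadows, ports_needed(shadows), range_is_sufficient(low, high, shadows))
```

With this illustrative 3001-port range, 1000 shadows need 2000 ports and fit. 4000 shadows need 8000 ports and do not fit.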

Steve Timm




On Fri, 2 Dec 2011, Sarah Williams wrote:

Hi Condor users and experts,

I'm seeing my condor_schedd die repeatedly with the stack trace below.
I've put the core file up at:
http://www.mwt2.org/~sarah/core
The installation was previously stable at ~1000 cores, but became
unstable when we scaled it up to 4000 cores.

--Sarah

Stack dump for process 24238 at timestamp 1322852851 (22 frames)
condor_schedd(dprintf_dump_stack+0x56)[0x66f2f6]
condor_schedd(_Z18linux_sig_coredumpi+0x4d)[0x59be2d]
/lib64/libpthread.so.0[0x339180eb10]
/lib64/libc.so.6(abort+0x28f)[0x3391031e8f]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114)[0x3095abecb4]
/usr/lib64/libstdc++.so.6[0x3095abcdb6]
/usr/lib64/libstdc++.so.6[0x3095abcde3]
/usr/lib64/libstdc++.so.6[0x3095abceca]
/usr/lib64/libstdc++.so.6(_Znwm+0x79)[0x3095abd1d9]
condor_schedd(_ZN13_condorOutMsgC1Ev+0x1b)[0x5ef02b]
condor_schedd(_ZN8SafeSockC1Ev+0x36)[0x5e3186]
condor_schedd(_ZN10DaemonCore14Create_ProcessEPKcRK7ArgList10priv_stateiiPK3EnvS1_P10FamilyInfoPP6StreamPiSE_iP10__sigset_tiPmSE_S1_P8MyString+0x18a)[0x59372a]
condor_schedd(_ZN9Scheduler18spawnJobHandlerRawEP10shadow_recPKcRK7ArgListPK3EnvS3_bbb+0x225)[0x5275b5]
condor_schedd(_ZN9Scheduler11spawnShadowEP10shadow_rec+0x2e4)[0x5392f4]
condor_schedd(_ZN9Scheduler15spawnJobHandlerEiiP10shadow_rec+0xa0)[0x5398c0]
condor_schedd(_Z26aboutToSpawnJobHandlerDoneiiPvi+0xe2)[0x539ae2]
condor_schedd(_ZN9Scheduler15StartJobHandlerEv+0x13e)[0x53a15e]
condor_schedd(_ZN12TimerManager7TimeoutEv+0x155)[0x5a18f5]
condor_schedd(_ZN10DaemonCore6DriverEv+0x248)[0x58ed78]
condor_schedd(main+0xe47)[0x59e5a7]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x339101d994]
condor_schedd(__gxx_personality_v0+0x411)[0x4fa779]
_______________________________________________
Condor-users mailing list

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.