
Re: [Condor-users] schedd keeps dying



UPDATE_COLLECTOR_WITH_TCP, HIGHPORT and LOWPORT are not set.  As I
understand it, the default is to use TCP and an unlimited number of
ports, is that right?  I grepped ShadowLog for 'port' (case-insensitive)
and didn't see anything. In case it was a memory issue, I assigned
the KVM more RAM and rebooted. Schedd died once on startup, but it
has stayed up since then.
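
As a rough sanity check on the two-ports-per-shadow rule Steve mentions
below, the arithmetic can be sketched in Python. This is only an
illustration: the function names are hypothetical, it assumes one
condor_shadow per busy core (which may not match the actual pool), and
the port range in the usage lines is a made-up example, not a value from
this configuration.

```python
def ports_needed(running_shadows, ports_per_shadow=2):
    """Ports required if every running condor_shadow needs `ports_per_shadow` ports."""
    return running_shadows * ports_per_shadow


def range_is_sufficient(lowport, highport, running_shadows):
    """Check whether an inclusive LOWPORT..HIGHPORT range covers the shadows.

    Assumption: the whole range is available to shadows, which ignores
    ports used by other daemons on the same host.
    """
    available = highport - lowport + 1
    return available >= ports_needed(running_shadows)


# Assumed mapping: one shadow per core, at the two pool sizes in this thread.
for cores in (1000, 4000):
    print(cores, "cores ->", ports_needed(cores), "ports")

# Hypothetical range of 5001 ports: enough for 1000 shadows, not for 4000.
print(range_is_sufficient(20000, 25000, 1000))
print(range_is_sufficient(20000, 25000, 4000))
```

With LOWPORT and HIGHPORT unset, as here, no such range limit applies, so
this check only matters if a range is later configured.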

On 12/2/11 2:55 PM, Steven Timm wrote:
> Sarah, do you use the HIGHPORT and LOWPORT settings
> in the condor config file?
> You need 2 ports available for every condor_shadow process
> that is running.  Also are you using udp or tcp to update
> the collector?  The error in question is consistent
> with running out of ports.  ShadowLog will tell you something
> about that too if your debug level is high enough.
> 
> Steve Timm
> 
> 
> 
> 
> On Fri, 2 Dec 2011, Sarah Williams wrote:
> 
>> Hi Condor users and experts,
>>
>> I'm seeing my condor_schedd die repeatedly with the stack trace below.
>> I've put the core file up at:
>> http://www.mwt2.org/~sarah/core
>> The installation was stable previously at ~1000 cores, but became
>> unstable when increased to 4000 cores.
>>
>> --Sarah
>>
>> Stack dump for process 24238 at timestamp 1322852851 (22 frames)
>> condor_schedd(dprintf_dump_stack+0x56)[0x66f2f6]
>> condor_schedd(_Z18linux_sig_coredumpi+0x4d)[0x59be2d]
>> /lib64/libpthread.so.0[0x339180eb10]
>> /lib64/libc.so.6(abort+0x28f)[0x3391031e8f]
>> /usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x114)[0x3095abecb4]
>> /usr/lib64/libstdc++.so.6[0x3095abcdb6]
>> /usr/lib64/libstdc++.so.6[0x3095abcde3]
>> /usr/lib64/libstdc++.so.6[0x3095abceca]
>> /usr/lib64/libstdc++.so.6(_Znwm+0x79)[0x3095abd1d9]
>> condor_schedd(_ZN13_condorOutMsgC1Ev+0x1b)[0x5ef02b]
>> condor_schedd(_ZN8SafeSockC1Ev+0x36)[0x5e3186]
>> condor_schedd(_ZN10DaemonCore14Create_ProcessEPKcRK7ArgList10priv_stateiiPK3EnvS1_P10FamilyInfoPP6StreamPiSE_iP10__sigset_tiPmSE_S1_P8MyString+0x18a)[0x59372a]
>> condor_schedd(_ZN9Scheduler18spawnJobHandlerRawEP10shadow_recPKcRK7ArgListPK3EnvS3_bbb+0x225)[0x5275b5]
>> condor_schedd(_ZN9Scheduler11spawnShadowEP10shadow_rec+0x2e4)[0x5392f4]
>> condor_schedd(_ZN9Scheduler15spawnJobHandlerEiiP10shadow_rec+0xa0)[0x5398c0]
>> condor_schedd(_Z26aboutToSpawnJobHandlerDoneiiPvi+0xe2)[0x539ae2]
>> condor_schedd(_ZN9Scheduler15StartJobHandlerEv+0x13e)[0x53a15e]
>> condor_schedd(_ZN12TimerManager7TimeoutEv+0x155)[0x5a18f5]
>> condor_schedd(_ZN10DaemonCore6DriverEv+0x248)[0x58ed78]
>> condor_schedd(main+0xe47)[0x59e5a7]
>> /lib64/libc.so.6(__libc_start_main+0xf4)[0x339101d994]
>> condor_schedd(__gxx_personality_v0+0x411)[0x4fa779]