
[HTCondor-users] schedd dies (with stack dump)



This morning I asked how to find out why jobs are being set to
idle in the middle of a run that had been successful up to that
point.
The condition has shown up again, and this time I was in time to
look into the ScheddLog (the MasterLog only tells me that the
schedd died):

14-09-25_12:20:02 (pid:15305) Inserting new attribute Scheduler into non-active cluster cid=780 acid=-1
14-09-25_12:26:30 (pid:15305) Found 1435 potential dedicated resources in 388 seconds
Stack dump for process 15305 at timestamp 1411640790 (18 frames)
/usr/lib/condor/libcondor_utils_8_2_2.so(dprintf_dump_stack+0x6d)[0x7fa830a0874d]
/usr/lib/condor/libcondor_utils_8_2_2.so(_Z18linux_sig_coredumpi+0x2a)[0x7fa830ad292a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf030)[0x7fa82c461030]
/usr/lib/libclassad.so.7(_ZNSt10_HashtableISsSt4pairIKSsSt8weak_ptrIN7classad10CacheEntryEEESaIS6_ESt10_Select1stIS6_ESt8equal_toISsESt4hashISsENSt8__detail18_Mod_range_hashingENSE_20_Default_ranged_hashENSE_20_Prime_rehash_policyELb1ELb0ELb1EE4findERS1_+0x14)[0x7fa8305ebcd4]
/usr/lib/libclassad.so.7(_ZN7classad10CacheEntryD1Ev+0x6e)[0x7fa8305ea36e]
/usr/lib/libclassad.so.7(_ZN7classad10CacheEntryD0Ev+0x9)[0x7fa8305ea449]
/usr/lib/libclassad.so.7(_ZN7classad18CachedExprEnvelopeD1Ev+0x7a)[0x7fa8305e9baa]
/usr/lib/libclassad.so.7(_ZN7classad18CachedExprEnvelopeD0Ev+0x9)[0x7fa8305e9bf9]
/usr/lib/libclassad.so.7(_ZN7classad7ClassAd5ClearEv+0x2f)[0x7fa8305e3a2f]
/usr/lib/libclassad.so.7(_ZN7classad7ClassAdD1Ev+0x24)[0x7fa8305e3ab4]
/usr/lib/condor/libcondor_utils_8_2_2.so(_ZN14compat_classad7ClassAdD0Ev+0x9)[0x7fa830970f09]
condor_schedd(_ZN18DedicatedScheduler13sortResourcesEv+0x16f)[0x4e597f]
condor_schedd(_ZN18DedicatedScheduler19handleDedicatedJobsEv+0xa3)[0x4e5ef3]
/usr/lib/condor/libcondor_utils_8_2_2.so(_ZN12TimerManager7TimeoutEPiPd+0x15a)[0x7fa830aaa49a]
/usr/lib/condor/libcondor_utils_8_2_2.so(_ZN10DaemonCore6DriverEv+0x7c3)[0x7fa830ac71d3]
/usr/lib/condor/libcondor_utils_8_2_2.so(_Z7dc_mainiPPc+0x12fd)[0x7fa830ad682d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd)[0x7fa82c0e5eed]
condor_schedd[0x451b31]
14-09-25_12:26:40 (pid:17198) Setting maximum file descriptors to 4096.
14-09-25_12:26:40 (pid:17198) ******************************************************
14-09-25_12:26:40 (pid:17198) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
14-09-25_12:26:40 (pid:17198) ** /usr/sbin/condor_schedd
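
The frames in the stack dump above carry mangled C++ names, which makes
them hard to read. A minimal sketch of how they could be split up and
demangled, assuming Python 3.7+ and, optionally, binutils' c++filt on
the PATH. The helper names parse_frame and demangle are hypothetical and
not part of HTCondor:

```python
import re
import shutil
import subprocess

# Matches frames such as "/path/lib.so(symbol+0x6d)[0x7fa8...]".
# The symbol may be empty ("libpthread.so.0(+0xf030)[0x...]") and the
# parenthesised part may be missing entirely ("condor_schedd[0x451b31]").
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one stack-dump line into its parts; return None for other log lines."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "symbol": match.group("symbol") or None,  # empty symbol becomes None
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


def demangle(symbol):
    """Demangle via c++filt if it is installed; otherwise return the name unchanged."""
    cxxfilt = shutil.which("c++filt")
    if symbol is None or cxxfilt is None:
        return symbol
    result = subprocess.run(
        [cxxfilt, symbol], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or symbol


if __name__ == "__main__":
    frame = parse_frame(
        "condor_schedd(_ZN18DedicatedScheduler13sortResourcesEv+0x16f)[0x4e597f]"
    )
    print(frame["module"], demangle(frame["symbol"]), frame["offset"])
```

Feeding each line of the ScheddLog excerpt through parse_frame skips the
ordinary timestamped log lines and keeps only the frames.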

This might be something for the developers...

Thanks, S


-- 
Steffen Grunewald * Cluster Admin * steffen.grunewald(*)aei.mpg.de
MPI f. Gravitationsphysik (AEI) * Am Mühlenberg 1, D-14476 Potsdam
http://www.aei.mpg.de/ * ------- * +49-331-567-{fon:7274,fax:7298}