
Re: [HTCondor-users] Error executing condor_history remotely (version 8.5.5)



I did some digging, and on 8.5.5 (and probably 8.4.7 also), if you do a remote history without -limit, you do indeed end up with a core file and a log called 2>. The history helper appears to be crashing while destructing a static object.

 

So this will be fixed by the 8.4.8 and 8.5.6 releases.
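
The static-destructor diagnosis can be cross-checked against the stack dump quoted in the original report further down this thread. A minimal sketch, assuming `c++filt` from GNU binutils is installed (the function name `demangle_frames` is only illustrative): demangling the two C++ symbols in the dump shows that the frame called from `__cxa_finalize` is the destructor of a `List<char>`.

```shell
# Illustrative helper (not part of HTCondor): demangle the two C++ symbols
# that appear in the "2>" stack dump. Assumes c++filt (GNU binutils) is on PATH.
demangle_frames() {
    c++filt _Z18linux_sig_coredumpi _ZN4ListIcED1Ev
}
```

Running `demangle_frames` prints one demangled name per line; `_ZN4ListIcED1Ev` corresponds to `List<char>::~List()`, which fits a crash during teardown of a static object at process exit.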

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Thursday, June 30, 2016 1:12 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Error executing condor_history remotely (version 8.5.5)

 

There is a known bug in 8.4.7 and 8.5.5 where the history helper will abort unless you pass the -limit argument to condor_history.

 

That bug is fixed in 8.4.8 and the upcoming 8.5.6 release.  But I don't think that bug would result in a core dump in the history helper.

 

Could you try running a remote condor_history with -limit and see if you still get a core dump?
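
As a rough sketch of that test, a small wrapper can make sure -limit is always passed when querying a remote schedd. The function name `remote_history` and the default limit of 10 are assumptions made for illustration; `-name` is the long form of the `-n` option used in the original report.

```shell
# Hypothetical wrapper: always pass -limit when asking a remote schedd for history.
# $1 = schedd name (for example ce13.pic.es), $2 = maximum number of ads (default 10, arbitrary).
remote_history() {
    schedd="$1"
    limit="${2:-10}"
    condor_history -name "$schedd" -limit "$limit"
}
```

For example, `remote_history ce13.pic.es 5` would run `condor_history -name ce13.pic.es -limit 5`.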

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Carles Acosta
Sent: Thursday, June 30, 2016 11:36 AM
To: htcondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Error executing condor_history remotely (version 8.5.5)

 

Dear all,

I'm using version 8.5.5 in a dual-stack pool. I'm not sure if this is a known issue or bug, but I've discovered that I cannot run condor_history remotely since the update of the SCHEDDs to 8.5.5.

When I execute condor_history remotely, I obtain:

# condor_history -n ce13.pic.es
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
Failed to recieve remote ad.

On the SCHEDD, ce13.pic.es, the log directory contains a core dump file and a file named "2>":

-rw-r--r-- 1 condor condor    1728 Jun 30 18:00 2>
-rw------- 1 root   root    643072 Jun 30 18:00 core.HISTORY_HELPER  
 

The "2>" file contains repeated stack dumps:

[root@ce13 condor]# cat 2\>
Stack dump for process 189325 at timestamp 1467212911 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
Stack dump for process 189835 at timestamp 1467213844 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
Stack dump for process 195756 at timestamp 1467220135 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
Stack dump for process 262508 at timestamp 1467302411 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]


On the other hand, I can run condor_history without problems locally:

[root@ce13 ~]# condor_history
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
   4.0   maprd009        6/29 19:03   0+03:33:52 X         ???  test.sh       
   3.0   maprd009        6/29 10:41   0+11:03:51 X         ???  test.sh       
   2.0   dteam004        6/27 11:58   0+04:00:01 C   6/27 16:03 test.sh       
   1.0   dteam004        6/27 11:50   0+04:00:03 C   6/27 15:51 test.sh   


Downgrading the SCHEDD to 8.5.4, condor_history starts to work remotely again.

# condor_history -n ce13.pic.es
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
   4.0   maprd009        6/29 19:03   0+03:33:52 X         ???  test.sh       
   3.0   maprd009        6/29 10:41   0+11:03:51 X         ???  test.sh       
   2.0   dteam004        6/27 11:58   0+04:00:01 C   6/27 16:03 test.sh       
   1.0   dteam004        6/27 11:50   0+04:00:03 C   6/27 15:51 test.sh    
 

Thanks in advance.

Cheers,

Carles

-- 
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es 
Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html