
Re: [HTCondor-users] Error executing condor_history remotely (version 8.5.5)



Hi John,

Yes, just to confirm: when I run condor_history with -limit, it works, but the core dump file and the "2>" log file are still generated on the schedd.

# condor_history -n ce13.pic.es -limit 3
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
   4.0   maprd009        6/29 19:03   0+03:33:52 X         ???  test.sh       
   3.0   maprd009        6/29 10:41   0+11:03:51 X         ???  test.sh       
   2.0   dteam004        6/27 11:58   0+04:00:01 C   6/27 16:03 test.sh       

-rw-r--r-- 1 condor condor 1.6K Jul  1 09:52 2>
-rw------- 1 root   root   904K Jul  1 09:52 core

# cat 2\>
Stack dump for process 309879 at timestamp 1467359196 (13 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x39de388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x39de489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/lib64/libc.so.6(gsignal+0x35)[0x3baa232625]
/lib64/libc.so.6(abort+0x175)[0x3baa233e05]
/lib64/libc.so.6[0x3baa270537]
/lib64/libc.so.6[0x3baa275f4e]
/lib64/libc.so.6[0x3baa278cf0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN10StringList13deleteCurrentEv+0x1f)[0x39de38548f]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN10StringList8clearAllEv+0x23)[0x39de3854e3]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN10StringListD1Ev+0x1b)[0x39de385c6b]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x39de2f0dc6]
Stack dump for process 310091 at timestamp 1467359566 (13 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x39de388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x39de489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/lib64/libc.so.6(gsignal+0x35)[0x3baa232625]
/lib64/libc.so.6(abort+0x175)[0x3baa233e05]
/lib64/libc.so.6[0x3baa270537]
/lib64/libc.so.6[0x3baa275f4e]
/lib64/libc.so.6[0x3baa278cf0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN10StringList13deleteCurrentEv+0x1f)[0x39de38548f]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN10StringList8clearAllEv+0x23)[0x39de3854e3]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN10StringListD1Ev+0x1b)[0x39de385c6b]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x39de2f0dc6]

Thank you.

Cheers,

Carles

On 06/30/2016 09:02 PM, John M Knoeller wrote:

I did some digging, and on 8.5.5 (and probably 8.4.7 also), if you do a remote history without -limit, you do indeed end up with a core file and a log called 2>. The history helper appears to be crashing while destructing a static object.

 

So this will be fixed by the 8.4.8 and 8.5.6 releases.
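
For readers following the traces quoted in this thread, the mangled frame names can be decoded to see which destructor is running when the process exits. Below is a minimal Python sketch, not part of HTCondor, that handles only the small subset of Itanium C++ ABI name mangling needed for the symbols quoted here (nested names, D0/D1/D2 destructors, and the builtin types void, char and int). The names demangle_simple and _read_name are made up for illustration; a full tool such as c++filt covers the complete scheme.

```python
# Subset of builtin type codes used by the symbols in this thread.
BUILTINS = {"v": "void", "c": "char", "i": "int"}


def _read_name(sym, pos):
    """Read one <length><identifier> component starting at pos."""
    end = pos
    while sym[end].isdigit():
        end += 1
    length = int(sym[pos:end])
    return sym[end:end + length], end + length


def demangle_simple(sym):
    """Decode a few simple Itanium-mangled names; return others unchanged."""
    if not sym.startswith("_Z"):
        return sym
    pos = 2
    parts = []
    if sym[pos] == "N":  # nested name: N <components> E
        pos += 1
        while sym[pos] != "E":
            if sym[pos].isdigit():
                name, pos = _read_name(sym, pos)
                parts.append(name)
            elif sym[pos] == "I":  # template arguments: I <builtin types> E
                pos += 1
                args = []
                while sym[pos] != "E":
                    args.append(BUILTINS[sym[pos]])
                    pos += 1
                pos += 1
                parts[-1] += "<" + ", ".join(args) + ">"
            elif sym[pos:pos + 2] in ("D0", "D1", "D2"):  # destructor
                parts.append("~" + parts[-1].split("<")[0])
                pos += 2
            else:
                raise ValueError("unsupported construct in " + sym)
        pos += 1  # skip the closing E
    else:  # plain function name
        name, pos = _read_name(sym, pos)
        parts.append(name)
    params = [BUILTINS[code] for code in sym[pos:]]
    if params == ["void"]:
        params = []
    return "::".join(parts) + "(" + ", ".join(params) + ")"


for symbol in ("_ZN10StringListD1Ev", "_ZN4ListIcED1Ev",
               "_ZN10StringList8clearAllEv", "_Z18linux_sig_coredumpi"):
    print(symbol, "->", demangle_simple(symbol))
```

With this sketch, _ZN10StringListD1Ev decodes to StringList::~StringList() and _ZN4ListIcED1Ev to List<char>::~List(). In the quoted traces both are reached from __cxa_finalize, which is consistent with a static object being destroyed at process exit.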

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller
Sent: Thursday, June 30, 2016 1:12 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Error executing condor_history remotely (version 8.5.5)

 

There is a known bug in 8.4.7 and 8.5.5 where the history helper will abort unless you pass the -limit argument to condor_history.

 

That bug is fixed in 8.4.8 and the upcoming 8.5.6 release. But I don’t think that bug would result in a core dump in the history helper.

 

Could you try running a remote condor_history with -limit and see if you still get a core dump?
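
For anyone scripting remote history queries against a mixed pool, the -limit workaround described in this message could be applied only where it is needed. The following is a minimal Python sketch under stated assumptions: the affected versions are taken from this message (8.4.7 and 8.5.5), the schedd version is supplied by the caller as a string, the default limit of 100 is an arbitrary placeholder, and the helper names parse_version and remote_history_command are hypothetical. It only builds the argument list and does not run anything.

```python
# Versions named in this thread as aborting without -limit (assumption:
# no other releases are affected).
AFFECTED_VERSIONS = {(8, 4, 7), (8, 5, 5)}


def parse_version(text):
    """Turn a dotted version string such as '8.5.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def remote_history_command(schedd, schedd_version, limit=100):
    """Build the condor_history argument list for a remote schedd.

    -limit is appended only when the schedd runs an affected version.
    """
    cmd = ["condor_history", "-n", schedd]
    if parse_version(schedd_version) in AFFECTED_VERSIONS:
        cmd += ["-limit", str(limit)]
    return cmd


print(remote_history_command("ce13.pic.es", "8.5.5", limit=3))
print(remote_history_command("ce13.pic.es", "8.5.4"))
```

The returned list can be passed to subprocess.run as is. Note that this only works around the missing output; it is not expected to prevent the core file from being written on the schedd.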

 

-tj

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Carles Acosta
Sent: Thursday, June 30, 2016 11:36 AM
To: htcondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Error executing condor_history remotely (version 8.5.5)

 

Dear all,

I'm using version 8.5.5 in a dual-stack pool. I'm not sure if this is a known issue or bug, but I've discovered that I cannot run condor_history remotely since the update of the SCHEDDs to 8.5.5.

When I execute condor_history remotely, I obtain:

# condor_history -n ce13.pic.es
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
Failed to recieve remote ad.

On the SCHEDD, ce13.pic.es, I see a core dump file and a file named "2>" among the logs:

-rw-r--r-- 1 condor condor    1728 Jun 30 18:00 2>
-rw------- 1 root   root    643072 Jun 30 18:00 core.HISTORY_HELPER  
 

The "2>" file contains the following stack dumps:

[root@ce13 condor]# cat 2\>
Stack dump for process 189325 at timestamp 1467212911 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
Stack dump for process 189835 at timestamp 1467213844 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
Stack dump for process 195756 at timestamp 1467220135 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
Stack dump for process 262508 at timestamp 1467302411 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
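
A file like this, with one dump appended per crash, can be summarized per process to see when each crash happened and which symbols were on the stack. The following is a minimal Python sketch that assumes the layout shown above (a "Stack dump for process ..." header followed by frame lines of the form library(symbol+offset)[address]); the function name parse_stack_dumps and the dictionary keys are made up for illustration. Frames without a symbol are recorded as "?".

```python
import re
from datetime import datetime, timezone

HEADER = re.compile(
    r"Stack dump for process (\d+) at timestamp (\d+) \((\d+) frames\)")
# library path, optional "(symbol+0xoffset)", then "[0xaddress]"
FRAME = re.compile(
    r"^[^(\[]+(?:\((?P<sym>[^+)]*)\+0x[0-9a-f]+\))?\[0x[0-9a-f]+\]$")


def parse_stack_dumps(text):
    """Return one dict per dump: pid, UTC time, declared frames, symbols."""
    dumps = []
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            pid, stamp, frames = map(int, header.groups())
            dumps.append({
                "pid": pid,
                "time": datetime.fromtimestamp(stamp, tz=timezone.utc),
                "declared_frames": frames,
                "symbols": [],
            })
            continue
        frame = FRAME.match(line)
        if frame and dumps:
            dumps[-1]["symbols"].append(frame.group("sym") or "?")
    return dumps


# First dump from the file above, copied verbatim.
SAMPLE = """
Stack dump for process 189325 at timestamp 1467212911 (6 frames)
/usr/lib64/libcondor_utils_8_5_5.so(dprintf_dump_stack+0x12d)[0x3bad388add]
/usr/lib64/libcondor_utils_8_5_5.so(_Z18linux_sig_coredumpi+0x40)[0x3bad489bd0]
/lib64/libpthread.so.0[0x3baa60f7e0]
/usr/lib64/libcondor_utils_8_5_5.so(_ZN4ListIcED1Ev+0x27)[0x3bad386777]
/lib64/libc.so.6(__cxa_finalize+0x9d)[0x3baa235ebd]
/usr/lib64/libcondor_utils_8_5_5.so[0x3bad2f0dc6]
"""

dumps = parse_stack_dumps(SAMPLE)
for dump in dumps:
    print(dump["pid"], dump["time"].isoformat(), dump["symbols"])
```

Times are reported in UTC, so they will differ from the local file times shown by ls by the schedd's timezone offset.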


On the other hand, I can run condor_history without problems locally:

[root@ce13 ~]# condor_history
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
   4.0   maprd009        6/29 19:03   0+03:33:52 X         ???  test.sh       
   3.0   maprd009        6/29 10:41   0+11:03:51 X         ???  test.sh       
   2.0   dteam004        6/27 11:58   0+04:00:01 C   6/27 16:03 test.sh       
   1.0   dteam004        6/27 11:50   0+04:00:03 C   6/27 15:51 test.sh   


After downgrading the SCHEDD to 8.5.4, condor_history works remotely again:

# condor_history -n ce13.pic.es
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD           
   4.0   maprd009        6/29 19:03   0+03:33:52 X         ???  test.sh       
   3.0   maprd009        6/29 10:41   0+11:03:51 X         ???  test.sh       
   2.0   dteam004        6/27 11:58   0+04:00:01 C   6/27 16:03 test.sh       
   1.0   dteam004        6/27 11:50   0+04:00:03 C   6/27 15:51 test.sh    
 

Thanks in advance.

Cheers,

Carles

-- 
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es 
Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html



