
Re: [HTCondor-users] CREDD_POLLING_TIMEOUT infinite when not set ?



argg,

I owe you one, but in my defense: how can it be that I look into this today, while the rotated logs end at nearly exactly the same time and date in 2017? Hard not to fall for conspiracy theories here :(

Also, we were running in debug mode for quite a while, and the start-up logs are now so small that the non-rotated file holds 2 years of logs - more confusion for my tiny brain :(

As for my other problem, then: it is the CREDD_POLLING_TIMEOUT that we did not deliver to the worker.conf, so it is stuck at 20 sec. I then found the corresponding entries in the ShadowLog:

/var/log/condor/ShadowLog:11/21/19 11:54:16 (2818641.296) (3033094): Request to run on slot1_14@xxxxxxxxxxxxxxx <131.169.163.103:43797?addrs=131.169.163.103-43797> was DELAYED (previous job still being vacated)
/var/log/condor/ShadowLog:11/21/19 11:54:18 (2818641.296) (3033094): Request to run on slot1_14@xxxxxxxxxxxxxxx <131.169.163.103:43797?addrs=131.169.163.103-43797> was REFUSED
/var/log/condor/ShadowLog:11/21/19 11:54:18 (2818641.296) (3033094): Job 2818641.296 is being evicted from slot1_14@xxxxxxxxxxxxxxx
/var/log/condor/ShadowLog:11/21/19 11:54:18 (2818641.296) (3033094): logEvictEvent with unknown reason (108), not logging.
/var/log/condor/ShadowLog:11/21/19 11:54:18 (2818641.296) (3033094): **** condor_shadow (condor_SHADOW) pid 3033094 EXITING WITH STATUS 108
/var/log/condor/ShadowLog:11/21/19 11:54:19 (2818641.255) (3032118): Switching to new job 2818641.296
/var/log/condor/ShadowLog:11/21/19 11:54:19 (?.?) (3032118): Initializing a VANILLA shadow for job 2818641.296

Will roll out the CREDD_POLLING_TIMEOUT tonight and hope to see better performance and fewer restarts next week.

Have a great weekend and thanks a lot !!!! 



-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Zach Miller" <zmiller@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, 22 November 2019 19:37:55
Subject: Re: [HTCondor-users] CREDD_POLLING_TIMEOUT infinite when not set ?

Christoph,

As per the other thread, the date in the log entry was accurate.  Those messages really were from 2017 because there was a bug that failed to decrement a counter!

Ticket for that bug here:     https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6523

On January 18, 2018, that bug was fixed.  The fix first went out in 8.7.7.

You are fortunately running 8.7.8, so I think you shouldn't be seeing this anymore... it was just old logs.  Please double check the end of the recent logs and you should see that things appear to be working as expected.  (Or else let me know I am wrong and I will owe you a tasty beverage. :)
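
For the double check, something along these lines might help. It is only a rough sketch, not an HTCondor tool: the function name is made up for illustration, and it assumes every log line starts with the "MM/DD/YY HH:MM:SS" stamp seen in the excerpts quoted in this thread. It returns the timestamp of the newest CREDMON warning, so an old date means the messages are leftovers.

```python
import re
from datetime import datetime

# Assumption: each log line begins with "MM/DD/YY HH:MM:SS", as in the quoted excerpts.
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")

def newest_credmon_warning(lines):
    """Return the datetime of the newest CREDMON warning line, or None if there is none.

    Hypothetical helper for illustration; not part of HTCondor.
    """
    newest = None
    for line in lines:
        if "CREDMON: warning" not in line:
            continue
        # Quoted mail text is indented, so strip leading whitespace before matching.
        match = STAMP.match(line.lstrip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            if newest is None or stamp > newest:
                newest = stamp
    return newest

sample = [
    "11/22/17 02:25:52 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)",
    "11/23/17 00:51:42 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)",
]
print(newest_credmon_warning(sample))
```

To run it against a real file, pass an open file object for whichever log those CREDMON lines came from instead of the sample list.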
 

Cheers,
-zach


On 11/22/19, 9:58 AM, "HTCondor-users on behalf of Zach Miller" <htcondor-users-bounces@xxxxxxxxxxx on behalf of zmiller@xxxxxxxxxxx> wrote:

    Hi Christoph,
    
    I did a quick peek at the code and this is not expected behavior, or at least, not reasonable.  : )
    
    Let me investigate further and I will create a ticket if needed.
    
    
    Cheers,
    -zach
    
    
    On 11/22/19, 9:02 AM, "HTCondor-users on behalf of Beyer, Christoph" <htcondor-users-bounces@xxxxxxxxxxx on behalf of christoph.beyer@xxxxxxx> wrote:
    
        Hi,
        
        I just found some worker nodes where CREDD_POLLING_TIMEOUT is not set; on those, the credd waits indefinitely (for days) for the credential to appear, which blocks the startd as well:
        
        Don't worry about the 17 in the year, that is a different problem; this is recent (from today):
        
        
        11/22/17 02:25:52 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/22/17 02:25:53 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/22/17 02:25:54 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/22/17 02:25:55 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/22/17 02:25:56 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/22/17 02:25:57 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        <snip>
        11/23/17 00:51:39 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/23/17 00:51:40 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/23/17 00:51:41 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        11/23/17 00:51:42 (pid:50892) CREDMON: warning, got errno 2, waiting for /var/lib/condor/credential/kemp.cc to appear (retry: 20)
        
        This is on: 
        
        [root@bird664 condor]# condor_master -v
        $CondorVersion: 8.7.8 May 31 2018 BuildID: 442130 $
        $CondorPlatform: x86_64_RedHat6 $
        
        
        Best
        Christoph
        
        -- 
        Christoph Beyer
        DESY Hamburg
        IT-Department
        
        Notkestr. 85
        Building 02b, Room 009
        22607 Hamburg
        
        phone:+49-(0)40-8998-2317
        mail: christoph.beyer@xxxxxxx
        _______________________________________________
        HTCondor-users mailing list
        To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
        subject: Unsubscribe
        You can also unsubscribe by visiting
        https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
        
        The archives can be found at:
        https://lists.cs.wisc.edu/archive/htcondor-users/
        
    
    

