Re: [HTCondor-users] StartLog: Failed to authenticate



I meant to include the condor_who -daemons output from the execute node:

Daemon       Alive  PID    PPID   Exit
------       -----  ---    ----   ----
Master       yes    7570   1      no
SharedPort   no     7604   no     no
Startd       yes    7605   7570   no
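
A minimal sketch for picking the dead daemons out of a table like the one above, assuming the column layout shown here (daemon name first, Alive second). The helper name not_alive is hypothetical and not part of HTCondor:

```shell
# Hypothetical helper: reads a condor_who -daemons style table on stdin
# and prints the name of every daemon whose Alive column is not "yes".
not_alive() {
    # Skip the header row and the dashed separator row, then test column 2.
    awk '$1 != "Daemon" && $1 !~ /^-+$/ && NF >= 2 && $2 != "yes" { print $1 }'
}
```

It would be used as `condor_who -daemons | not_alive`; for the table above it prints SharedPort.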

JK



> On Aug 18, 2023, at 3:38 PM, Justin Killebrew via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
>
>      External Email - Use Caution
>
>
>
> condor_who -daemons on the central manager (also configured as submit role) shows:
>
> Daemon       Alive  PID    PPID   Exit
> ------       -----  ---    ----   ----
> Collector    yes    1608   1494   no
> Master       yes    1494   1      no
> Negotiator   yes    1609   1494   no
> Schedd       yes    1610   1494   no
> SharedPort   yes    1607   1494   no
>
> This looks correct, but on the execute machine the StartLog has several
> ERROR: AUTHENTICATE:1003:Failed to authenticate with any method
> and
> SECMAN: required authentication with collector failed
>
> The central manager CollectorLog shows similar errors:
> DC_AUTHENTICATE: required authentication of 192.168.1.5 failed
>
> The firewall isn't active. Where else should I look?
>
> condor_status returns nothing on the central manager.  Is this because it doesn't see any execute machines?
>
>
> Thanks,
> JK
>
>
>
>> On Aug 17, 2023, at 12:28 PM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>>
>> One way to troubleshoot is to run
>>
>>  condor_who -daemons
>>
>> On the execute node.  This tool scrapes log files to determine which daemons are alive and which are not.
>>
>> If the condor_master is running, then you can use
>>
>>  condor_who -quick
>>
>> which sends a query to the condor_master about the state of the other daemons.
>>
>> -tj
>>
>> -----Original Message-----
>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Justin Killebrew via HTCondor-users
>> Sent: Friday, August 11, 2023 3:03 PM
>> To: Todd L Miller <tlmiller@xxxxxxxxxxx>
>> Cc: Justin Killebrew <jk@xxxxxxx>; Justin Killebrew via HTCondor-users <htcondor-users@xxxxxxxxxxx>
>> Subject: Re: [HTCondor-users] condor_status returns nothing
>>
>> The StartLog showed that /var/lib/condor/execute didn't exist.  I created it and restarted condor and now condor_status works as expected.
>>
>> Thanks!
>>
>> JK
>>
>>
>>> On Aug 11, 2023, at 3:47 PM, Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
>>>
>>>> Should there be a startd running?  How do I troubleshoot this installation?
>>>
>>>     Yes.  First thing to do is look at the MasterLog and StartLog
>>> files (which will probably be in /var/log/condor, but you can run
>>> `condor_config_val LOG` to find out for sure).  From your process tree, it
>>> looks like either the master isn't starting the startd or the startd is
>>> crashing (almost?) immediately on start-up.
>>>
>>> - ToddM
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>