
Re: [HTCondor-users] problems with htcondor-ce 3.2.1-1 + condor 8.8.1



Does `condor_status -schedd -pool htc-2.cr.cnaf.infn.it` succeed from 
the old CE but fail from ce02? I'd be surprised if anything worked since 
`condor_status -schedd` from the central manager isn't working!

Is READ access to the collector restricted? Running `condor_ping 
-verbose -type collector READ` from the CE host would give you a good 
idea of the required permissions. However, I'm just realizing we don't 
have a 'condor_ce_store_cred' [1], so the instructions for setting up 
password auth [2] won't work on the CE side.

- Brian

[1] https://github.com/opensciencegrid/htcondor-ce/pull/218

[2] 
http://research.cs.wisc.edu/htcondor/manual/v8.8/Security.html#x36-2780003.8.3


On 3/11/19 11:53 AM, Stefano Dal Pra wrote:
> On 11/03/19 15:45, Brian Lin wrote:
>> That's curious, do you see any errors in /etc/condor/CollectorLog on
>> htc-2.cr.cnaf.infn.it?
> Yes, see below.
>> What does `condor_config_val COLLECTOR_HOST` return
> [root@htc-2 condor]# condor_config_val COLLECTOR_HOST
> htc-2.cr.cnaf.infn.it
>
>> on the CE? How about `condor_status -schedd` on the central manager?
> At this very moment the cluster is quite broken and the CM does not 
> start (CEDAR:6001:Failed to connect to <131.154.195.32:9618>).
> I downgraded and upgraded again, neutralizing the configurations from 
> the puppet classes.
>>
>> Thanks,
>> Brian
>
>
>
> I raised log verbosity; my understanding (see logs below) is that the 
> JobRouter at ce02-htc fails to authenticate with the CM at htc-2
> because it attempts the FS method, which fails because the two hosts 
> share no common filesystem.
> The SEC_*_AUTHENTICATION_METHODS (and most other settings) seem to 
> be equivalent to those of the other cluster.
> I tried adding the PASSWORD method: SEC_*_AUTHENTICATION_METHODS = 
> ..., PASSWORD
> but it didn't work; maybe I missed the right combination, though.
>
> The IPs in the logs are:
> (131.154.195.32 == htc-2.cr.cnaf.infn.it)
> (131.154.192.41 == ce02-htc.cr.cnaf.infn.it)
>
> From JobRouterLog at ce02-htc:
>
> 03/11/19 07:13:28 (D_ALWAYS:2) Will use TCP to update collector 
> htc-2.cr.cnaf.infn.it <131.154.195.32:9618>
> 03/11/19 07:13:28 (D_ALWAYS:2) Trying to query collector 
> <131.154.195.32:9618>
> 03/11/19 07:13:28 (D_ALWAYS) SECMAN: required authentication with 
> collector at <131.154.195.32:9618> failed, so aborting command 
> QUERY_SCHEDD_ADS.
> 03/11/19 07:13:28 (D_ALWAYS) ERROR: AUTHENTICATE:1003:Failed to 
> authenticate with any method|AUTHENTICATE:1004:Failed to authenticate 
> using FS
> 03/11/19 07:13:28 (D_ALWAYS) ERROR (pool htc-2.cr.cnaf.infn.it:9618) 
> Can't find address of schedd
> 03/11/19 07:13:28 (D_ALWAYS) JobRouter failure 
> (src=320.0,route=condor_pool_cms): failed to submit job
>
> CollectorLog at htc-2.cr.cnaf.infn.it:
>
> 03/11/19 07:13:39 SECMAN: new session, doing initial authentication.
> 03/11/19 07:13:39 Returning to DC while we wait for socket to 
> authenticate.
> 03/11/19 07:13:39 AUTHENTICATE: setting timeout for (unknown) to 20.
> 03/11/19 07:13:39 HANDSHAKE: in handshake(my_methods = 'FS')
> 03/11/19 07:13:39 HANDSHAKE: handshake() - i am the server
> 03/11/19 07:13:39 HANDSHAKE: client sent (methods == 4)
> 03/11/19 07:13:39 HANDSHAKE: i picked (method == 4)
> 03/11/19 07:13:39 HANDSHAKE: client received (method == 4)
> 03/11/19 07:13:39 FS: client template is /tmp/FS_XXXXXXXXX
> 03/11/19 07:13:39 FS: client filename is /tmp/FS_XXXU3AGXf
> 03/11/19 07:13:39 Will return to DC because authentication is incomplete.
> 03/11/19 07:13:39 AUTHENTICATE_FS: used dir /tmp/FS_XXXU3AGXf, status: 0
> 03/11/19 07:13:39 AUTHENTICATE: method -1 (FS) failed.
> 03/11/19 07:13:39 HANDSHAKE: in handshake(my_methods = 'FS')
> 03/11/19 07:13:39 AUTHENTICATE: handshake would block
> 03/11/19 07:13:39 Will return to DC to continue authentication..
> 03/11/19 07:13:39 HANDSHAKE: handshake() - i am the server
> 03/11/19 07:13:39 HANDSHAKE: client sent (methods == 0)
> 03/11/19 07:13:39 HANDSHAKE: i picked (method == 0)
> 03/11/19 07:13:39 HANDSHAKE: client received (method == 0)
> 03/11/19 07:13:39 DC_AUTHENTICATE: required authentication of 
> 131.154.192.41 failed: AUTHENTICATE:1003:Failed to authenticate with 
> any method|AUTHENTICATE:1004:Failed to authenticate using 
> FS|FS:1004:Unable to lstat(/tmp/FS_XXXU3AGXf)
> 03/11/19 07:13:39 DC_AUTHENTICATE: received DC_AUTHENTICATE from 
> <131.154.192.41:12036>
> 03/11/19 07:13:39 DC_AUTHENTICATE: generating BLOWFISH key for session 
> htc-2:13943:1552284819:2284...
>
>
>
> Thanks for your help
> Stefano
>
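
The error stack in the CollectorLog excerpt above is easier to read when split into its individual entries. The following is a minimal sketch, not an HTCondor tool: the function name `parse_error_stack` is hypothetical, and the `SUBSYSTEM:CODE:message` layout with `|` separators is an assumption based only on the log lines quoted in this thread.

```python
def parse_error_stack(stack):
    """Split a 'SUBSYS:CODE:message|SUBSYS:CODE:message' string into tuples.

    The layout is inferred from the log lines in this thread, not from
    HTCondor documentation.
    """
    entries = []
    for part in stack.split("|"):
        # Split on the first two colons only, so a message may contain ':'.
        subsystem, code, message = part.split(":", 2)
        entries.append((subsystem.strip(), int(code), message.strip()))
    return entries


# Error stack copied from the CollectorLog excerpt, with line wraps removed.
stack = (
    "AUTHENTICATE:1003:Failed to authenticate with any method|"
    "AUTHENTICATE:1004:Failed to authenticate using FS|"
    "FS:1004:Unable to lstat(/tmp/FS_XXXU3AGXf)"
)

for subsystem, code, message in parse_error_stack(stack):
    print(f"{subsystem:<12} {code}  {message}")
```

In this stack the last entry is the most specific one: the collector could not lstat the FS path, which appears consistent with Stefano's reading that the two hosts share no common filesystem.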