
Re: [HTCondor-users] Upgrade of HTCondor-CE from 5 to 6 broke my CE



Hi Jaime,
Please see the output below:
tau-htc ~]# condor_ce_q -pool tau-cm.hep.tau.ac.il:9618 -name tau-htc.hep.tau.ac.il -debug:D_SECURITY:2
03/01/24 04:51:16 KEYCACHE: created: 0x1fa8210
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission ALLOW
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission READ
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission WRITE
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission NEGOTIATOR
03/01/24 04:51:16 ipverify: NEGOTIATOR optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission ADMINISTRATOR
03/01/24 04:51:16 ipverify: ADMINISTRATOR optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission CONFIG
03/01/24 04:51:16 ipverify: CONFIG optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission DAEMON
03/01/24 04:51:16 ipverify: DAEMON optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission SOAP
03/01/24 04:51:16 ipverify: SOAP optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission DEFAULT
03/01/24 04:51:16 ipverify: DEFAULT optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission CLIENT
03/01/24 04:51:16 IPVERIFY: allow CLIENT: * (from config value ALLOW_CLIENT)
03/01/24 04:51:16 IPVERIFY: deny CLIENT: anonymous@*, unmapped@* (from config value DENY_CLIENT)
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission ADVERTISE_STARTD
03/01/24 04:51:16 ipverify: ADVERTISE_STARTD optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission ADVERTISE_SCHEDD
03/01/24 04:51:16 ipverify: ADVERTISE_SCHEDD optimized to deny everyone
03/01/24 04:51:16 IPVERIFY: Subsystem TOOL
03/01/24 04:51:16 IPVERIFY: Permission ADVERTISE_MASTER
03/01/24 04:51:16 ipverify: ADVERTISE_MASTER optimized to deny everyone
03/01/24 04:51:16 Initialized the following authorization table:
03/01/24 04:51:16 Authorizations yet to be resolved:
03/01/24 04:51:16 deny CLIENT: anonymous@*/* unmapped@*/*
03/01/24 04:51:16 SECMAN: command 6 QUERY_SCHEDD_ADS to collector at <192.114.100.129:9618> from TCP port 24398 (blocking).
03/01/24 04:51:16 Filtering authentication methods (FS,TOKEN,SCITOKENS,SSL,IDTOKENS,PASSWORD) prior to offering them remotely.
03/01/24 04:51:16 Can try token auth because we have at least one named credential.
03/01/24 04:51:16 Will try IDTOKENS auth.
03/01/24 04:51:16 Can try token auth because we have at least one named credential.
03/01/24 04:51:16 Will try IDTOKENS auth.
03/01/24 04:51:16 Inserting pre-auth metadata for TOKEN.
03/01/24 04:51:16 Inserting pre-auth metadata for TOKEN.
03/01/24 04:51:16 SECMAN: no cached key for {<192.114.100.129:9618?alias=tau-cm.hep.tau.ac.il>,<6>}.
03/01/24 04:51:16 SECMAN: Security Policy:
AuthMethods = "FS,TOKEN,SCITOKENS,SSL,TOKEN,PASSWORD"
Authentication = "REQUIRED"
CryptoMethods = "AES,BLOWFISH,3DES"
ECDHPublicKey = "BFg9bf3LhfTFHhABkjSvHlpR7Zu9hyg5fkMDfldGaeFppyl/DGhjdvZmu7piW4bvxrmfwkiPxEKw1pC1DUxK+qY="
Enact = "NO"
Encryption = "REQUIRED"
Integrity = "REQUIRED"
IssuerKeys = "POOL"
NegotiatedSession = true
NewSession = "YES"
OutgoingNegotiation = "REQUIRED"
ServerPid = 943148
SessionDuration = "60"
SessionLease = 3600
Subsystem = "TOOL"
TrustDomain = "users.htcondor.org"
03/01/24 04:51:16 SECMAN: negotiating security for command 6.
03/01/24 04:51:16 SECMAN: sending DC_AUTHENTICATE command
03/01/24 04:51:16 SECMAN: sending following classad:
AuthMethods = "FS,TOKEN,SCITOKENS,SSL,TOKEN,PASSWORD"
Authentication = "REQUIRED"
Command = 6
ConnectSinful = "<192.114.100.129:9618?alias=tau-cm.hep.tau.ac.il>"
CryptoMethods = "AES,BLOWFISH,3DES"
ECDHPublicKey = "BFg9bf3LhfTFHhABkjSvHlpR7Zu9hyg5fkMDfldGaeFppyl/DGhjdvZmu7piW4bvxrmfwkiPxEKw1pC1DUxK+qY="
Enact = "NO"
Encryption = "REQUIRED"
Integrity = "REQUIRED"
IssuerKeys = "POOL"
NegotiatedSession = true
NewSession = "YES"
OutgoingNegotiation = "REQUIRED"
RemoteVersion = "$CondorVersion: 10.9.0 2023-09-28 BuildID: 678228 PackageID: 10.9.0-1 $"
ServerPid = 943148
SessionDuration = "60"
SessionLease = 3600
Subsystem = "TOOL"
TrustDomain = "users.htcondor.org"
03/01/24 04:51:16 SECMAN: server responded with:
AuthMethods = "FS"
AuthMethodsList = "FS,TOKEN,TOKEN,SCITOKENS,SSL"
Authentication = "YES"
CryptoMethods = "AES"
CryptoMethodsList = "AES,BLOWFISH,3DES"
ECDHPublicKey = "BJ7lTON+wfXCPQrChtgWop2nBDpJ2ECeaRbRaqxsoBKjcelGqKWeYKi7VEh8UCC2D9UGW1sb+pXXAHhgtoLyyXw="
Enact = "YES"
Encryption = "YES"
Integrity = "YES"
IssuerKeys = "POOL, POOL.puppet-bak"
NegotiatedSession = true
RemoteVersion = "$CondorVersion: 10.9.0 2023-09-28 BuildID: 678228 PackageID: 10.9.0-1 $"
SessionDuration = "60"
SessionLease = 3600
TrustDomain = "hep.tau.ac.il"
03/01/24 04:51:16 SECMAN: new session, doing initial authentication.
03/01/24 04:51:16 SECMAN: authenticating RIGHT NOW.
03/01/24 04:51:16 SECMAN: AuthMethodsList: FS,TOKEN,TOKEN,SCITOKENS,SSL
03/01/24 04:51:16 SECMAN: Auth methods: FS,TOKEN,TOKEN,SCITOKENS,SSL
03/01/24 04:51:16 AUTHENTICATE: setting timeout for <192.114.100.129:9618?alias=tau-cm.hep.tau.ac.il> to 20.
03/01/24 04:51:16 AUTHENTICATE: in authenticate( addr == '<192.114.100.129:9618?alias=tau-cm.hep.tau.ac.il>', methods == 'FS,TOKEN,TOKEN,SCITOKENS,SSL')
03/01/24 04:51:16 AUTHENTICATE: can still try these methods: FS,TOKEN,TOKEN,SCITOKENS,SSL
03/01/24 04:51:16 HANDSHAKE: in handshake(my_methods = 'FS,TOKEN,TOKEN,SCITOKENS,SSL')
03/01/24 04:51:16 HANDSHAKE: handshake() - i am the client
03/01/24 04:51:16 Setting SciTokens cache directory to /var/run/condor-ce/cache
03/01/24 04:51:16 HANDSHAKE: sending (methods == 6404) to server
03/01/24 04:51:16 HANDSHAKE: server replied (method = 4)
03/01/24 04:51:16 AUTHENTICATE: will try to use 4 (FS)
03/01/24 04:51:16 AUTHENTICATE: do_authenticate is 1.
03/01/24 04:51:16 AUTHENTICATE_FS: used dir /tmp/FS_XXXgveiLN, status: 0
03/01/24 04:51:16 AUTHENTICATE: method 4 (FS) failed.
03/01/24 04:51:16 AUTHENTICATE: can still try these methods: TOKEN,TOKEN,SCITOKENS,SSL
03/01/24 04:51:16 HANDSHAKE: in handshake(my_methods = 'TOKEN,TOKEN,SCITOKENS,SSL')
03/01/24 04:51:16 HANDSHAKE: handshake() - i am the client
03/01/24 04:51:16 HANDSHAKE: sending (methods == 6400) to server
03/01/24 04:51:16 HANDSHAKE: server replied (method = 2048)
03/01/24 04:51:16 Will use issuer hep.tau.ac.il for remote server.
03/01/24 04:51:16 AUTHENTICATE: will try to use 2048 (IDTOKENS)
03/01/24 04:51:16 AUTHENTICATE: do_authenticate is 1.
03/01/24 04:51:16 PW.
03/01/24 04:51:16 PW: getting name.
03/01/24 04:51:16 Looking for tokens in directory /etc/condor-ce/tokens.d for issuer hep.tau.ac.il
03/01/24 04:51:16 TOKEN: No token found.
03/01/24 04:51:16 PW: Failed to fetch a login name
03/01/24 04:51:16 PW: Generating ra.
03/01/24 04:51:16 PW: Client sending.
03/01/24 04:51:16 Client error: NULL in send?
03/01/24 04:51:16 Client sending: -1, 0(), 0
03/01/24 04:51:16 PW: Client receiving.
03/01/24 04:51:16 Server sent status indicating not OK.
03/01/24 04:51:16 PW: Client received ERROR from server, propagating
03/01/24 04:51:16 PW: CLient sending two.
03/01/24 04:51:16 In client_send_two.
03/01/24 04:51:16 Client error: don't know my own name?
03/01/24 04:51:16 Can't send null for random string.
03/01/24 04:51:16 Client error: I have no name?
03/01/24 04:51:16 Client sending: 0() 0 0
03/01/24 04:51:16 Sent ok.
03/01/24 04:51:16 AUTHENTICATE: method 2048 (IDTOKENS) failed.
03/01/24 04:51:16 AUTHENTICATE: can still try these methods: SCITOKENS,SSL
03/01/24 04:51:16 HANDSHAKE: in handshake(my_methods = 'SCITOKENS,SSL')
03/01/24 04:51:16 HANDSHAKE: handshake() - i am the client
03/01/24 04:51:16 HANDSHAKE: sending (methods == 4352) to server
03/01/24 04:51:16 HANDSHAKE: server replied (method = 4096)
03/01/24 04:51:16 AUTHENTICATE: will try to use 4096 (SCITOKENS)
03/01/24 04:51:16 AUTHENTICATE: do_authenticate is 1.
03/01/24 04:51:16 CAFILE:     '/etc/pki/tls/certs/ca-bundle.crt'
03/01/24 04:51:16 CADIR:      '/etc/grid-security/certificates'
03/01/24 04:51:16 CIPHERLIST: 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS'
03/01/24 04:51:16 SSL client host check: using host alias tau-cm.hep.tau.ac.il for peer 192.114.100.129
03/01/24 04:51:16 SSL Auth: No SciToken file provided
03/01/24 04:51:16 SSL Auth: SSL Authentication fails, terminating
03/01/24 04:51:16 AUTHENTICATE: method 4096 (SCITOKENS) failed.
03/01/24 04:51:16 AUTHENTICATE: can still try these methods: SSL
03/01/24 04:51:16 HANDSHAKE: in handshake(my_methods = 'SSL')
03/01/24 04:51:16 HANDSHAKE: handshake() - i am the client
03/01/24 04:51:16 HANDSHAKE: sending (methods == 256) to server
03/01/24 04:51:16 HANDSHAKE: server replied (method = 256)
03/01/24 04:51:16 AUTHENTICATE: will try to use 256 (SSL)
03/01/24 04:51:16 AUTHENTICATE: do_authenticate is 1.
03/01/24 04:51:16 CAFILE:     '/etc/pki/tls/certs/ca-bundle.crt'
03/01/24 04:51:16 CADIR:      '/etc/grid-security/certificates'
03/01/24 04:51:16 CERTFILE:   '/etc/grid-security/hostcert.pem'
03/01/24 04:51:16 KEYFILE:    '/etc/grid-security/hostkey.pem'
03/01/24 04:51:16 CIPHERLIST: 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS'
03/01/24 04:51:16 SSL client host check: using host alias tau-cm.hep.tau.ac.il for peer 192.114.100.129
03/01/24 04:51:16 SSL Auth: Trying to connect.
03/01/24 04:51:16 Tried to connect: -1
03/01/24 04:51:16 SSL Auth: SSL: trying to continue reading.
03/01/24 04:51:16 Round 1.
03/01/24 04:51:16 Send message (2).
03/01/24 04:51:16 Status (c: 2, s: 0)
03/01/24 04:51:16 SSL Auth: Trying to connect.
03/01/24 04:51:16 Tried to connect: -1
03/01/24 04:51:16 SSL Auth: SSL: trying to continue reading.
03/01/24 04:51:16 Round 2.
03/01/24 04:51:16 SSL Auth: Receive message.
03/01/24 04:51:16 Received message (2).
03/01/24 04:51:16 Status (c: 2, s: 2)
03/01/24 04:51:16 SSL Auth: Trying to connect.
03/01/24 04:51:16 -Error with certificate at depth: 1
03/01/24 04:51:16   issuer  = /O=condor/CN=hep.tau.ac.il
03/01/24 04:51:16   subject = /O=condor/CN=hep.tau.ac.il
03/01/24 04:51:16   err 19:self signed certificate in certificate chain
03/01/24 04:51:16 Tried to connect: -1
03/01/24 04:51:16 SSL: library failure: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
03/01/24 04:51:16 Round 3.
03/01/24 04:51:16 Send message (3).
03/01/24 04:51:16 Status (c: 3, s: 2)
03/01/24 04:51:16 SSL Auth: SSL Authentication failed
03/01/24 04:51:16 AUTHENTICATE: method 256 (SSL) failed.
03/01/24 04:51:16 AUTHENTICATE: can still try these methods:
03/01/24 04:51:16 HANDSHAKE: in handshake(my_methods = '')
03/01/24 04:51:16 HANDSHAKE: handshake() - i am the client
03/01/24 04:51:16 HANDSHAKE: sending (methods == 0) to server
03/01/24 04:51:16 HANDSHAKE: server replied (method = 0)
03/01/24 04:51:16 AUTHENTICATE: no available authentication methods succeeded!
03/01/24 04:51:16 SECMAN: required authentication with collector at <192.114.100.129:9618> failed, so aborting command QUERY_SCHEDD_ADS.
03/01/24 04:51:16 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS
Error: Couldn't contact the condor_collector on tau-cm.hep.tau.ac.il:9618.

Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines and
jobs in the Condor pool. The condor_collector might not be running, it might
be refusing to communicate with you, there might be a network problem, or
there may be some other problem. Check with your system administrator to fix
this problem.

If you are the system administrator, check that the condor_collector is
running on tau-cm.hep.tau.ac.il:9618, check the ALLOW/DENY configuration in
your condor_config, and check the MasterLog and CollectorLog files in your
log directory for possible clues as to why the condor_collector is not
responding. Also see the Troubleshooting section of the manual.
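In the HANDSHAKE lines of the debug output above, the client advertises its remaining methods as a bitmask (6404, 6400, 4352, 256, 0) and the server answers with the single method it picked. The bit values can be read straight off this log (FS = 4, SSL = 256, IDTOKENS = 2048, SCITOKENS = 4096), so 6404 = 4096 + 2048 + 256 + 4, and each later mask simply drops the method that just failed. A minimal sketch of a decoder, assuming only these four observed bits; it is not an HTCondor tool, and the table is not the full list of HTCondor methods:

```python
# Method bits as they appear in this particular D_SECURITY log
# ("will try to use 4 (FS)", "2048 (IDTOKENS)", ...). This is an
# assumption drawn from the log, not HTCondor's complete method table.
OBSERVED_METHOD_BITS = {
    4: "FS",
    256: "SSL",
    2048: "IDTOKENS",
    4096: "SCITOKENS",
}


def decode_methods(mask: int) -> list[str]:
    """Return the method names whose bits are set in a HANDSHAKE mask.

    Bits not in OBSERVED_METHOD_BITS are reported as 'bit<N>' so that
    nothing is silently dropped.
    """
    names = []
    remaining = mask
    for bit in sorted(OBSERVED_METHOD_BITS):
        if mask & bit:
            names.append(OBSERVED_METHOD_BITS[bit])
            remaining &= ~bit
    # Report any leftover bits we have no name for.
    bit = 1
    while remaining:
        if remaining & bit:
            names.append(f"bit{bit}")
            remaining &= ~bit
        bit <<= 1
    return names


if __name__ == "__main__":
    # Masks the client sent in the log above, in order.
    for mask in (6404, 6400, 4352, 256, 0):
        print(mask, decode_methods(mask))
```

Applied to this log, 6404 decodes to FS, SSL, IDTOKENS and SCITOKENS, and the final mask 0 decodes to an empty list, which matches the empty "can still try these methods" line just before the handshake gives up.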

On Thu, Feb 29, 2024 at 10:07 PM Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
Can you try running this command:
condor_ce_q -pool tau-cm.hep.tau.ac.il:9618 -name tau-htc.hep.tau.ac.il -d:D_SECURITY:2

This does the same query that's failing for the job router and should fail in the same way, with extra details.
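With D_SECURITY:2 the output is long, so it can help to pull out just the per-method results. A small hypothetical helper (not part of HTCondor) that scans saved output for lines of the form "AUTHENTICATE: method N (NAME) failed." and for the final "no available authentication methods succeeded!" line; both patterns are taken from the log in this thread, and the script and file names are illustrative only:

```python
import re
import sys

# Line format copied from the debug output in this thread, e.g.
# "AUTHENTICATE: method 256 (SSL) failed."
FAILED_RE = re.compile(r"AUTHENTICATE: method (\d+) \((\w+)\) failed\.")
ALL_FAILED = "AUTHENTICATE: no available authentication methods succeeded!"


def summarize(log_text: str) -> tuple[list[tuple[int, str]], bool]:
    """Return (failed methods in order as (bit, name), whether all methods failed)."""
    failures = [(int(bit), name) for bit, name in FAILED_RE.findall(log_text)]
    return failures, ALL_FAILED in log_text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: save the tool's output first, for example
    #   condor_ce_q ... -debug:D_SECURITY:2 > debug.txt 2>&1
    #   python summarize_auth.py debug.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        failures, all_failed = summarize(fh.read())
    for bit, name in failures:
        print(f"{name} (bit {bit}) failed")
    if all_failed:
        print("No authentication method succeeded.")
```

On David's output this would list FS, IDTOKENS, SCITOKENS and SSL in that order and report that no method succeeded, after which the lines just before each failure (for example the "err 19" certificate error for SSL) are the ones worth reading.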

Â- Jaime

On Feb 26, 2024, at 1:14 AM, David Cohen <cdavid@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi,
Last week, HTCondor was upgraded from 8.8 to 10.9 and HTCondor-CE from 5 to 6.
Since then, I see the following in the CE's /var/log/condor-ce/JobRouterLog:
2/26/24 09:05:22 Unable to find address of tau-htc.hep.tau.ac.il at tau-cm.hep.tau.ac.il:9618
02/26/24 09:05:22 JobRouter (src="" failed to remove dest job: Unable to find address of tau-htc.hep.tau.ac.il at tau-cm.hep.tau.ac.il:9618
02/26/24 09:05:22 JobRouter (src="" removing orphaned destination job with no matching source job.
02/26/24 09:05:22 SECMAN: required authentication with collector at <192.114.100.129:9618> failed, so aborting command QUERY_SCHEDD_ADS.
02/26/24 09:05:22 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS
02/26/24 09:05:22 Unable to find address of tau-htc.hep.tau.ac.il at tau-cm.hep.tau.ac.il:9618
02/26/24 09:05:22 JobRouter (src="" failed to remove dest job: Unable to find address of tau-htc.hep.tau.ac.il at tau-cm.hep.tau.ac.il:9618
02/26/24 09:05:22 JobRouter (src="" removing orphaned destination job with no matching source job.

And in the central manager's /var/log/condor/CollectorLog:
02/26/24 09:10:18 DC_AUTHENTICATE: required authentication of 192.114.100.130 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXX8hoSrF)
02/26/24 09:10:18 DC_AUTHENTICATE: required authentication of 192.114.100.130 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXJf0649)
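The error text in these log lines is a stack of entries joined by "|", each shaped like SUBSYSTEM:CODE:message. That shape is read off the lines above, not taken from a documented specification. A minimal sketch that splits such a string, assuming "|" never occurs inside a message and that you pass only the part after "failed: " (or after "ERROR: " in the client log); it makes the last entry, the FS lstat failure, easier to spot:

```python
from typing import NamedTuple


class ErrorEntry(NamedTuple):
    subsystem: str
    code: int
    message: str


def split_error_stack(stack: str) -> list[ErrorEntry]:
    """Split a 'SUBSYS:CODE:message|SUBSYS:CODE:message' string into entries.

    Assumes '|' never occurs inside a message. Only the first two ':' of
    each entry are treated as separators, so messages may contain ':'.
    """
    entries = []
    for part in stack.split("|"):
        part = part.strip()
        if not part:
            continue
        subsystem, code, message = part.split(":", 2)
        entries.append(ErrorEntry(subsystem, int(code), message))
    return entries


if __name__ == "__main__":
    # Error stack copied from the first CollectorLog line above.
    stack = (
        "AUTHENTICATE:1003:Failed to authenticate with any method|"
        "AUTHENTICATE:1004:Failed to authenticate using SSL|"
        "AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|"
        "AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|"
        "AUTHENTICATE:1004:Failed to authenticate using FS|"
        "FS:1004:Unable to lstat(/tmp/FS_XXX8hoSrF)"
    )
    for entry in split_error_stack(stack):
        print(f"{entry.subsystem:<12} {entry.code}  {entry.message}")
```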

Naturally, no grid jobs are running and the cluster is idle.
Any ideas on what went wrong?

Thanks,
David


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
