
Re: [HTCondor-users] Intermittent submission failures to HTCondor-CE



Hello again,

We have finally found the error message in our CE:

09/16/22 05:53:41 (cid:630527) Command=QMGMT_WRITE_CMD, peer=<XXXXX:YYY>
09/16/22 05:53:41 (cid:630527) Authentication Failed, MethodsTried=FS,TOKEN,SCITOKENS,GSI,SSL
09/16/22 05:53:41 DC_AUTHENTICATE: authentication of <XXXXX:YYY> did not result in a valid mapped user name, which is required for this command (1112 QMGMT_WRITE_CMD), so aborting.
09/16/22 05:53:41 DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1006:exceeded 1663300416 deadline during authentication|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXsvyHJB)

The message "exceeded 1663300416 deadline during authentication" refers to the epoch time 1663300416, which is 5 seconds before the log timestamp 09/16/22 05:53:41. So I understand that the authentication took more than 5 seconds and then failed, right? This does not happen on our other CE (same version and configuration), and on this CE it only happens from time to time. Is there any way to increase the 5-second deadline? We use GSI authentication against an Argus server; could this be related? Or could it be related to GSS_ASSIST_GRIDMAP_CACHE_EXPIRATION?
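
As a quick sanity check of that arithmetic, here is a minimal Python sketch that converts the deadline from the log into a timestamp and compares it with the time of the log line. It assumes the CE writes its log in local time, CEST (UTC+2) in September; the helper name seconds_past_deadline is only illustrative and is not part of HTCondor.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the CE log timestamps are local time, CEST (UTC+2).
CEST = timezone(timedelta(hours=2))


def seconds_past_deadline(log_stamp: str, deadline_epoch: int, tz=CEST) -> float:
    """Return how many seconds after the deadline the log line was written."""
    # HTCondor log lines in this thread use the MM/DD/YY HH:MM:SS format.
    logged = datetime.strptime(log_stamp, "%m/%d/%y %H:%M:%S").replace(tzinfo=tz)
    deadline = datetime.fromtimestamp(deadline_epoch, tz=tz)
    return (logged - deadline).total_seconds()


# Deadline from the "exceeded 1663300416 deadline" message, in log format.
print(datetime.fromtimestamp(1663300416, tz=CEST).strftime("%m/%d/%y %H:%M:%S"))
# Gap between the log line and the deadline, in seconds.
print(seconds_past_deadline("09/16/22 05:53:41", 1663300416))
```

Under that timezone assumption the deadline comes out as 09/16/22 05:53:36, that is, 5 seconds before the failure was logged.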

Thank you again.

Best regards,

Carles


On Mon, 12 Sept 2022 at 16:38, Carles Acosta <cacosta@xxxxxx> wrote:
Dear all,

We have a strange issue regarding our HTCondor-CEs.

The LHCb experiment is experiencing intermittent submission errors to our CE:

Pilot submission failed with error: ERROR: Failed to connect to queue manager ce14.pic.es
AUTHENTICATE:1005:Failed to securely exchange session key
AUTHENTICATE:1004:Failed to authenticate using IDTOKENS
AUTHENTICATE:1004:Failed to authenticate using FS

At other times, however, everything works fine and the submissions succeed. Furthermore, there is another CE where everything always works, and it shares the same version and general configuration as ce14.pic.es. Both CEs are running condor 9.0.16 and HTCondor-CE version 5.1.5.

We do not see any error in the CE logs that explains this behavior. The experiment is authenticated correctly through GSI whenever a submission to ce14 succeeds, and always on the other CE. Any ideas? I really do not know how to debug this issue since I do not see any error in the CE log.

Thank you in advance.

Best regards,

Carles
--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice: http://legal.ifae.es


--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
http://www.pic.es
Avís - Aviso - Legal Notice: http://legal.ifae.es