
Re: [HTCondor-users] [SPAM] Re: job failed to submit to CE with SCIToken only



Hi Xiaowei,

it is probably a different issue as you have the scitoken sqlite file (which had been missing in our case).

Cheers,
  Thomas

On 14/06/2023 12.32, JIANG Xiaowei wrote:
Hi Thomas,

Thanks for your reply! I went through your issue and it seems quite similar.

The current scitokens package on our CE is scitokens-cpp-1.0.1-1.el7.x86_64.

The cache directory is found only under /var/lib/condor:
/var/lib/condor/.cache:
drwx------ 2 condor condor 35 Jun 14 18:20 scitokens

Following your solution, do we need to create a cache directory under /var/lib/condor-ce/?
Could you share the complete steps you took to fix the problem? Thanks!
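
To compare the two locations discussed above, a small check like the following could be run on the CE. It is only a sketch: the first path is the one listed in this message, the second is the hypothetical location asked about, and the helper name describe_cache_dir is made up for illustration. It only reports whether a directory exists and what it contains; it does not create or change anything.

```python
import os


def describe_cache_dir(path):
    """Report whether a cache directory exists and list its contents.

    'files' is None when the directory exists but cannot be read by the
    current user (for example a drwx------ directory owned by condor).
    """
    info = {"path": path, "exists": os.path.isdir(path), "files": []}
    if info["exists"]:
        try:
            info["files"] = sorted(os.listdir(path))
        except PermissionError:
            info["files"] = None
    return info


# Path seen in this thread, plus the hypothetical one asked about.
CANDIDATES = [
    "/var/lib/condor/.cache/scitokens",
    "/var/lib/condor-ce/.cache/scitokens",  # assumption, not confirmed
]

for candidate in CANDIDATES:
    print(describe_cache_dir(candidate))
```

Running it as root (or as the condor user) avoids the permission case, since the directory shown above is readable only by its owner.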

Cheers,
Xiaowei



-----Original Message-----
From: "Thomas Hartmann" <thomas.hartmann@xxxxxxx>
Sent: 2023-06-14 18:15:30 (Wednesday)
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>, "JIANG Xiaowei" <jiangxw@xxxxxxxxxx>
Cc:
Subject: [SPAM] Re: [HTCondor-users] job failed to submit to CE with SCIToken only

Hi Xiaowei,

which version of the scitokens package are you on?
We had a similar(?) problem recently after the scitokens package got
updated: due to our specific setup, no sqlite db could be created
for token caching, and it failed somewhat silently
https://www-auth.cs.wisc.edu/lists/htcondor-users/2023-April/msg00055.shtml
but iirc the issue should have been resolved in a follow-up release.

Cheers,
    Thomas


On 14/06/2023 10.05, JIANG Xiaowei wrote:
Hi Brian, Todd, Maarten,

Thanks to all of you! Following your suggestions, I ran some tests with a scitoken on a CERN lxplus node.

Using a CMS user's scitoken with scopes (compute.read), I ran the command Brian suggested and the submit command Maarten suggested, and got the same log in both cases:

06/14/23 09:22:35 SECMAN: received post-auth classad:
ReturnCode = "DENIED"
Sid = "condorce02:80306:1686727354:6858"
TriedAuthentication = true
User = "lhcb048@xxxxxxxxxxxxxxxxxx"
ValidCommands = "60007,457,60020,68,5,6,7,9,12,43,20,46,78,50,56,48,71,74"
06/14/23 09:22:35 SECMAN: FAILED: Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS.
Error: communication error
SECMAN:2010:Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS.
Error: Couldn't contact the condor_collector on condorce02.ihep.ac.cn
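
For reading logs like the one above, a short parser can pull out the attributes of the post-auth ClassAd. This is a minimal sketch, not an HTCondor tool: the function names are hypothetical and the sample text is copied from this message. The last helper encodes one interpretation only, namely that an attempted authentication with a mapped User and ReturnCode "DENIED" points at the authorization step rather than at the token itself.

```python
SAMPLE_LOG = '''06/14/23 09:22:35 SECMAN: received post-auth classad:
ReturnCode = "DENIED"
Sid = "condorce02:80306:1686727354:6858"
TriedAuthentication = true
User = "lhcb048@xxxxxxxxxxxxxxxxxx"
ValidCommands = "60007,457,60020,68,5,6,7,9,12,43,20,46,78,50,56,48,71,74"
06/14/23 09:22:35 SECMAN: FAILED: Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS.
'''


def parse_post_auth_classad(log_text):
    """Return the 'Attr = value' pairs of the last post-auth ClassAd as a dict."""
    attrs = {}
    in_ad = False
    for line in log_text.splitlines():
        if "received post-auth classad:" in line:
            in_ad = True
            attrs = {}
            continue
        if in_ad:
            key, sep, value = line.partition(" = ")
            # The first line that is not a single-word 'Attr = value' ends the ad.
            if not sep or " " in key.strip():
                in_ad = False
                continue
            attrs[key.strip()] = value.strip().strip('"')
    return attrs


def authenticated_but_denied(attrs):
    """True if authentication was tried, a user was mapped, and the reply was DENIED."""
    return (
        attrs.get("TriedAuthentication") == "true"
        and bool(attrs.get("User"))
        and attrs.get("ReturnCode") == "DENIED"
    )


ad = parse_post_auth_classad(SAMPLE_LOG)
print(ad["User"], ad["ReturnCode"], authenticated_but_denied(ad))
```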

The CE appears to recognize the token successfully and map it to the local user lhcb048 (an improvement over my earlier test). The allow_* and deny_* settings on the CE side are as follows (some are temporary, for debugging this issue):
ALLOW_ADMIN_COMMANDS = true
ALLOW_ADMINISTRATOR = $(SUPERUSERS)
ALLOW_CLIENT = *
ALLOW_DAEMON = $(FRIENDLY_DAEMONS)
ALLOW_NEGOTIATOR = $(SUPERUSERS)
ALLOW_OWNER = $(SUPERUSERS)
ALLOW_READ = *
ALLOW_WRITE = *
COLLECTOR.ALLOW_ADVERTISE_MASTER = $(FRIENDLY_DAEMONS)
COLLECTOR.ALLOW_ADVERTISE_SCHEDD = $(FRIENDLY_DAEMONS)
COLLECTOR.ALLOW_ADVERTISE_STARTD = $(UNMAPPED_USERS), $(USERS)
COLLECTOR.ALLOW_READ = *
SCHEDD.ALLOW_NEGOTIATOR = condor@xxxxxxxxxxxxxxxxxxx/$(FULL_HOSTNAME)
SCHEDD.ALLOW_WRITE = *
SCHEDD_ALLOW_LATE_MATERIALIZE = true
DENY_ADMINISTRATOR = anonymous@*, unmapped@*
DENY_CLIENT = anonymous@*, unmapped@*
DENY_DAEMON = anonymous@*, unmapped@*
DENY_NEGOTIATOR = anonymous@*, unmapped@*
DENY_OWNER = anonymous@*, unmapped@*
DENY_WRITE = anonymous@*, unmapped@* */134.158.151.140 */31.147.202.178

I don't know whether the log line "SECMAN:2010:Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS" is related to my allow/deny policy or to the scitoken's scopes. Is it possible to fix the 'DENIED' problem on the CE side in this case?

Besides, I am asking our CMS colleagues to run a similar test on the ETF host.

Regards,
Xiaowei



-----Original Message-----
From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
Sent: 2023-06-14 09:47:47 (Wednesday)
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Cc:
Subject: Re: [HTCondor-users] job failed to submit to CE with SCIToken only

Hi Xiaowei,

From the server-side logfile you shared, the error is on the client side.  For both SSL/TLS and SCITOKENS authentication, the client sends a message that it's giving up prior to completing the SSL handshake.  Since it's that early, you can eliminate any current problems with the token itself or the authorization configuration.

I queried from a personal dev host and it seems to have given a reasonable response.

You may ask the administrator of etf-01.cern.ch to try sending you the output of the following:

_CONDOR_AUTH_SSL_CLIENT_CADIR=/etc/grid-security/certificates/ _CONDOR_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKENS _CONDOR_TOOL_DEBUG=D_SECURITY:2 condor_status -debug -pool condorce02.ihep.ac.cn:9619

and see if the client is producing more useful debug outputs at the higher logging level.

For example, if AUTH_SSL_CLIENT_CADIR is not set to /etc/grid-security/certificates (as suggested in Maarten's later link) then I can reproduce what you see rather easily.
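
If the test needs to be repeated from several hosts, the one-line command above can be wrapped in a small script. The following is a sketch under stated assumptions: the function names are hypothetical, the environment variables and arguments are exactly those of the command above, and it assumes condor_status is on the PATH of the host where it runs.

```python
import os
import subprocess


def build_debug_query(pool, cadir="/etc/grid-security/certificates/"):
    """Return (environment overrides, argv) equivalent to the one-liner above."""
    overrides = {
        "_CONDOR_AUTH_SSL_CLIENT_CADIR": cadir,
        "_CONDOR_SEC_CLIENT_AUTHENTICATION_METHODS": "SCITOKENS",
        "_CONDOR_TOOL_DEBUG": "D_SECURITY:2",
    }
    argv = ["condor_status", "-debug", "-pool", pool]
    return overrides, argv


def run_debug_query(pool, cadir="/etc/grid-security/certificates/"):
    """Run the query with the overrides layered on top of the current environment.

    Returns the CompletedProcess so the caller can inspect stdout and stderr.
    """
    overrides, argv = build_debug_query(pool, cadir)
    env = dict(os.environ, **overrides)
    return subprocess.run(argv, env=env, capture_output=True, text=True)
```

A call such as run_debug_query("condorce02.ihep.ac.cn:9619") would then return both output streams, which can be saved and sent to the list. Passing a different cadir makes it easy to check whether the CA directory setting is what triggers the failure.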

Brian

On Jun 13, 2023, at 5:17 AM, JIANG Xiaowei <jiangxw@xxxxxxxxxx> wrote:

Dear Experts,

I am facing a weird problem: the CMS SAM job cannot be submitted to our CE with only a SCIToken.
On the SAM schedd side, there are errors like [1].
On my CE collector, the CollectorLog is attached; there is no clue in the SchedLog.
The related configuration is:
[root@condorce02 config.d]# cat /etc/condor-ce/mapfiles.d/10-scitokens.conf
# CMS SAM ##
SCITOKENS /^https\:\/\/cms-auth\.web\.cern\.ch\/,08ca855e-d715-410e-a6ff-ad77306e1763$/ cmssgm006
## ATLAS SAM ##
SCITOKENS /^https:\/\/atlas-auth\.web\.cern\.ch\/,5c5d2a4d-9177-3efa-912f-1b4e5c9fb660$/ atlassgm007
[root@condorce02 config.d]# condor_ce_config_val -dump Collector.SEC
COLLECTOR.SEC_ADVERTISE_STARTD_AUTHENTICATION_METHODS = FS,TOKEN,SCITOKENS,GSI,SSL
COLLECTOR.SEC_READ_AUTHENTICATION_METHODS = FS,TOKEN,SCITOKENS,GSI,SSL
COLLECTOR.SEC_WRITE_AUTHENTICATION_METHODS = FS,TOKEN,SCITOKENS,GSI,SSL
The condor_versions are:
[root@condorce02 config.d]# condor_ce_version
$HTCondorCEVersion: 5.1.6 $
$CondorVersion: 9.0.17 May 27 2023 BuildID: 649540 PackageID: 9.0.17-3 $
I hope the experts here can help. Thanks!
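
The mapfile entries above can be sanity-checked offline with a short script. This is a sketch only: it assumes the identity string matched for SCITOKENS has the form "issuer,subject" (which is what the patterns above are written against), it uses Python's re module as a stand-in for the PCRE engine the mapfile actually uses, and the function names are hypothetical.

```python
import re

# Entries copied from 10-scitokens.conf above: METHOD /regex/ local_user
MAPFILE = r"""
# CMS SAM ##
SCITOKENS /^https\:\/\/cms-auth\.web\.cern\.ch\/,08ca855e-d715-410e-a6ff-ad77306e1763$/ cmssgm006
## ATLAS SAM ##
SCITOKENS /^https:\/\/atlas-auth\.web\.cern\.ch\/,5c5d2a4d-9177-3efa-912f-1b4e5c9fb660$/ atlassgm007
"""


def parse_mapfile(text):
    """Return a list of (method, compiled regex, local user) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        method, rest = line.split(None, 1)
        pattern, user = rest.rsplit(None, 1)
        # Drop the /.../ delimiters around the regular expression.
        if pattern.startswith("/") and pattern.endswith("/"):
            pattern = pattern[1:-1]
        entries.append((method, re.compile(pattern), user))
    return entries


def map_identity(entries, method, identity):
    """Return the local user of the first matching entry, or None."""
    for entry_method, regex, user in entries:
        if entry_method == method and regex.search(identity):
            return user
    return None


entries = parse_mapfile(MAPFILE)
identity = "https://cms-auth.web.cern.ch/,08ca855e-d715-410e-a6ff-ad77306e1763"
print(map_identity(entries, "SCITOKENS", identity))
```

One detail this makes visible: the patterns require the trailing slash after the issuer host, so an identity built from an issuer URL without that slash would not match either entry.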

Regards,
Xiaowei

[1] -
06/07/23 13:23:07 [117315] SECMAN: required authentication with collector at <202.122.33.23:9619> failed, so aborting command QUERY_SCHEDD_ADS.
06/07/23 13:23:07 [117315] ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS
06/07/23 13:23:07 [117315] Error locating schedd condorce02.ihep.ac.cn
06/07/23 13:23:07 [117315] Can't find address of queue manager
06/07/23 13:23:07 [117315] Error connecting to schedd condorce02.ihep.ac.cn:
<collector.log>
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



