
[HTCondor-users] job exclusion on a node for a given queue



Hello,

I'm trying to respond to a request from the ATLAS VO to exclude their jobs from nodes where the supported architecture level is only x86_64-v2.
Although these workers are quite old, they are still very useful for providing resources to certain VOs.

Is it possible to simply exclude a VO's jobs from being submitted to certain worker nodes using Condor?

Best Regards

Jean-Claude





----- Original message -----
From: "JIANG Xiaowei" <jiangxw@xxxxxxxxxx>
To: "htcondor-users mail list" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 14 June 2023 10:05:35
Subject: Re: [HTCondor-users] job failed to submit to CE with SCIToken only

Hi Brian, Todd, Maarten,

Thanks to all of you! Following your suggestions, I ran some tests with a SciToken on one of CERN's lxplus nodes.

Using a CMS user's SciToken with the compute.read scope, I ran the command Brian suggested and the submit command Maarten suggested, and got the same log in both cases:

06/14/23 09:22:35 SECMAN: received post-auth classad:
ReturnCode = "DENIED"
Sid = "condorce02:80306:1686727354:6858"
TriedAuthentication = true
User = "lhcb048@xxxxxxxxxxxxxxxxxx"
ValidCommands = "60007,457,60020,68,5,6,7,9,12,43,20,46,78,50,56,48,71,74"
06/14/23 09:22:35 SECMAN: FAILED: Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS.
Error: communication error
SECMAN:2010:Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS.
Error: Couldn't contact the condor_collector on condorce02.ihep.ac.cn

The CE now appears to recognize the token successfully and map it to the local user lhcb048 (an improvement over my earlier test). The ALLOW_* and DENY_* settings on the CE side are as follows (some are temporary, for debugging this issue):
ALLOW_ADMIN_COMMANDS = true
ALLOW_ADMINISTRATOR = $(SUPERUSERS)
ALLOW_CLIENT = *
ALLOW_DAEMON = $(FRIENDLY_DAEMONS)
ALLOW_NEGOTIATOR = $(SUPERUSERS)
ALLOW_OWNER = $(SUPERUSERS)
ALLOW_READ = *
ALLOW_WRITE = *
COLLECTOR.ALLOW_ADVERTISE_MASTER = $(FRIENDLY_DAEMONS)
COLLECTOR.ALLOW_ADVERTISE_SCHEDD = $(FRIENDLY_DAEMONS)
COLLECTOR.ALLOW_ADVERTISE_STARTD = $(UNMAPPED_USERS), $(USERS)
COLLECTOR.ALLOW_READ = *
SCHEDD.ALLOW_NEGOTIATOR = condor@xxxxxxxxxxxxxxxxxxx/$(FULL_HOSTNAME)
SCHEDD.ALLOW_WRITE = *
SCHEDD_ALLOW_LATE_MATERIALIZE = true
DENY_ADMINISTRATOR = anonymous@*, unmapped@*
DENY_CLIENT = anonymous@*, unmapped@*
DENY_DAEMON = anonymous@*, unmapped@*
DENY_NEGOTIATOR = anonymous@*, unmapped@*
DENY_OWNER = anonymous@*, unmapped@*
DENY_WRITE = anonymous@*, unmapped@* */134.158.151.140 */31.147.202.178

I don't know whether the log message "SECMAN:2010:Received "DENIED" from server for user lhcb048@xxxxxxxxxxxxxxxxxx using method SCITOKENS" is related to my allow/deny policy or to the SciToken's scopes. Is it possible to fix the "DENIED" problem on the CE side in this case?
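
As an aid to reading the post-auth ClassAd in the log above, here is a minimal Python sketch that parses the `Name = value` lines and separates two questions: did the server report a mapped user, and was the command denied? The function names `parse_post_auth_ad` and `summarize` are hypothetical helpers, not part of HTCondor, and the reading that "a mapped User plus ReturnCode DENIED points at the authorization step" is an assumption that has not been confirmed in this thread.

```python
import re

# Post-auth ClassAd lines copied from the log excerpt above (user name redacted as in the archive).
LOG_EXCERPT = '''\
ReturnCode = "DENIED"
Sid = "condorce02:80306:1686727354:6858"
TriedAuthentication = true
User = "lhcb048@xxxxxxxxxxxxxxxxxx"
ValidCommands = "60007,457,60020,68,5,6,7,9,12,43,20,46,78,50,56,48,71,74"
'''


def parse_post_auth_ad(text):
    """Parse simple `Name = value` lines into a dict.

    Quoted values become plain strings and true/false become booleans.
    This is not a full ClassAd parser, only enough for the excerpt.
    """
    ad = {}
    for line in text.splitlines():
        m = re.match(r'^\s*(\w+)\s*=\s*(.*?)\s*$', line)
        if not m:
            continue
        key, raw = m.groups()
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            value = raw[1:-1]
        elif raw.lower() in ("true", "false"):
            value = raw.lower() == "true"
        else:
            value = raw
        ad[key] = value
    return ad


def summarize(ad):
    """Return (user_was_mapped, command_was_denied) for a parsed ad."""
    user_was_mapped = bool(ad.get("TriedAuthentication")) and "User" in ad
    command_was_denied = ad.get("ReturnCode") == "DENIED"
    return user_was_mapped, command_was_denied


if __name__ == "__main__":
    ad = parse_post_auth_ad(LOG_EXCERPT)
    mapped, denied = summarize(ad)
    print(f"user mapped: {mapped}, command denied: {denied}")
```

If both values come out true, as they do for this excerpt, the token and mapping are probably not where the request fails, so the ALLOW_*/DENY_* lists for the mapped user would be the next place to look.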

In addition, I am asking colleagues in CMS to run a similar test on the ETF host.

Regards,
Xiaowei



> -----Original Message-----
> From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
> Sent: 2023-06-14 09:47:47 (Wednesday)
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Cc: 
> Subject: Re: [HTCondor-users] job failed to submit to CE with SCIToken only
> 
> Hi Xiaowei,
> 
> From the server-side logfile you shared, the error is on the client side.  For both SSL/TLS and SCITOKENS authentication, the client sends a message that it's giving up prior to completing the SSL handshake.  Since it's that early, you can eliminate any current problems with the token itself or the authorization configuration.
> 
> I queried from a personal dev host and it seems to have given a reasonable response.
> 
> You may ask the administrator of etf-01.cern.ch to try sending you the output of the following:
> 
> _CONDOR_AUTH_SSL_CLIENT_CADIR=/etc/grid-security/certificates/ _CONDOR_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKENS _CONDOR_TOOL_DEBUG=D_SECURITY:2 condor_status -debug -pool condorce02.ihep.ac.cn:9619
> 
> and see if the client is producing more useful debug outputs at the higher logging level.
> 
> For example, if AUTH_SSL_CLIENT_CADIR is not set to /etc/grid-security/certificates (as suggested in Maarten's later link) then I can reproduce what you see rather easily.
> 
> Brian
> 
> > On Jun 13, 2023, at 5:17 AM, JIANG Xiaowei <jiangxw@xxxxxxxxxx> wrote:
> > 
> > Dear Experts, 
> > 
> > I am facing a weird problem: the CMS SAM job cannot be submitted to our CE with only a SciToken.
> > On the SAM schedd side, there are errors like [1].
> > On my CE collector, the CollectorLog is attached, and there is no clue in the SchedLog.
> > The related configuration is:
> > [root@condorce02 config.d]# cat /etc/condor-ce/mapfiles.d/10-scitokens.conf 
> > # CMS SAM ##
> > SCITOKENS /^https\:\/\/cms-auth\.web\.cern\.ch\/,08ca855e-d715-410e-a6ff-ad77306e1763$/ cmssgm006
> > ## ATLAS SAM ##
> > SCITOKENS /^https:\/\/atlas-auth\.web\.cern\.ch\/,5c5d2a4d-9177-3efa-912f-1b4e5c9fb660$/ atlassgm007
> > [root@condorce02 config.d]# condor_ce_config_val -dump Collector.SEC
> > COLLECTOR.SEC_ADVERTISE_STARTD_AUTHENTICATION_METHODS = FS,TOKEN,SCITOKENS,GSI,SSL
> > COLLECTOR.SEC_READ_AUTHENTICATION_METHODS = FS,TOKEN,SCITOKENS,GSI,SSL
> > COLLECTOR.SEC_WRITE_AUTHENTICATION_METHODS = FS,TOKEN,SCITOKENS,GSI,SSL
> > The condor_versions are:  
> > [root@condorce02 config.d]# condor_ce_version
> > $HTCondorCEVersion: 5.1.6 $
> > $CondorVersion: 9.0.17 May 27 2023 BuildID: 649540 PackageID: 9.0.17-3 $
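
The mapfile entries quoted above pair a regular expression with a local account. The expression is matched against the token's issuer and subject joined by a comma. The Python sketch below illustrates that matching using the two lines from 10-scitokens.conf. It is only an approximation: HTCondor uses its own regex engine rather than Python's `re`, first-match-wins ordering is assumed here, and `parse_mapfile` and `map_identity` are hypothetical helper names.

```python
import re

# The two mapfile entries quoted above, copied verbatim.
MAPFILE = r'''
# CMS SAM ##
SCITOKENS /^https\:\/\/cms-auth\.web\.cern\.ch\/,08ca855e-d715-410e-a6ff-ad77306e1763$/ cmssgm006
## ATLAS SAM ##
SCITOKENS /^https:\/\/atlas-auth\.web\.cern\.ch\/,5c5d2a4d-9177-3efa-912f-1b4e5c9fb660$/ atlassgm007
'''


def parse_mapfile(text):
    """Return a list of (method, compiled_regex, local_user) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = re.match(r'^(\S+)\s+/(.*)/\s+(\S+)$', line)
        if m:
            method, pattern, user = m.groups()
            rules.append((method, re.compile(pattern), user))
    return rules


def map_identity(rules, method, issuer, subject):
    """Return the local user for the first matching rule, or None."""
    identity = f"{issuer},{subject}"  # "issuer,subject" is what the regex sees
    for rule_method, pattern, user in rules:
        if rule_method == method and pattern.search(identity):
            return user
    return None


if __name__ == "__main__":
    rules = parse_mapfile(MAPFILE)
    print(map_identity(rules, "SCITOKENS",
                       "https://cms-auth.web.cern.ch/",
                       "08ca855e-d715-410e-a6ff-ad77306e1763"))
```

Both patterns are anchored with `^` and `$`, so the issuer URL must match exactly, including its trailing slash, and the subject must match in full. A token from the same issuer with a different subject would not map to either account.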
> > I hope to get help from the experts. Thanks!
> > 
> > Regards, 
> > Xiaowei 
> > 
> > [1] -  
> > 06/07/23 13:23:07 [117315] SECMAN: required authentication with collector at <202.122.33.23:9619> failed, so aborting command QUERY_SCHEDD_ADS.
> > 06/07/23 13:23:07 [117315] ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using SSL|AUTHENTICATE:1004:Failed to authenticate using SCITOKENS|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS|AUTHENTICATE:1004:Failed to authenticate using FS
> > 06/07/23 13:23:07 [117315] Error locating schedd condorce02.ihep.ac.cn
> > 06/07/23 13:23:07 [117315] Can't find address of queue manager
> > 06/07/23 13:23:07 [117315] Error connecting to schedd condorce02.ihep.ac.cn:
> > <collector.log>
> > _______________________________________________
> > HTCondor-users mailing list
> > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > 
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/htcondor-users/
> 
