
Re: [HTCondor-users] Cannot sent jobs as Owner in WindowsOS



That's the StartLog from the execute machine I would like to add (aherdskbld05).

Sent with the GMX iPhone app

On 16.10.18 at 17:18, John M Knoeller wrote:

> What log are you showing me?
> 
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> Sent: Tuesday, October 16, 2018 9:31 AM
> To: htcondor-users@xxxxxxxxxxx
> Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] Cannot sent jobs as Owner in WindowsOS
> 
> 
> Hello TJ,
> I guess I need your help once more on this topic.
> After making sure that all machines have HTCondor version 8.6.12 and that the pool password was set correctly, it was working with one submitter, one node, and the pool.
> 
> However, I would now like to add additional nodes to this configuration.
> I did the same as with my existing node (aherdskbld04): used the same condor_config file, set the PW, and set the pool PW.
> But it is still not working.
> Looking into the logs now, I see this on my "new" execute node aherdskbld05:
> 
> 
> [...]
> 10/16/18 16:05:06 SECMAN: required authentication with credd ahersrvbld28.lgs-net.com failed, so aborting command CREDD_NOP.
> 10/16/18 16:05:06 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
> 10/16/18 16:10:06 SECMAN: required authentication with credd ahersrvbld28.lgs-net.com failed, so aborting command CREDD_NOP.
> 10/16/18 16:10:06 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
> 10/16/18 16:20:06 SECMAN: required authentication with credd ahersrvbld28.lgs-net.com failed, so aborting command CREDD_NOP.
> 10/16/18 16:20:06 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
> 
> 
> BUT: I am 100% sure that I have set the PW correctly.
> 
> Can you give me a hint where to search next?
> 
> Best regards,
> Robert
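
To see how often the execute node fails to authenticate with the credd, the SECMAN lines quoted above can be pulled out of a StartLog with a small script. This is only a sketch: the regular expression is written against the three log lines shown in this message, and the function name `credd_auth_failures` is made up for illustration. It is not part of HTCondor.

```python
import re
from datetime import datetime

# Pattern written against the SECMAN lines quoted in this thread; other
# HTCondor versions may word the message differently (assumption).
FAILURE = re.compile(
    r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) SECMAN: required authentication "
    r"with credd (\S+) failed, so aborting command (\w+)\."
)


def credd_auth_failures(log_text):
    """Return (timestamp, credd_host, command) for each failed credd authentication."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
            failures.append((stamp, match.group(2), match.group(3)))
    return failures


# The three SECMAN lines from the StartLog excerpt above.
SAMPLE = """\
10/16/18 16:05:06 SECMAN: required authentication with credd ahersrvbld28.lgs-net.com failed, so aborting command CREDD_NOP.
10/16/18 16:10:06 SECMAN: required authentication with credd ahersrvbld28.lgs-net.com failed, so aborting command CREDD_NOP.
10/16/18 16:20:06 SECMAN: required authentication with credd ahersrvbld28.lgs-net.com failed, so aborting command CREDD_NOP.
"""

failures = credd_auth_failures(SAMPLE)
for stamp, host, command in failures:
    print(stamp.isoformat(), host, command)
```

Every match in the excerpt names the same credd host and the same command, which fits a periodic check that fails each time rather than a one-off error.
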
> 
> 
> 
> 
> -----------------------
> 
> -----------------------
> 
> 
> Sent: Wednesday, 10 October 2018 at 17:28
> From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> Ok, so this is not an ALLOW_WRITE issue in the CREDD or we would not have gotten this far.
> 
> 10/10/18 08:59:25 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from calibration@lgs-net <194.11.95.204:59824>
> 
> But the next message is this
> 
> 10/10/18 09:00:30 store_cred: Failed to send/recv user.
> 10/10/18 09:00:30 store_cred: code_store_cred failed.
> 
> Which indicates that the CREDD was unable to read the username off of the wire.   That would be something outside of HTCondor - some sort of firewall or antivirus or something interfering with the communication on the wire.
> 
> Otherwise, the only explanation I can think of would be a version mismatch between the CREDD and the execute node.  Are the CREDD and the execute node both running a version of HTCondor from before 8.5.8, or both from after it?   In 8.5.8 we changed the STORE_CRED command a bit, and that might be causing an issue here.
> 
> -tj
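
The version question above can be checked mechanically from the `$CondorVersion` banner that each daemon writes at the top of its log. The following is a rough sketch only: the helper names are invented for illustration, treating 8.5.8 itself as "after the change" is an assumption, and the credd version used in the example (8.4.10) is hypothetical, taken from an earlier message in this thread and not from a CreddLog.

```python
import re

# Version in which, according to the message above, STORE_CRED changed.
STORE_CRED_CHANGE = (8, 5, 8)


def parse_condor_version(banner):
    """Extract (major, minor, patch) from a '$CondorVersion: ... $' banner line."""
    match = re.search(r"\$CondorVersion:\s*(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no $CondorVersion banner found in: %r" % banner)
    return tuple(int(part) for part in match.groups())


def same_side_of_change(version_a, version_b, change=STORE_CRED_CHANGE):
    """True if both versions are before the change, or both are at/after it.

    Counting the change version itself as 'after' is an assumption.
    """
    return (version_a < change) == (version_b < change)


# Banner copied from the StarterLog.slot1 excerpt quoted in this thread.
execute_node = parse_condor_version(
    "** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $"
)
credd = (8, 4, 10)  # hypothetical: a credd machine that was not upgraded

if not same_side_of_change(execute_node, credd):
    print("possible STORE_CRED mismatch:", execute_node, "vs", credd)
```

Comparing versions as integer tuples avoids the pitfall of string comparison, where "8.10.0" would sort before "8.5.8".
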
> 
> 
> 
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> Sent: Wednesday, October 10, 2018 4:36 AM
> To: htcondor-users@xxxxxxxxxxx
> Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> 
> Hello TJ,
> This is what I see in the CreddLog.
> (First I entered the PW on the submitter aherdskbld03 with IP 194.11.95.204, then on the execute node aherdskbld04 with IP 194.11.95.205.)
> 
> 10/10/18 08:59:25 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from calibration@lgs-net <194.11.95.204:59824>
> 10/10/18 08:59:25 Return from HandleReq <store_cred_handler> (handler: 0.056644s, sec: 0.000s, payload: 0.000s)
> 10/10/18 09:00:30 Calling Handler <DaemonCommandProtocol::WaitForSocketData> (2)
> 10/10/18 09:00:30 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from calibration@lgs-net <194.11.95.205:62489>
> 10/10/18 09:00:30 store_cred: Failed to send/recv user.
> 10/10/18 09:00:30 store_cred: code_store_cred failed.
> 10/10/18 09:00:30 Return from HandleReq <store_cred_handler> (handler: 0.000408s, sec: 0.031s, payload: 0.000s)
> 10/10/18 09:00:30 Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.026974s
> 
> 
> Best regards,
> Robert
> 
> 
> 
> 
> 
> -----------------------
> 
> -----------------------
> 
> 
> Sent: Tuesday, 09 October 2018 at 18:03
> From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> In order for condor_store_cred to store a password, it must send a command to a daemon.  For the pool password, it uses the condor_master daemon.  But for a user password, it must be able to contact either a condor_schedd or condor_credd daemon from that machine.
> 
> So on an execute node that does not have a SCHEDD running, it would be normal to be able to use condor_store_cred to store the pool password, but not a user password unless the execute node is configured to use a CREDD.
> 
> So the problem must be that the CREDD is not responding to this host.  And this message
> 
> 10/08/18 17:21:03 store_cred: failed to recv answer.
> Operation failed.
>     Make sure your ALLOW_WRITE setting includes this host.
> 
> seems to back that up.
> 
> What does the CreddLog show at the time when you tried to run condor_store_cred on the execute node?
> 
> -tj
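
The rule described in the message above (pool password via the condor_master, user password via a condor_schedd or condor_credd) can be written down as a small check against a host's configuration. This is a simplified sketch with an invented function name. It only looks at the `DAEMON_LIST` and `CREDD_HOST` values and does not test whether the credd is actually reachable or willing to authenticate the host.

```python
def credential_targets(daemon_list, credd_host=None):
    """Guess which passwords condor_store_cred could store from this host.

    daemon_list: value of DAEMON_LIST, e.g. "MASTER STARTD"
    credd_host:  value of CREDD_HOST, or None if it is not configured
    """
    daemons = {name.upper() for name in daemon_list.replace(",", " ").split()}
    return {
        # The pool password is stored through the local condor_master.
        "pool_password": "MASTER" in daemons,
        # A user password needs a condor_schedd or condor_credd to talk to:
        # either one running locally or a configured remote CREDD_HOST.
        "user_password": (
            "SCHEDD" in daemons or "CREDD" in daemons or bool(credd_host)
        ),
    }


# Execute node as configured in this thread, without and with CREDD_HOST.
print(credential_targets("MASTER STARTD"))
print(credential_targets("MASTER STARTD", "AHERSRVBLD28.lgs-net.com"))
```

For the node in this thread the second call is the relevant one: `CREDD_HOST` is set, so a failure to store the user password points at the connection to the credd and not at a missing daemon.
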
> 
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> Sent: Monday, October 8, 2018 10:42 AM
> To: htcondor-users@xxxxxxxxxxx
> Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> 
> Hello TJ,
> Yes, CREDD is running on the pool machine (ahersrvbld28).
> Not on this node, though.
> As I wrote before, I was able to specify a pool password on the node, pool, and submitter.
> But I did not manage to specify a PW for calibration@LGS-NET on aherdskbld04.
> 
> 
> This is the condor_config from the node (aherdskbld04):
> 
> CONDOR_HOST = 194.11.95.125
> UID_DOMAIN = lgs-net.com
> CONDOR_ADMIN = calibration@LGS-NET
> SMTP_SERVER =
> ALLOW_READ = *
> ALLOW_WRITE = *
> ALLOW_ADMINISTRATOR = *
> JAVA = C:\PROGRA~1\Java\JRE18~2.0_1\bin\java.exe
> use POLICY : ALWAYS_RUN_JOBS
> WANT_VACATE = FALSE
> WANT_SUSPEND = TRUE
> DAEMON_LIST = MASTER STARTD
> NUM_SLOTS = $(detected_Memory)/16000
> FILESYSTEM_DOMAIN = lgs-net.com
> TRUST_UID_DOMAIN = true
> SOFT_UID_DOMAIN = true
> 
> STARTER_ALLOW_RUNAS_OWNER = true
> CREDD_HOST = AHERSRVBLD28.lgs-net.com
> CREDD_CACHE_LOCALLY = True
> 
> ALLOW_CONFIG = *
> SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
> SEC_CONFIG_NEGOTIATION = REQUIRED
> SEC_CONFIG_AUTHENTICATION = REQUIRED
> SEC_CONFIG_ENCRYPTION = REQUIRED
> SEC_CONFIG_INTEGRITY = REQUIRED
> 
> This is the condor_config for the pool master (ahersrvbld28):
> 
> CONDOR_HOST = 194.11.95.125
> COLLECTOR_NAME = HxMap_IT
> UID_DOMAIN = lgs-net.com
> CONDOR_ADMIN = Calibration@xxxxxxxxxxx
> SMTP_SERVER =
> ALLOW_READ = *
> ALLOW_WRITE = *
> ALLOW_ADMINISTRATOR = *
> START = FALSE
> WANT_VACATE = FALSE
> WANT_SUSPEND = TRUE
> DAEMON_LIST = MASTER SCHEDD COLLECTOR NEGOTIATOR CREDD
> NUM_SLOTS_Type1 = 1
> 
> FILESYSTEM_DOMAIN = lgs-net.com
> TRUST_UID_DOMAIN = true
> SOFT_UID_DOMAIN = true
> 
> STARTER_ALLOW_RUNAS_OWNER = true
> CREDD_HOST = ahersrvbld28.lgs-net.com
> CREDD_CACHE_LOCALLY = True
> 
> SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
> 
> ALLOW_CONFIG = *
> SEC_CONFIG_NEGOTIATION = REQUIRED
> SEC_CONFIG_AUTHENTICATION = REQUIRED
> SEC_CONFIG_ENCRYPTION = REQUIRED
> SEC_CONFIG_INTEGRITY = REQUIRED
> 
> CREDD_LOG = $(LOG)/CreddLog
> CREDD_DEBUG = D_COMMAND
> MAX_CREDD_LOG = 50000000
> 
> 
> 
> 
> 
> This is what I get with your suggestion:
> 
> C:\Users\calibration>condor_store_cred -debug add
> Account: calibration@LGS-NET
> 
> Enter password:
> 
> 10/08/18 17:21:03 STORE_CRED: In mode 'add'
> 10/08/18 17:21:03 ZKM: First potential block in store_cred, DC==0
> 10/08/18 17:21:03 store_cred: failed to recv answer.
> Operation failed.
>     Make sure your ALLOW_WRITE setting includes this host.
> 
> 
> 
> 
> -----------------------
> 
> -----------------------
> 
> 
> Sent: Monday, 08 October 2018 at 16:35
> From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> yes.
> 
> 10/08/18 11:47:27 (pid:3112) ERROR: Could not locate valid credential for user 'calibration@LGS-NET'
> 
> is definitely a problem.  If you are using a CREDD, then we need to look at the credd configuration for this node, and possibly the ALLOW_* permissions in the credd's configuration.
> 
> If you are not using a credd, then you need to run this command on the execute node
> 
> condor_store_cred -debug -add -u calibration@LGS-NET
> 
> The -debug option is there so that, if it fails, we get additional error messages.
> 
> Alternatively, you could log in to the execute node as calibration@LGS-NET
> and then just run
> 
> condor_store_cred -debug -add
> 
> -tj
> 
> 
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> Sent: Monday, October 8, 2018 6:48 AM
> To: htcondor-users@xxxxxxxxxxx
> Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> 
> Hi TJ,
> I just ran another job, so the timestamps do not correspond to your request. But the entries are always the same, so you would have seen the same ones at 05:33.
> 
> StartLog:
> 10/08/18 11:47:27 slot1: Request accepted.
> 10/08/18 11:47:27 WARNING: forward resolution of ahercaxhdx32.lgs-net.com doesn't match 194.11.95.204!
> 10/08/18 11:47:27 slot1: Remote owner is calibration@xxxxxxxxxxx
> 10/08/18 11:47:27 slot1: State change: claiming protocol successful
> 10/08/18 11:47:27 slot1: Changing state: Unclaimed -> Claimed
> 10/08/18 11:47:27 slot1: Got activate_claim request from shadow (194.11.95.204)
> 10/08/18 11:47:27 slot1: Remote job ID is 7213.0
> 10/08/18 11:47:27 slot1: Got universe "VANILLA" (5) from request classad
> 10/08/18 11:47:27 slot1: State change: claim-activation protocol successful
> 10/08/18 11:47:27 slot1: Changing activity: Idle -> Busy
> 10/08/18 11:47:27 condor_read() failed: recv(fd=1808) returned -1, errno = 10054 , reading 5 bytes from <127.0.0.1:62227>.
> 10/08/18 11:47:27 IO: Failed to read packet header
> 10/08/18 11:47:27 Starter pid 3112 exited with status 1
> 10/08/18 11:47:27 slot1: State change: starter exited
> 10/08/18 11:47:27 slot1: Changing activity: Busy -> Idle
> 10/08/18 11:47:27 Aborting CA_LOCATE_STARTER
> 10/08/18 11:47:27 ClaimId (<194.11.95.205:9618>#1538471621#8118#[Encryption="NO";Integrity="NO";CryptoMethods="3DES";]f68f93d04b3e507e18ee0978b7b4ad0c2a1b58e7) and GlobalJobId ( AHERDSKBLD03.lgs-net.com#7213.0#1538984881 ) not found
> 10/08/18 11:47:27 slot1: State change: received RELEASE_CLAIM command
> 10/08/18 11:47:27 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
> 10/08/18 11:47:27 slot1: State change: No preempting claim, returning to owner
> 10/08/18 11:47:27 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
> 10/08/18 11:47:27 slot1: State change: IS_OWNER is false
> 10/08/18 11:47:27 slot1: Changing state: Owner -> Unclaimed
> 
> StarterLog.slot1
> 10/08/18 11:47:27 (pid:3112) ******************************************************
> 10/08/18 11:47:27 (pid:3112) ** condor_starter (CONDOR_STARTER) STARTING UP
> 10/08/18 11:47:27 (pid:3112) ** C:\condor\bin\condor_starter.exe
> 10/08/18 11:47:27 (pid:3112) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
> 10/08/18 11:47:27 (pid:3112) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
> 10/08/18 11:47:27 (pid:3112) ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
> 10/08/18 11:47:27 (pid:3112) ** $CondorPlatform: x86_64_Windows10 $
> 10/08/18 11:47:27 (pid:3112) ** PID = 3112
> 10/08/18 11:47:27 (pid:3112) ** Log last touched 10/8 11:46:29
> 10/08/18 11:47:27 (pid:3112) ******************************************************
> 10/08/18 11:47:27 (pid:3112) Using config source: C:\condor\condor_config
> 10/08/18 11:47:27 (pid:3112) Using local config sources:
> 10/08/18 11:47:27 (pid:3112)    C:\condor\condor_config.local
> 10/08/18 11:47:27 (pid:3112) config Macros = 67, Sorted = 66, StringBytes = 1547, TablesBytes = 2460
> 10/08/18 11:47:27 (pid:3112) CLASSAD_CACHING is OFF
> 10/08/18 11:47:27 (pid:3112) Daemon Log is logging: D_ALWAYS D_ERROR
> 10/08/18 11:47:27 (pid:3112) SharedPortEndpoint: listener already created.
> 10/08/18 11:47:27 (pid:3112) DaemonCore: command socket at <194.11.95.205:9618?addrs=194.11.95.205-9618&noUDP&sock=12572_410a_4061>
> 10/08/18 11:47:27 (pid:3112) DaemonCore: private command socket at <194.11.95.205:9618?addrs=194.11.95.205-9618&noUDP&sock=12572_410a_4061>
> 10/08/18 11:47:27 (pid:3112) GLEXEC_JOB not supported on this platform; ignoring
> 10/08/18 11:47:27 (pid:3112) Communicating with shadow <194.11.95.204:52107?addrs=194.11.95.204-52107>
> 10/08/18 11:47:27 (pid:3112) Submitting machine is "194.11.95.204"
> 10/08/18 11:47:27 (pid:3112) setting the orig job name in starter
> 10/08/18 11:47:27 (pid:3112) setting the orig job iwd in starter
> 10/08/18 11:47:27 (pid:3112) condor_read() failed: recv(fd=852) returned -1, errno = 10054 , reading 21 bytes from credd ahersrvbld28.lgs-net.com.
> 10/08/18 11:47:27 (pid:3112) IO: Failed to read packet header
> 10/08/18 11:47:27 (pid:3112) ERROR: Could not locate valid credential for user 'calibration@LGS-NET'
> 10/08/18 11:47:27 (pid:3112) Could not initialize user_priv as "LGS-NET\calibration".
>  Make sure this account's password is securely stored with condor_store_cred.
> 10/08/18 11:47:27 (pid:3112) ERROR: Failed to determine what user to run this job as, aborting
> 10/08/18 11:47:27 (pid:3112) Failed to initialize JobInfoCommunicator, aborting
> 10/08/18 11:47:27 (pid:3112) Unable to start job.
> 10/08/18 11:47:27 (pid:3112) SharedPortEndpoint: Destructor: Problem in thread shutdown notification: 0
> 10/08/18 11:47:27 (pid:3112) **** condor_starter (condor_STARTER) pid 3112 EXITING WITH STATUS 1
> 
> StarterLog
> (only has entries from 5 days ago)
> 
> 
> 
> 
> My observations:
> --> tried to add credentials on my processing node aherdskbld04, but failed.
> --> it was possible to add credentials on submitter and pool, but not on the node
> --> it was possible to add a pool PW on all machines.
> 
> 
> Best regards,
> Robert
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -----------------------
> 
> -----------------------
> 
> 
> Sent: Friday, 05 October 2018 at 18:19
> From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> What do the StartLog, the StarterLog, and StarterLog.slot1 on AHERDSKBLD04.lgs-net.com say at time 05:33? (Actually, it is best to look at the time of the *first* disconnection.)
> 
> The messages you see in the job's log file indicate that the job did match and at least attempted to start, but that something went wrong.   This could be an HTCondor configuration issue, or a problem with your firewall, or some problem with starting the job itself on that machine.  The StartLog or StarterLog or StarterLog.slot1 will give a clearer indication of what the problem is.
> 
> -tj
> 
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> Sent: Friday, October 5, 2018 2:24 AM
> To: htcondor-users@xxxxxxxxxxx
> Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> 
> Hello TJ,
> Thanks for the response.
> I tried both:
> deleting the entry "load_profile = True" and setting it to "load_profile = False".
> Neither helped.
> Here is an extract from my submit file:
> 
> Universe = vanilla
> Notification = Error
> Notify_user = user@xxxxxxxxxxx
> 
> # OS requirements
> Requirements = ( (OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61") || ((OpSys == "WINDOWS" || OpSys == "LINUX") && Arch == "X86_64") )
> Rank = kflops + memory*1024 - (Machine =?= LastRemoteHost)*500000
> 
> # Be sure to copy files back and forth to the node (linux disables this by default)
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT
> 
> 
> RunAsOwner = true
> load_profile = False
> 
> Executable = hxmap_condor_runner_$$(OpSys)_$$(Arch).bat
> Output = 181005053650_20180820084007_ingest____________create_.out
> Log = 181005053650_20180820084007_ingest____________create_.log
> Error = 181005053650_20180820084007_ingest____________create_.err
> 
> 
> This is, again, what I get when running
> C:\Users\calibration>Condor_status -af:h Name OpSys Arch LocalCredd HasWindowsRunAsOwner
> 
> 
> Name                           OpSys   Arch   LocalCredd               HasWindowsRunAsOwner
> AHERDSKBLD02.lgs-net.com       WINDOWS X86_64 undefined                true
> AHERDSKBLD03.lgs-net.com       WINDOWS X86_64 AHERSRVBLD28.lgs-net.com true
> slot1@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 AHERSRVBLD28.lgs-net.com true
> slot1@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot1@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot1@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot2@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 AHERSRVBLD28.lgs-net.com true
> slot2@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot2@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot2@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot3@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 AHERSRVBLD28.lgs-net.com true
> slot3@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot3@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot3@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot4@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 AHERSRVBLD28.lgs-net.com true
> slot4@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot4@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> slot4@xxxxxxxxxxxxxxxxxxxxxxxx WINDOWS X86_64 undefined                true
> 
> 
> Remark: only aherdskbld04.lgs-net.com is configured to run jobs as owner.
> 
> I can observe that this machine is selected by the schedd, but the job is not sent to the machine.
> Here is an extract from the log (the same entries repeat endlessly in the log):
> 
> 022 (7209.000.000) 10/05 05:33:14 Job disconnected, attempting to reconnect
>     Socket between submit and execute hosts closed unexpectedly
>     Trying to reconnect to slot1@xxxxxxxxxxxxxxxxxxxxxxxx <194.11.95.205:9618?addrs=194.11.95.205-9618&noUDP&sock=12560_40bc_3>
> ...
> 024 (7209.000.000) 10/05 05:33:14 Job reconnection failed
>     Job not found at execution machine
>     Can not reconnect to slot1@xxxxxxxxxxxxxxxxxxxxxxxx, rescheduling job
> 
> 
> Not sure what I have set incorrectly. It must be a small setting somewhere that I am missing.
> 
> Best regards,
> Robert
> 
> 
> -----------------------
> 
> -----------------------
> 
> 
> > Sent: Wednesday, 03 October 2018 at 20:53
> > From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> > To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> > Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> >
> > Either run_as_owner or RunAsOwner will work. And yes, load_profile conflicts with run_as_owner.
> > You must set one or the other, but you cannot set both.
> >
> > -tj
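
Since the submit files in this thread are generated by software, the conflict described above could be caught before submission with a small check. This is a sketch under stated assumptions: the function names are invented, the parser only understands simple `key = value` lines, and treating `true`, `t`, `yes`, and `1` as "enabled" is a guess. It is not HTCondor's own parsing logic.

```python
TRUE_VALUES = {"true", "t", "yes", "1"}  # assumption: values counted as enabled


def parse_submit(text):
    """Parse simple 'key = value' lines into a dict with lowercase keys."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and commands such as 'queue'
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def owner_profile_conflict(text):
    """True if the submit text enables both run-as-owner and load_profile."""
    settings = parse_submit(text)
    run_as_owner = any(
        settings.get(key, "").lower() in TRUE_VALUES
        for key in ("run_as_owner", "runasowner")  # both spellings per the reply above
    )
    load_profile = settings.get("load_profile", "").lower() in TRUE_VALUES
    return run_as_owner and load_profile


generated = """\
Universe = vanilla
should_transfer_files = YES
RunAsOwner = true
load_profile = True
queue
"""

if owner_profile_conflict(generated):
    print("load_profile and run_as_owner are both enabled; pick one")
```
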
> >
> > -----Original Message-----
> > From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> > Sent: Tuesday, October 2, 2018 5:00 AM
> > To: htcondor-users@xxxxxxxxxxx
> > Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
> >
> >
> > Hi TJ
> >
> > Sorry for the delay; I was on PTO for the past couple of days.
> > Regarding your question, please see the attachment.
> > Only the machines AherSRVBLD28 (pool), AherDSKBLD03 (submitter), and AHERDSKBLD04 (node) were configured to run jobs as owner.
> >
> > a)
> > Do I need to specify Run_As_owner or RunAsOwner in the submission file?
> >
> > b)
> > By default we have
> > load_profile = True
> > in the submission file.
> > Does this conflict with "Run_as_owner"?
> >
> >
> > Best regards,
> > Robert
> >
> >
> >
> >
> > -----------------------
> >
> > -----------------------
> >
> >
> > > Sent: Thursday, 27 September 2018 at 23:49
> > > From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> > > To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> > > Subject: Re: [HTCondor-users] Fwd: Aw: Re: Cannot sent jobs as Owner in WindowsOS
> > >
> > > This part of the condor_q -analyze output
> > >
> > > 1 ( ( ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61" ) || ( ( OpSys == "WINDOWS" || OpSys == "LINUX" ) && Arch == "X86_64" ) ) )
> > > 0 REMOVE
> > > 2 ( TARGET.HasWindowsRunAsOwner && ( TARGET.LocalCredd is "AHERSRVBLD28.lgs-net.com" ) )
> > >
> > >
> > > is saying that there are no machines in your pool that have Arch == X86_64, also support WindowsRunAsOwner, and use the necessary value for LocalCredd.
> > >
> > >
> > > What does
> > >
> > > condor_status -af:h Name OpSys Arch LocalCredd HasWindowsRunAsOwner
> > >
> > >
> > > show?
> > >
> > > -tj
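
The two conditions that matched zero machines can be replayed against the output of the `condor_status -af:h ...` command suggested above. The sketch below is illustrative only: the function name is invented, it assumes the five whitespace-separated columns requested by that command, and it compares `LocalCredd` with exact, case-sensitive string equality on the assumption that this is how the ClassAd `is` operator treats strings. The two sample rows are taken from output posted elsewhere in this thread.

```python
OLD_WINDOWS = {"WINNT51", "WINNT52", "WINNT60", "WINNT61"}


def matching_machines(status_output, required_credd):
    """Return names of machines that satisfy both job requirement clauses."""
    matches = []
    for row in status_output.strip().splitlines()[1:]:  # skip the header row
        fields = row.split()
        if len(fields) != 5:
            continue  # not a Name/OpSys/Arch/LocalCredd/HasWindowsRunAsOwner row
        name, opsys, arch, local_credd, run_as_owner = fields
        # Clause 1: the OpSys/Arch part of the Requirements expression.
        os_ok = opsys in OLD_WINDOWS or (
            opsys in {"WINDOWS", "LINUX"} and arch == "X86_64"
        )
        # Clause 2: run-as-owner support plus the exact LocalCredd value.
        credd_ok = run_as_owner.lower() == "true" and local_credd == required_credd
        if os_ok and credd_ok:
            matches.append(name)
    return matches


sample = """\
Name                     OpSys   Arch   LocalCredd               HasWindowsRunAsOwner
AHERDSKBLD02.lgs-net.com WINDOWS X86_64 undefined                true
AHERDSKBLD03.lgs-net.com WINDOWS X86_64 AHERSRVBLD28.lgs-net.com true
"""

print(matching_machines(sample, "AHERSRVBLD28.lgs-net.com"))
```

A machine whose `LocalCredd` column shows `undefined` can never satisfy the second clause, whatever its `HasWindowsRunAsOwner` value is.
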
> > >
> > >
> > >
> > >
> > > From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> > > Sent: Thursday, September 27, 2018 8:40 AM
> > > To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> > > Subject: [HTCondor-users] Fwd: Aw: Re: Cannot sent jobs as Owner in WindowsOS
> > >
> > >
> > > From: rb
> > > Date: 19 September 2018 at 11:02
> > > To: "Todd Tannenbaum"
> > > Subject: Aw: Re: [HTCondor-users] Cannot sent jobs as Owner in WindowsOS
> > >
> > >
> > >
> > > Hello Todd,
> > >
> > > thanks for the additional hints.
> > > I was able to move a bit forward, but was not yet successful.
> > > E.g., I was able to specify a condor pool PW. Jobs are now picked up by condor; however, none of them are picked up by the nodes, as it seems the requirements do not match.
> > > (Remark: Jobs are matching and running when using the default temp user from condor)
> > >
> > >
> > > I attach the condor config files I created now. One for master, one submitter, one node.
> > > The submission files contain a line: "Run_as_owner = true"
> > >
> > > a) Basically I copied the content of the ..\etc\condor_config.local.credd into the condor config file of the pool manager running CREDD
> > > b) copied
> > > CREDD_HOST = credd.cs.wisc.edu
> > > CREDD_CACHE_LOCALLY = True
> > >
> > > STARTER_ALLOW_RUNAS_OWNER = True
> > >
> > > ALLOW_CONFIG = Administrator@*
> > > SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
> > > SEC_CONFIG_NEGOTIATION = REQUIRED
> > > SEC_CONFIG_AUTHENTICATION = REQUIRED
> > > SEC_CONFIG_ENCRYPTION = REQUIRED
> > > SEC_CONFIG_INTEGRITY = REQUIRED
> > > into all processing and submitter machines.
> > >
> > >
> > > When I now run jobs, they are stuck in the queue.
> > > Running condor_q -analyze gives the following message:
> > >
> > > WARNING: Be advised:
> > > No resources matched request's constraints
> > > The Requirements expression for your job is:
> > > ( ( ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" ||
> > > OpSys == "WINNT61" ) || ( ( OpSys == "WINDOWS" ||
> > > OpSys == "LINUX" ) && Arch == "X86_64" ) ) ) &&
> > > ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
> > > ( TARGET.HasFileTransfer ) && ( TARGET.HasWindowsRunAsOwner &&
> > > ( TARGET.LocalCredd is "AHERSRVBLD28.lgs-net.com" ) )
> > >
> > > Suggestions:
> > >     Condition   Machines Matched   Suggestion
> > >     ---------   ----------------   ----------
> > > 1   ( ( ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61" ) || ( ( OpSys == "WINDOWS" || OpSys == "LINUX" ) && Arch == "X86_64" ) ) )   0   REMOVE
> > > 2   ( TARGET.HasWindowsRunAsOwner && ( TARGET.LocalCredd is "AHERSRVBLD28.lgs-net.com" ) )   0   REMOVE
> > > 3   ( TARGET.Disk >= 3 )   18
> > > 4   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,0) )   18
> > > 5   ( TARGET.HasFileTransfer )   18
> > > ---
> > > 7163.000: Request is running.
> > >
> > >
> > >
> > >
> > >
> > >
> > > Some questions:
> > >
> > > - Would this depend on the version of condor? I am running 8.4.10 on all machines.
> > >
> > > - My user is known in the domain. Would I need to add this user to the local users of each processing machine?
> > >
> > > - In the user manual, section 7.2.5 "Condor_credd Daemon", a variable called "Local_credd" is mentioned. However, I cannot find this variable in any of the examples. Is it necessary to specify this variable in the config file?
> > >
> > > - Do I need to use a pool PW? Or is it enough to follow the suggestion from "7.2.6 Executing Jobs with the User's Profile Loaded" and just set "load_profile = True" in the submission file?
> > >
> > > - In user manual section 3.8.13.2 I find the following sentence: "Under Windows, HTCondor by default runs jobs under a dynamically created local account that exists for the duration of the job, but it can optionally run the job as the user account that owns the job if STARTER_ALLOW_RUNAS_OWNER is True and the job contains RunAsOwner=True."
> > > Is it RunAsOwner = true or Run_As_Owner = true?
> > >
> > >
> > > Btw:
> > > whoami gives: calibration@xxxxxxxxxxx.
> > > This is correct. I would like to have this user running jobs in the condor environment.
> > >
> > >
> > > Best regards,
> > > Robert
> > >
> > >
> > >
> > > -----------------------
> > >
> > > -----------------------
> > >
> > >
> > > > Sent: Thursday, 13 September 2018 at 22:31
> > > > From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
> > > > To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>, rb <robertbosch@xxxxxx>
> > > > Subject: Re: [HTCondor-users] Cannot sent jobs as Owner in WindowsOS
> > > >
> > > > On 9/12/2018 5:02 AM, rb wrote:
> > > > > I would like to send and process the job as "owner".
> > > > > The job should be processed not by the default "condor-slot user", but by the person who is logged on to the submitter and submits the job.
> > > > >
> > > > > For this we created a user "calibration". This user is registered in our domain and has admin permission on all machines (all Windows 10) connected to the pool.
> > > > >
> > > > > For this I edited the config file on the submitter and the execute nodes:
> > > > >
> > > > > [...]
> > > > > FILESYSTEM_DOMAIN = lgs-net.com
> > > > > UID_DOMAIN = lgs-net.com
> > > > > TRUST_UID_DOMAIN = true
> > > > > SOFT_UID_DOMAIN = true
> > > > > STARTER_ALLOW_RUNAS_OWNER = true
> > > > > [...]
> > > > >
> > > > >
> > > > > The submission files additionally contain the following entry:
> > > > > [...]
> > > > > Run_As_Owner = true
> > > > > [...]
> > > > >
> > > > >
> > > > > I also used "condor_store_cred add" on the submitter and the pool to store the PW for user "calibration".
> > > > >
> > > > > Still, it's not working!
> > > > > Jobs are created, as are the .err and .out files, but they are not picked up by the scheduler. "condor_q" reports: No jobs in queue.
> > > > >
> > > > >
> > > > > Can someone give some hints?
> > > > >
> > > >
> > > > Did you do a condor_reconfig or restart HTCondor after changing the config settings on your execute and submit hosts?
> > > >
> > > > Also I don't see anything in your config re your CREDD_HOST etc, as described in the Microsoft Windows chapter in the HTCondor Manual for executing jobs as the Submitting User... specifically I am looking at this section:
> > > > http://htcondor.org/manual/v8.7/MicrosoftWindows.html#x75-5750008.2.4
> > > > Perhaps you want to re-read and follow the configuration examples in that part of the Manual.
> > > >
> > > > Some additional ideas / suggestions:
> > > >
> > > > Are you running condor_submit as user "calibration" ? What does "whoami" report before submitting the job?
> > > >
> > > > Try submitting a very simple job and see if that runs as user "calibration". I would suggest running "whoami.exe" with a job event log and see what happens. For example --
> > > > executable = whoami.exe
> > > > output = test.out
> > > > error = test.err
> > > > log = test.log
> > > > run_as_owner = true
> > > > queue
> > > >
> > > > and then take a look at test.out, test.err, test.log.
> > > >
> > > > You say the job is successfully submitted but condor_q says no jobs in the queue... ??? what does "condor_q -allusers" say? Or is that because the job is quickly completing... what does condor_history say?
> > > >
> > > > Re the below observations: I am not the Windows expert, but I believe you should only need to run 'condor_store_cred add' on the submit node, which will then send the password (encrypted) and securely store it on the host running the condor_credd daemons. The execute node will securely fetch the password as needed.
> > > >
> > > > Hope the above helps,
> > > > Todd
> > > >
> > > >
> > > > > I made two observations:
> > > > > 1) I cannot use "condor_store_cred add" on the execute machines. It fails with "operation failed" and a hint to make sure I have WRITE permission on this node, although "WRITE = *" is set in all config files.
> > > > > 2) By default our software adds "load_profile = true" to all submit files. Could this be a problem?
> > > > >
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Robert
> > > > >
> > > > > _______________________________________________
> > > > > HTCondor-users mailing list
> > > > > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > > > > subject: Unsubscribe
> > > > > You can also unsubscribe by visiting
> > > > > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > > > >
> > > > > The archives can be found at:
> > > > > https://lists.cs.wisc.edu/archive/htcondor-users/
> > > > >
> > > >
> > > >
> > > > --
> > > > Todd Tannenbaum <tannenba@xxxxxxxxxxx>   University of Wisconsin-Madison
> > > > Center for High Throughput Computing     Department of Computer Sciences
> > > > HTCondor Technical Lead                  1210 W. Dayton St. Rm #4257
> > > > Phone: (608) 263-7132                    Madison, WI 53706-1685
> > > >