
Re: [Condor-users] Setting Condor Job Owner in Windows



Todd,

I have been using your suggestion of running

   condor_submit -n winxp-dev-01 condor_submit_file

to start jobs from my Java app; condor_q shows them running as SYSTEM.

However, the job itself starts another set of jobs, which the Windows Task
Manager shows running as 'condor-reuse-vm2'.  I'm not sure why that is, but
the bottom line is that I need those jobs to run as user 'diane'.

Therefore, I think I need to get the +Owner = "diane" setting in my submit
file working.  I'm hoping that will make the children of the queued job run
as 'diane' and not 'condor-reuse-vm2'.  Does anyone have info on that?

When I include the +Owner = statement, the job does enter the queue as
'diane', but then it just hangs and accumulates time in the queue.  The only
indication of a problem is in the Condor logs included below.

Note that I have QUEUE_ALL_USERS_TRUSTED = True in my condor_config file.

Can you or anyone give me any more advice on how to get this to work?

Here are snippets of the condor logs:

SHADOWLOG:
10/17 15:48:39 ** condor_shadow (CONDOR_SHADOW) STARTING UP
10/17 15:48:39 ** C:\condor\bin\condor_shadow.exe
10/17 15:48:39 ** $CondorVersion: 6.8.5 May 17 2007 $
10/17 15:48:39 ** $CondorPlatform: INTEL-WINNT50 $
10/17 15:48:39 ** PID = 4852
10/17 15:48:39 ** Log last touched 10/17 15:41:06
10/17 15:48:39 ******************************************************
10/17 15:48:39 Using config source: C:\condor\condor_config
10/17 15:48:39 Using local config sources: 
10/17 15:48:39    C:\condor/condor_config.local
10/17 15:48:39 DaemonCore: Command Socket at <192.168.2.105:4273>
10/17 15:48:44 Initializing a VANILLA shadow for job 556.0
10/17 15:48:54 (556.0) (4852): attempt to connect to <192.168.2.105:9620> failed: timed out after 10 seconds.
10/17 15:48:54 (556.0) (4852): ERROR: Could not locate valid credential for user 'diane@NT AUTHORITY'
10/17 15:48:54 (556.0) (4852): init_user_ids() failed!


SCHEDLOG:
10/17 15:48:34 (pid:2208) DaemonCore: Command received via UDP from host <192.168.2.105:4257>
10/17 15:48:34 (pid:2208) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
10/17 15:48:34 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:48:34 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
10/17 15:48:34 (pid:2208) Called reschedule_negotiator()
10/17 15:48:35 (pid:2208) Activity on stashed negotiator socket
10/17 15:48:35 (pid:2208) Negotiating for owner: diane@winxp-dev-01
10/17 15:48:35 (pid:2208) Checking consistency running and runnable jobs
10/17 15:48:35 (pid:2208) Tables are consistent
10/17 15:48:35 (pid:2208) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
10/17 15:48:35 (pid:2208) Activity on stashed negotiator socket
10/17 15:48:35 (pid:2208) Negotiating for owner: diane@winxp-dev-01
10/17 15:48:35 (pid:2208) Checking consistency running and runnable jobs
10/17 15:48:35 (pid:2208) Tables are consistent
10/17 15:48:35 (pid:2208) Out of servers - 0 jobs matched, 1 jobs idle, 0 jobs rejected
10/17 15:48:39 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:48:39 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
10/17 15:48:39 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:48:39 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:48:39 (pid:2208) Starting add_shadow_birthdate(556.0)
10/17 15:48:44 (pid:2208) Started shadow for job 556.0 on "<192.168.2.105:1032>", (shadow pid = 4852)
10/17 15:48:44 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:48:44 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
10/17 15:49:04 (pid:2208) DaemonCore: Command received via UDP from host <192.168.2.105:4288>
10/17 15:49:04 (pid:2208) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
10/17 15:49:04 (pid:2208) Shadow pid 4852 for job 556.0 exited with status 4
10/17 15:49:04 (pid:2208) ERROR: Shadow exited with job exception code!
10/17 15:49:04 (pid:2208) ERROR: Shadow exited with job exception code!
10/17 15:49:06 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:49:06 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:49:06 (pid:2208) Starting add_shadow_birthdate(556.0)
10/17 15:49:10 (pid:2208) Started shadow for job 556.0 on "<192.168.2.105:1032>", (shadow pid = 3492)
10/17 15:49:10 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:49:10 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01

COLLECTORLOG:
10/17 15:48:34 (Sending 1 ads in response to query)
10/17 15:48:34 (Sending 7 ads in response to query)
10/17 15:48:34 Got QUERY_STARTD_PVT_ADS
10/17 15:48:34 (Sending 2 ads in response to query)
10/17 15:48:49 NegotiatorAd  : Inserting ** "< winxp-dev-01 >"

STARTLOG:
10/17 15:48:35 DaemonCore: Command received via UDP from host <192.168.2.105:4267>
10/17 15:48:35 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
10/17 15:48:35 vm2: match_info called
10/17 15:48:35 vm2: Received match <192.168.2.105:1032>#1192108044#159#...
10/17 15:48:35 vm2: State change: match notification protocol successful
10/17 15:48:35 vm2: Changing state: Unclaimed -> Matched
10/17 15:48:35 DaemonCore: Command received via TCP from host <192.168.2.105:4268>
10/17 15:48:35 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
10/17 15:48:35 vm2: Request accepted.
10/17 15:48:35 vm2: Remote owner is diane@winxp-dev-01
10/17 15:48:35 vm2: State change: claiming protocol successful
10/17 15:48:35 vm2: Changing state: Matched -> Claimed
10/17 15:50:51 DaemonCore: Command received via UDP from host <192.168.2.105:4325>
10/17 15:50:51 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
10/17 15:50:51 vm2: State change: received RELEASE_CLAIM command
10/17 15:50:51 vm2: Changing state and activity: Claimed/Idle -> Preempting/Vacating
10/17 15:50:51 vm2: State change: No preempting claim, returning to owner
10/17 15:50:51 vm2: Changing state and activity: Preempting/Vacating -> Owner/Idle
10/17 15:50:51 vm2: State change: IS_OWNER is false
10/17 15:50:51 vm2: Changing state: Owner -> Unclaimed
10/17 15:50:51 DaemonCore: Command received via UDP from host <192.168.2.105:4326>
10/17 15:50:51 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
10/17 15:50:51 Warning: can't find resource with ClaimId (<192.168.2.105:1032>#1192108044#159#...)

NEGOTIATORLOG:
10/17 15:48:34 ---------- Started Negotiation Cycle ----------
10/17 15:48:34 Phase 1:  Obtaining ads from collector ...
10/17 15:48:34   Getting all public ads ...
10/17 15:48:34   Sorting 7 ads ...
10/17 15:48:34   Getting startd private ads ...
10/17 15:48:34 Got ads: 7 public and 2 private
10/17 15:48:34 Public ads include 2 submitter, 2 startd
10/17 15:48:35 Phase 2:  Performing accounting ...
10/17 15:48:35 Phase 3:  Sorting submitter ads by priority ...
10/17 15:48:35 Phase 4.1:  Negotiating with schedds ...
10/17 15:48:35   Negotiating with diane@winxp-dev-01 at <192.168.2.105:1031>
10/17 15:48:35 0 seconds so far
10/17 15:48:35     Request 00556.00000:
10/17 15:48:35       Matched 556.0 diane@winxp-dev-01 <192.168.2.105:1031> preempting none <192.168.2.105:1032> vm2@winxp-dev-01
10/17 15:48:35       Successfully matched with vm2@winxp-dev-01
10/17 15:48:35     Got NO_MORE_JOBS;  done negotiating
10/17 15:48:35 Phase 4.2:  Negotiating with schedds ...
10/17 15:48:35   Negotiating with diane@winxp-dev-01 at <192.168.2.105:1031>
10/17 15:48:35 0 seconds so far
10/17 15:48:35     Got NO_MORE_JOBS;  done negotiating
10/17 15:48:35 ---------- Finished Negotiation Cycle ----------

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Diane
Sent: Thursday, October 04, 2007 12:30 PM
To: 'Condor-Users Mail List'
Subject: Re: [Condor-users] Setting Condor Job Owner in Windows

Thanks Todd,

Your last suggestion finally worked!
I used the -n winxp-dev-01 option on the condor_submit command and the job
ran (as SYSTEM).

Note that I did not use the setting +Owner = "diane", because when I did, the
job seemed to hang while checking things (with its run time increasing) and
SchedLog contained:

10/4 12:10:22 (pid:2152) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/4 12:10:22 (pid:2152) perm::init: Lookup Account Name diane failed (err=1332), using Everyone

I'll look at that some more, but for now at least Condor is working, running
as SYSTEM.

Also, I tried to store credentials for SYSTEM with a dummy password, as you
suggest below (from a SYSTEM cmd window), but the condor_store_cred command
failed because it couldn't find the host, even though HOSTALLOW_WRITE=*.
I believe it's looking for SYSTEM@NT AUTHORITY rather than WINXP-DEV-01.
Here is the exchange:

C:\WINDOWS\system32>\condor\bin\condor_store_cred add
Account: SYSTEM@NT AUTHORITY

Enter password:

Operation failed.
    Make sure your HOSTALLOW_WRITE setting includes this host.

C:\WINDOWS\system32>

I'll look at the above some more too, but setting that may not be necessary.

Thanks so much again,
Diane


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Thursday, October 04, 2007 10:57 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Setting Condor Job Owner in Windows

Diane wrote:
> Hi Todd,
> 
> Your idea sounded great! However, I tried it without success (the job still
> starts as SYSTEM even though the condor.submit file says +Owner = "diane",
> and I had reconfigured condor to disable Queue access checks).  The job
> never gets into the queue and returns with condor.error:
> 
> ERROR: No credential stored for SYSTEM@NT AUTHORITY
> 
> 	Correct this by running:
> 	condor_store_cred add
> 
> In hopes of figuring this out, I have included here the relevant parts of
> the condor logs (in particular SchedLog showing queue access checks
> disabled), and my condor.submit file. 
> 
> If you have any insights that would be great.

Ok, the formula in my previous post tells you how to setup Condor so 
User A (in your case, SYSTEM) can submit a job as User B (diane).

What I failed to say is how to disable the (normally helpful) check that 
condor_submit makes to be certain a password is stored for the user 
running condor_submit.  After all, you don't care that SYSTEM does not 
have a password stored since the job will run as diane ... but 
condor_submit isn't smart enough to know that.

However, if you use the "-n <schedd-name>" argument to condor_submit, it 
will not do this "see if a password is stored" check.  So to get it to 
work, try

   condor_submit -n winxp-dev-01 Condor.submit
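
The Java app that issues this command is not shown in the thread. As a hedged illustration, here is a Python stand-in (not the actual Java code) that builds the same condor_submit -n <schedd-name> <submit-file> command line and runs it with the standard subprocess module. The function names are hypothetical, and the sketch assumes condor_submit is on the PATH.

```python
import subprocess


def build_submit_command(schedd_name, submit_file):
    """Return the argument list for submitting submit_file to a named schedd.

    Per the thread, passing -n <schedd-name> makes condor_submit skip its
    check that a password is stored for the user running condor_submit.
    """
    return ["condor_submit", "-n", schedd_name, submit_file]


def submit(schedd_name, submit_file):
    """Run condor_submit and return its standard output as text.

    Raises subprocess.CalledProcessError if condor_submit exits non-zero,
    and FileNotFoundError if condor_submit cannot be found on the PATH.
    """
    result = subprocess.run(
        build_submit_command(schedd_name, submit_file),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, submit("winxp-dev-01", "Condor.submit") would run the command suggested above. Passing the arguments as a list rather than one string avoids quoting problems with submit-file paths that contain spaces.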

Another idea that may be even easier:  As user "SYSTEM", run
   condor_store_cred add
and just give it a bogus password.  Condor won't ever use it, but 
condor_submit will be happy when it looks to see that one is stored.
If you don't know how to open up a command window as user SYSTEM, see
   http://blogs.msdn.com/adioltean/articles/271063.aspx
which gives one way to do it (personally, I made a service that does it).

Good Luck!  Let me know how it goes...

If it helps, below is a screenshot of a successful test I did:


C:\temp\test>whoami
SYSTEM

C:\temp\test>hostname
tannenbaum-t23

C:\temp\test>condor_submit -n tannenbaum-t23 test.sub
Submitting job(s).
1 job(s) submitted to cluster 37.

C:\temp\test>condor_q

-- Submitter: tannenbaum-t23 : <127.0.0.1:1357> : tannenbaum-t23
  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   37.0   diane          10/4  15:41   0+00:00:00 H  0   9.8  test.sub

1 jobs; 0 idle, 0 running, 1 held

C:\temp\test>type test.sub
executable = test.sub
hold = true
+Owner = "diane"
universe = vanilla
queue
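
If the submit file is generated by a program rather than written by hand, a sketch like the one below could produce a description equivalent to test.sub above. It is plain string building in Python. The function name and parameters are hypothetical. The +Owner line should only be expected to take effect when the schedd trusts the submitter to set it (in this thread, via QUEUE_ALL_USERS_TRUSTED = True).

```python
def make_submit_description(executable, owner, universe="vanilla", hold=False):
    """Build the text of a submit description that requests owner `owner`.

    Whether the schedd honors +Owner depends on its configuration
    (assumed here: QUEUE_ALL_USERS_TRUSTED = True, as in the thread).
    """
    lines = [
        f"executable = {executable}",
        f"universe = {universe}",
        f'+Owner = "{owner}"',
    ]
    if hold:
        # Submit on hold, as in the test above, so nothing actually runs.
        lines.append("hold = true")
    # The queue command must come last; it queues one job with the settings above.
    lines.append("queue")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as test.sub gives the same settings as the example above; the order of the commands before queue does not matter.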


-- 
Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: 
https://lists.cs.wisc.edu/archive/condor-users/


