
Re: [Condor-users] credd issues: heterogenous system MAC-central; WIN-execute + EC2 (win) when this works



If your VM session exists only to run jobs, have you tried setting your
START expression to TRUE?  

You should not need a credd unless you are running as owner, which is
not the default. 

Also, your CREDD_HOST *must be* a Windows machine.  It may be too early in
the a.m., but I can't discern from the logs below if that is the case. 
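
As a rough sketch of the two suggestions above, and not a tested
configuration: the hostname is a placeholder, the credd lines are left
commented out on the assumption that you do not need run_as_owner, and
the settings would go in condor_config.local on the Windows execute node.

```
# Sketch only: always willing to start jobs. Reasonable only for a
# machine (or VM) that exists solely to run Condor jobs.
START = TRUE

# Needed only if jobs must run as the submitting owner (not the default).
# The credd must live on a Windows machine; the hostname is a placeholder.
# CREDD_HOST = windows-credd.example.com
# STARTER_ALLOW_RUNAS_OWNER = True
# CREDD_CACHE_LOCALLY = True
```

After changing the file, a condor_reconfig on that node should pick up
the new values.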

Cheers,
Tim

On Thu, 2011-08-18 at 19:19 -0400, Jason Herman wrote:
> > hi-
> > 
> > Here are the machines i'm setting up:
> > 
> > 1) Mac (intel osx) - as condor central server
> > 2) Parallels VM running Windows within the mac as execute machine
> > 3) separate windows desktop
> > 4) after everything else works: EC2 windows machines - i suppose
> > running as a cluster that attaches as a flock. (perhaps with
> > cyclecomputing)
> > 
> > I have tried (for days):
> > * playing with various configurations of condor_config &
> > condor_config.local on both machines.
> > * taken down firewalls on both sides.
> > * read manuals, googled, etc..
> > * running condor_store_cred with various settings on both sides
> > 
> > STATUS:
> > So far I have Condor up and running on the MAC as an execute,
> > submit, manage installation. I successfully ran a test job. The
> > windows execute node is up but i can't test it until i get credd
> > security working properly (i think that's the problem). I can see
> > the windows and mac slots from both sides (see below). 
> > 
> > When i submit a job from MAC that has windows requirements it
> > doesn't run. Presently, condor_q -analyze says "not yet been
> > considered by the matchmaker" and "match but reject the job for
> > unknown reasons." Under a previously attempted configuration it was
> > "reject your job because of their own requirements" , the Windows
> > slot would go to 'Matched', but the job would be Idle and the logs
> > would suggest a security issue.
> > 
> > I can't even condor_rm the Idle jobs on the MAC side. I'm guessing
> > their being matched to Windows ceded control of them:
> > ------
> > jimi:~ root# condor_q
> > 
> > 
> > -- Submitter: jimi.westell.com : <169.254.177.117:49371> : jimi.westell.com
> > ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> >  11.0   Jason           8/17 22:10   0+01:46:05 I  0   0.0  sample-job 60
> >  13.0   Jason           8/18 01:12   0+01:24:43 I  0   0.0  sample-job 60
> >  14.0   Jason           8/18 01:24   0+00:02:49 I  0   0.0  sample-job 60
> >  15.0   Jason           8/18 01:53   0+00:00:00 I  0   0.0  sample-job 60
> > 
> > 4 jobs; 4 idle, 0 running, 0 held
> > 
> > jimi:~ root# condor_rm 11.0
> > AUTHENTICATE:1003:Failed to authenticate with any method
> > No result found for job 11.0
> > ------
> > 
> > 
> > CONFIGURATIONS:
> > 
> > 
> > -------- condor_config.local on MAC:
> > --------
> >   CREDD_HOST = 10.211.55.10
> >   STARTER_ALLOW_RUNAS_OWNER = True
> >   CREDD_CACHE_LOCALLY = True
> >   ALLOW_CONFIG = root@$(CONDOR_HOST), *
> >   SEC_CONFIG_NEGOTIATION = REQUIRED
> >   SEC_CONFIG_AUTHENTICATION = REQUIRED
> >   SEC_CONFIG_ENCRYPTION = REQUIRED
> >   SEC_CONFIG_INTEGRITY = REQUIRED
> >   SEC_PASSWORD_FILE = /usr/local/condor/etc/pool_password
> > 
> > -------- condor_config.local on Windows:
> > --------
> > CREDD_HOST = xx.xxx.55.10
> >   STARTER_ALLOW_RUNAS_OWNER = True
> >   CREDD_CACHE_LOCALLY = True
> >   SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
> >   ALLOW_CONFIG = *
> >   SEC_CONFIG_NEGOTIATION = REQUIRED
> >   SEC_CONFIG_AUTHENTICATION = REQUIRED
> >   SEC_CONFIG_ENCRYPTION = REQUIRED
> >   SEC_CONFIG_INTEGRITY = REQUIRED
> > 
> > ------- condor_config on Windows
> > ------- i made this low security just to try to get it working:
> > -------
> > ALLOW_WRITE = *
> > ALLOW_READ = *
> > #... not sure what else you need to see
> > 
> > 
> > LOG FILES:
> > 
> > --------- CredLog - on windows
> > --------- this is after turning MAC & WIN firewalls off - not a perm
> > solution, but not working anyway:
> > ---------
> > 08/18/11 14:42:18 Failed to start non-blocking update to
> > <xxx.xxx.1.21:9618>.
> > 08/18/11 14:42:18 Return from Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC>
> > 0.0000s
> > 08/18/11 14:47:18 Calling Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
> > 08/18/11 14:47:18 Return from Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC>
> > 0.0000s
> > 08/18/11 14:47:18 Calling Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
> > 08/18/11 14:47:18 SECMAN: required authentication with
> > <xxx.xxx.1.21:9618> failed, so aborting command UPDATE_AD_GENERIC.
> > 08/18/11 14:47:18 ERROR: SECMAN:2004:Failed to create security
> > session to <xxx.xxx.1.21:9618> with TCP.
> > |AUTHENTICATE:1003:Failed to authenticate with any method
> > 08/18/11 14:47:18 Failed to start non-blocking update to
> > <xxx.xxx.1.21:9618>.
> > 08/18/11 14:47:18 Return from Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC>
> > 0.0000s
> > 08/18/11 14:52:39 attempt to connect to <xxx.xxx.1.21:9618> failed:
> > timed out after 20 seconds.
> > 08/18/11 14:52:39 Calling Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC> (2)
> > 08/18/11 14:52:39 ERROR: SECMAN:2004:Failed to create security
> > session to <xxx.xxx.1.21:9618> with TCP.
> > |SECMAN:2003:TCP connection to <xxx.xxx.1.21:9618> failed.
> > 08/18/11 14:52:39 Failed to start non-blocking update to
> > <xxx.xxx.1.21:9618>.
> > 08/18/11 14:52:39 Return from Handler
> > <SecManStartCommand::WaitForSocketCallback UPDATE_AD_GENERIC>
> > 0.0000s
> > 
> > --------- MasterLog - on windows
> > ---------
> > ---------
> > 08/18/11 14:51:50 condor_read(): timeout reading 21 bytes from
> > <10.211.55.10:53043>.
> > 08/18/11 14:51:50 IO: Failed to read packet header
> > 08/18/11 14:51:50 store_pool_cred: failed to receive all parameters
> > 
> > 
> > COMMAND LINE OUTPUT:
> > 
> > ---------- condor_status - on windows
> > ---------- Manual says to run this when you are done, doesn't
> > mention the command 
> > ---------- only works on the windows side:
> > C:\Users\Administrator>condor_status -f "%s\t" Name -f "%s\n"
> > ifThenElse(isUndefined(LocalCredd),\"UNDEF"\",LocalCredd)
> > slot1@JASONHERMANB752   UNDEF
> > slot1@xxxxxxxxxxxxxxxx  UNDEF
> > slot2@JASONHERMANB752   UNDEF
> > slot2@xxxxxxxxxxxxxxxx  UNDEF
> > slot3@xxxxxxxxxxxxxxxx  UNDEF
> > slot4@xxxxxxxxxxxxxxxx  UNDEF
> > slot5@xxxxxxxxxxxxxxxx  UNDEF
> > slot6@xxxxxxxxxxxxxxxx  UNDEF
> > slot7@xxxxxxxxxxxxxxxx  UNDEF
> > slot8@xxxxxxxxxxxxxxxx  UNDEF
> > 
> > 
> > ------- condor_status - MAC (identical on windows)
> > -------
> > -------
> > jimi:log root# condor_status
> > 
> > Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
> > 
> > slot1@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.210  1024  0+19:09:01
> > slot2@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  1+11:24:12
> > slot3@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  1+03:18:37
> > slot4@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+23:14:03
> > slot5@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+15:05:52
> > slot6@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+11:04:54
> > slot7@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  0+06:59:54
> > slot8@xxxxxxxxxxxx OSX        X86_64 Unclaimed Idle     0.000  1024  1+15:27:42
> > slot1@JASONHERMANB WINNT60    INTEL  Unclaimed Idle     0.120  1023  0+00:00:04
> > slot2@JASONHERMANB WINNT60    INTEL  Unclaimed Idle     0.100  1023  0+00:00:02
> > 
> >                     Total Owner Claimed Unclaimed Matched Preempting Backfill
> > 
> >       INTEL/WINNT60     2     0       0         2       0          0        0
> >          X86_64/OSX     8     0       0         8       0          0        0
> > 
> >               Total    10     0       0        10       0          0        0
> > 
> > 
> > -------- condor_store_cred on Windows:
> > --------
> > --------
> > C:\Users\Administrator>condor_store_cred -c add
> > Account: condor_pool@JASONHERMANB752
> > 
> > Enter password:
> > 
> > Operation failed.
> >    Make sure you have CONFIG access to the target Master.
> > 
> > 
> > thanks kindly for any assistance, jason
> > 
> > 
> > 