
Re: [Condor-users] run_as_owner not working in 6.9.5: LOCAL_CREDD bug?



Thanks Coop for your quick reply.

 

However, the problem remains in 6.9.5, even having taken the steps that you describe (I just reverified this to make sure).

 

I have managed to get vanilla RUN_AS_OWNER jobs working with 6.9.5 by using CREDD_HOST=$(CONDOR_HOST) (i.e. without the port setting) on both the master and the execute node. But the real prize for me is to be able to run vm-universe jobs with RUN_AS_OWNER, and I still cannot make this work with a shared filesystem. The vm-gahp log below seems to indicate that, even with:

run_as_owner = true specified in the job file,

VM_UNIV_NOBODY_USER set in the config file to a user with a home directory,

ALLOW_USERS set to the same user in the config_vmgapp.vmware file,

the vm process is launched with system credentials (SYSTEM@NT AUTHORITY), which are insufficient to access the shared virtual machine files. I have confirmed that these files *are* visible to a vanilla job run on the same execute node with RUN_AS_OWNER = true.

 

Maybe these are the perils of running a pre-release development version...

 

Malcolm

 

1/8 11:41:03 ******************************************************
1/8 11:41:03 ** condor_vm-gahp.exe (CONDOR_VM_GAHP) STARTING UP
1/8 11:41:03 ** C:\condor\bin\condor_vm-gahp.exe
1/8 11:41:03 ** $CondorVersion: 6.9.5 Nov 28 2007 $
1/8 11:41:03 ** $CondorPlatform: INTEL-WINNT50 $
1/8 11:41:03 ** PID = 904
1/8 11:41:03 ** Log last touched 1/8 11:34:11
1/8 11:41:03 ******************************************************
1/8 11:41:03 Using config source: C:\condor\condor_config
1/8 11:41:03 Using local config sources:
1/8 11:41:03    C:\condor/condor_config.local
1/8 11:41:03 DaemonCore: Command Socket at <192.168.199.190:4756>
1/8 11:41:03 VMGAHP[904]: VM-GAHP initialized with run-mode 1
1/8 11:41:03 VMGAHP[904]: Initialize Uids: caller=SYSTEM@NT AUTHORITY, job user=SYSTEM@NT AUTHORITY
1/8 11:41:03 VMGAHP[904]: Starting worker : C:\condor/bin/condor_vm-gahp.exe -f -t -M 2
1/8 11:41:03 VMGAHP[904]: Worker pid=1588
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ******************************************************
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ** condor_vm-gahp.exe (CONDOR_VM_GAHP) STARTING UP
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ** C:\condor\bin\condor_vm-gahp.exe
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ** $CondorVersion: 6.9.5 Nov 28 2007 $
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ** $CondorPlatform: INTEL-WINNT50 $
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ** PID = 1588
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ** Log last touched time unavailable (No error)
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 ******************************************************
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 Using config source: C:\condor\condor_config
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 Using local config sources:
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03    C:\condor/condor_config.local
1/8 11:41:03 VMGAHP[904]: Worker[1588]: 1/8 11:41:03 DaemonCore: Command Socket at <192.168.199.190:4759>
1/8 11:41:03 VMGAHP[904]: Worker[1588]: VM-GAHP initialized with run-mode 2
1/8 11:41:03 VMGAHP[904]: Worker[1588]: Initialize Uids: caller=SYSTEM@NT AUTHORITY, job user=SYSTEM@NT AUTHORITY
1/8 11:41:24 condor_read(): timeout reading 5 bytes from <192.168.199.190:4752>.
1/8 11:41:24 IO: Failed to read packet header
1/8 11:41:27 VMGAHP[904]: Worker[1588]: Warning: creating filesystem with (nonstandard) Joliet extensions
1/8 11:41:27 VMGAHP[904]: Worker[1588]:          but without (standard) Rock Ridge extensions.
1/8 11:41:27 VMGAHP[904]: Worker[1588]:          It is highly recommended to add Rock Ridge
1/8 11:41:28 VMGAHP[904]: Worker[1588]: File(\\xxxx\xxxx\xxx\condor1\VM\vm_test\vm_test-000001.vmdk) can't be read
1/8 11:41:28 VMGAHP[904]: Worker[1588]: file(\\xxxx\xxxx\xxx\condor1\VM\vm_test\vm_test-000001.vmdk) in a vmx file cannot be read
1/8 11:41:34 VMGAHP[904]: EOF reached on DaemonCore pipe 65541
1/8 11:41:34 VMGAHP[904]: VM GAHP Worker stderr buffer closed, exiting...
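
For anyone wanting to check their own vm-gahp log for the same symptom, here is a minimal Python sketch that pulls out the "Initialize Uids" lines and flags a job user of SYSTEM. It assumes only the line format shown in the log above; the function names job_users and runs_as_system are made up for illustration and are not part of Condor.

```python
import re

# Matches log lines of the form seen above, e.g.
#   "... Initialize Uids: caller=SYSTEM@NT AUTHORITY, job user=SYSTEM@NT AUTHORITY"
UIDS_RE = re.compile(
    r"Initialize Uids: caller=(?P<caller>.+?), job user=(?P<job_user>.+?)\s*$"
)


def job_users(log_text):
    """Return every (caller, job_user) pair reported in a vm-gahp log."""
    pairs = []
    for line in log_text.splitlines():
        match = UIDS_RE.search(line)
        if match:
            pairs.append((match.group("caller"), match.group("job_user")))
    return pairs


def runs_as_system(log_text):
    """True if any 'Initialize Uids' entry reports SYSTEM as the job user."""
    return any(user.upper().startswith("SYSTEM@") for _, user in job_users(log_text))


# Two lines copied from the log above.
sample = (
    "1/8 11:41:03 VMGAHP[904]: Initialize Uids: "
    "caller=SYSTEM@NT AUTHORITY, job user=SYSTEM@NT AUTHORITY\n"
    "1/8 11:41:03 VMGAHP[904]: Worker pid=1588\n"
)
print(job_users(sample))
print(runs_as_system(sample))
```

If run_as_owner were taking effect for the vm universe, one would expect the job user field to name the submitting account rather than SYSTEM, so a True result here would point to the behaviour described above.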

 

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Thompson, Cooper
Sent: 08 January 2008 14:08
To: Condor-Users Mail List
Subject: Re: [Condor-users] run_as_owner not working in 6.9.5: LOCAL_CREDD bug?

 

Starting simply: you need to run the “condor_store_cred -c add” command and then restart Condor (using “net stop condor && net start condor”) before LOCAL_CREDD=<name>:<port> will appear in the ClassAd. I believe a condor_reconfig or a partial restart is not sufficient.

 

The stored password is not removed when uninstalling Condor, so if you ran the condor_store_cred command without restarting Condor, and then rolled back to 6.8.8, that may have caused it to work.

 

Coop

 


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Malcolm Wilkins
Sent: Tuesday, January 08, 2008 4:47 AM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] run_as_owner not working in 6.9.5: LOCAL_CREDD bug?

 

I am trying to set up a small Condor pool with one submit-only/master node (Vista) and one submit/execute node (XP). I have been trying (unsuccessfully) to get the RUN_AS_OWNER feature working, so that submitted jobs run under the credentials of the submitter. The jobs remain idle in the queue and do not run: condor_q -analyze indicates that the problem may be that the job requires the execute node to advertise “LOCAL_CREDD = <hostname of CREDD host>:9620”.

 

Specifying CREDD_HOST=$(CONDOR_HOST):$(CREDD_PORT) in the condor_config on the execute node (the default) *does not* work as expected: instead of displaying “LOCAL_CREDD = <hostname of CREDD host>:9620” in response to condor_status -long, no information is displayed at all. (However, specifying CREDD_HOST=$(CONDOR_HOST) does cause “LOCAL_CREDD = <hostname of CREDD host>” to be displayed.)
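
The check described above (is the attribute advertised, and does it carry a port?) can be sketched in Python against saved condor_status -long output. This is only an illustration: it assumes the ad is printed one "Name = value" pair per line, it takes the attribute name as quoted in this message, and the host name credd.example.org and both function names are hypothetical.

```python
def find_attribute(classad_text, name="LOCAL_CREDD"):
    """Return the value of `name` from 'Attr = value' lines, or None if absent.

    The attribute name is compared case-insensitively and surrounding
    double quotes are stripped from the value.
    """
    for line in classad_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == name.lower():
            return value.strip().strip('"')
    return None


def credd_has_port(classad_text, name="LOCAL_CREDD"):
    """True only if the attribute is present and looks like host:port."""
    value = find_attribute(classad_text, name)
    return value is not None and ":" in value


# Hypothetical ads mirroring the three cases described in this message.
ad_with_port = 'Name = "exec1"\nLOCAL_CREDD = "credd.example.org:9620"\n'
ad_without_port = 'Name = "exec1"\nLOCAL_CREDD = "credd.example.org"\n'
ad_missing = 'Name = "exec1"\n'

for ad in (ad_with_port, ad_without_port, ad_missing):
    print(find_attribute(ad), credd_has_port(ad))
```

The three sample ads correspond to the 6.8.8 behaviour, the port-less CREDD_HOST setting, and the default 6.9.5 setting respectively, as reported above.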

 

I then tried reverting to 6.8.8 and this version *does* display the full information i.e. “LOCAL_CREDD = <hostname of CREDD host>:9620” for the execute node.

 

Has anyone else come across such a problem, and is there a workaround?