
Re: [Condor-users] Condor_store_cred - Setting up Condor on Windows



Sonia,

You'll need to start a schedd daemon on your execute node, run the credential store command, and then shut the schedd down again. condor_store_cred talks to the schedd, so the schedd only has to be running while you're storing the credentials.
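
A minimal sketch of that temporary change, assuming the execute node's daemon list currently holds only the master and the startd (check your own DAEMON_LIST before editing):

```
# condor_config on the execute node (temporary; revert afterwards)
# Assumption: the node normally runs only MASTER and STARTD.
DAEMON_LIST = MASTER, STARTD, SCHEDD
```

Restart the Condor service so the master starts the schedd, run condor_store_cred add, then restore the original DAEMON_LIST and restart again.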

- Ian

On 2010-09-24, at 5:41 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:

Hi Condor-Users!

I have written about this before, and I am writing again because I am still stuck on this issue.

I am using software that requires Condor to always use the same user account when running jobs. Right now I have a small Condor pool of 3 machines running Windows.

 

In the condor_config file of each machine I have set SLOT1_USER = domain\user_account, which is the account Condor should use to run the jobs.

I have also included DEDICATED_EXECUTE_ACCOUNT_REGEXP = True.

 

But the problem occurs when I have to store the credentials on each machine.

I have received instructions from my software supplier that I should run CONDOR_STORE_CRED ADD on every machine of the pool.

 

But when I issue the command CONDOR_STORE_CRED ADD I get the error “Make sure your ALLOW_WRITE setting includes this host”. And, yes, it does. The ALLOW_WRITE variable includes this host (it is set to *).

 

When the software supplier contacted the Condor Team about this issue, they got the following answer:

 

This is a common problem people encounter when setting up Condor on Windows. This error indicates that there is a communication problem between the condor_store_cred tool and the condor_schedd daemon. The first thing you want to do is verify that the schedd is in fact running on the machine from which you are executing condor_store_cred. If it is, the SchedLog is the place to look for details on why the communication is failing. A common reason is because of a misconfigured security setup, which is why the error message refers to HOSTALLOW_WRITE. Of course, there may be other problems. Adding the D_SECURITY flag to the SCHEDD_DEBUG configuration macro will allow you to get the most information out of your SchedLog.
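
As a sketch of that last suggestion, the flag can be appended in the condor_config of the machine running the schedd. This assumes SCHEDD_DEBUG is already defined earlier in the configuration; otherwise, list the wanted flags explicitly.

```
# Append D_SECURITY to whatever debug flags the schedd already uses.
SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_SECURITY
```

The schedd has to re-read its configuration (for example via condor_reconfig) before the extra security messages appear in the SchedLog.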

 

Hope this helps. Let me know if you need any more help tracking this down.

 

Thanks,

Greg Quinn

Condor Team

 

Greg wrote that

The first thing you want to do is verify that the schedd is in fact running on the machine from which you are executing condor_store_cred.

I have checked, and no, the schedd daemon is not running on the machine where I am executing condor_store_cred. It is only running on the central manager, and there condor_store_cred worked fine.

 

Isn’t it the case that condor_schedd only needs to run on machines from which jobs are submitted, which in my case is the central manager?

 

1. Is it really necessary to run condor_store_cred add on every machine in my pool?

2. If yes, does condor_schedd have to run on every machine in the Condor pool?

3. If yes, how do I get condor_schedd to run on every machine?

 

 

I include below the SchedLog file related to the submitted job with ID 29.

10.110.44.12 is the central manager, where the jobs are submitted from; condor_schedd is running on this machine, and condor_store_cred add worked fine on it.

10.110.44.19 is the execute machine; condor_schedd is not running on this machine, and condor_store_cred add didn’t work on it.

 

Any clue as to what is happening?

 

Cheers,

Sónia

 

9/24 11:14:33 Starting add_shadow_birthdate(29.0)

9/24 11:14:33 Started shadow for job 29.0 on "<10.110.44.19:1232>", (shadow pid = 4172)

9/24 11:14:34 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:53509>

9/24 11:14:34 DC_AUTHENTICATE: added session id o2f-sth-lap-016:4012:1285319674:43 to cache for 8640000 seconds!

9/24 11:14:34 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51224>.

9/24 11:14:34 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51224>

9/24 11:14:34 DC_AUTHENTICATE: resuming session id o2f-sth-lap-016:4012:1285319674:43 given to <10.110.44.12:53509>:

9/24 11:14:34 DC_AUTHENTICATE: Success.

9/24 11:14:34 STARTCOMMAND: starting 60001 to <10.110.44.12:53238> on UDP port 51225.

9/24 11:14:34 SECMAN: command 60001 to <10.110.44.12:53238> on UDP port 51225.

9/24 11:14:34 SECMAN: Cookie="2E5D02CE7113BE0793647B61D1BB49E8ED987ADCAD1C68D33A6E0DA5F1E9396F856B4060CBB7A93FEC1430B3470CE5D97C2FFB12626DEAD2320F8D124A7FD4C"

9/24 11:14:34 SECMAN: startCommand succeeded.

9/24 11:14:34 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51225>.

9/24 11:14:34 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51225>

9/24 11:14:34 DC_AUTHENTICATE: Success.

9/24 11:14:34 DaemonCore: Command received via UDP from host <10.110.44.12:51225>

9/24 11:14:34 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())

9/24 11:14:34 Shadow pid 4172 for job 29.0 exited with status 4

9/24 11:14:34 ERROR: Shadow exited with job exception code!

9/24 11:14:36 ERROR: SetHandleInformation() failed in SetFDInheritFlag(0,0),err=87

9/24 11:14:36 ERROR: SetHandleInformation() failed in SetFDInheritFlag(1,0),err=87

9/24 11:14:36 ERROR: SetHandleInformation() failed in SetFDInheritFlag(2,0),err=87

9/24 11:14:36 Starting add_shadow_birthdate(29.0)

9/24 11:14:36 Started shadow for job 29.0 on "<10.110.44.19:1232>", (shadow pid = 3280)

9/24 11:14:37 STARTCOMMAND: starting 1 to <10.110.44.12:9618> on UDP port 51226.

9/24 11:14:37 SECMAN: command 1 to <10.110.44.12:9618> on UDP port 51226.

9/24 11:14:37 SECMAN: using session o2f-sth-lap-016:2396:1285318979:3 for {<10.110.44.12:9618>,<1>}.

9/24 11:14:37 SECMAN: UDP, have_session == 1, can_neg == 1

9/24 11:14:37 SECMAN: startCommand succeeded.

9/24 11:14:37 Sent ad to central manager for o2f_sonlil@xxxxxxxxxxxxxxxxxxxx

9/24 11:14:37 STARTCOMMAND: starting 11 to <10.110.44.12:9618> on UDP port 51227.

9/24 11:14:37 SECMAN: command 11 to <10.110.44.12:9618> on UDP port 51227.

9/24 11:14:37 SECMAN: using session o2f-sth-lap-016:2396:1285318979:3 for {<10.110.44.12:9618>,<11>}.

9/24 11:14:37 SECMAN: UDP, have_session == 1, can_neg == 1

9/24 11:14:37 SECMAN: startCommand succeeded.

9/24 11:14:37 Sent ad to 1 collectors for o2f_sonlil@xxxxxxxxxxxxxxxxxxxx

9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:53518>

9/24 11:14:38 DC_AUTHENTICATE: added session id o2f-sth-lap-016:4012:1285319678:44 to cache for 8640000 seconds!

9/24 11:14:38 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51228>.

9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51228>

9/24 11:14:38 DC_AUTHENTICATE: resuming session id o2f-sth-lap-016:4012:1285319678:44 given to <10.110.44.12:53518>:

9/24 11:14:38 DC_AUTHENTICATE: Success.

9/24 11:14:38 STARTCOMMAND: starting 60001 to <10.110.44.12:53238> on UDP port 51229.

9/24 11:14:38 SECMAN: command 60001 to <10.110.44.12:53238> on UDP port 51229.

9/24 11:14:38 SECMAN: Cookie="2E5D02CE7113BE0793647B61D1BB49E8ED987ADCAD1C68D33A6E0DA5F1E9396F856B4060CBB7A93FEC1430B3470CE5D97C2FFB12626DEAD2320F8D124A7FD4C"

9/24 11:14:38 SECMAN: startCommand succeeded.

9/24 11:14:38 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51229>.

9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51229>

9/24 11:14:38 DC_AUTHENTICATE: Success.

9/24 11:14:38 DaemonCore: Command received via UDP from host <10.110.44.12:51229>

9/24 11:14:38 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())

9/24 11:14:38 Shadow pid 3280 for job 29.0 exited with status 4

9/24 11:14:38 ERROR: Shadow exited with job exception code!

9/24 11:14:38 Match for cluster 29 has had 5 shadow exceptions, relinquishing.

9/24 11:14:38 STARTCOMMAND: starting 443 to <10.110.44.19:1232> on UDP port 51230.

9/24 11:14:38 SECMAN: command 443 to <10.110.44.19:1232> on UDP port 51230.

9/24 11:14:38 SECMAN: using session O2F-sth-LAP-002:588:1285319520:39 for {<10.110.44.19:1232>,<443>}.

9/24 11:14:38 SECMAN: UDP, have_session == 1, can_neg == 1

9/24 11:14:38 SECMAN: startCommand succeeded.

9/24 11:14:38 STARTCOMMAND: starting 443 to <10.110.44.19:1232> on UDP port 51231.

9/24 11:14:38 SECMAN: command 443 to <10.110.44.19:1232> on UDP port 51231.

9/24 11:14:38 SECMAN: using session O2F-sth-LAP-002:588:1285319520:39 for {<10.110.44.19:1232>,<443>}.

9/24 11:14:38 SECMAN: UDP, have_session == 1, can_neg == 1

9/24 11:14:38 SECMAN: startCommand succeeded.

9/24 11:14:38 Sent RELEASE_CLAIM to startd on <10.110.44.19:1232>

9/24 11:14:38 Match record (<10.110.44.19:1232>, 29, 0) deleted

9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.19:1544>

9/24 11:14:38 DC_AUTHENTICATE: resuming session id o2f-sth-lap-016:4012:1285319536:25 given to <10.110.44.19:1526>:

9/24 11:14:38 DC_AUTHENTICATE: Success.

9/24 11:14:38 DaemonCore: Command received via TCP from host <10.110.44.19:1544>

9/24 11:14:38 DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)

9/24 11:14:38 Got VACATE_SERVICE from <10.110.44.19:1544>
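
The shadow failures in the excerpt above can be tallied with a short script. This is a minimal sketch in Python, assuming the "date time message" line layout shown in this log; summarize_schedlog is a hypothetical helper and not part of Condor.

```python
import re
from collections import Counter

# Patterns copied from the SchedLog lines quoted in this message.
SHADOW_EXIT = re.compile(r"Shadow pid (\d+) for job (\d+\.\d+) exited with status (\d+)")
ERROR_LINE = re.compile(r"^\S+ \S+ ERROR: (.*)$")


def summarize_schedlog(lines):
    """Count shadow exits per (job id, status) and collect ERROR messages."""
    exits = Counter()
    errors = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # the excerpt is double-spaced, so skip blank lines
        match = SHADOW_EXIT.search(line)
        if match:
            exits[(match.group(2), int(match.group(3)))] += 1
        match = ERROR_LINE.match(line)
        if match:
            errors.append(match.group(1))
    return exits, errors


# A few lines taken verbatim from the excerpt above.
sample = [
    "9/24 11:14:34 Shadow pid 4172 for job 29.0 exited with status 4",
    "9/24 11:14:34 ERROR: Shadow exited with job exception code!",
    "",
    "9/24 11:14:38 Shadow pid 3280 for job 29.0 exited with status 4",
    "9/24 11:14:38 ERROR: Shadow exited with job exception code!",
]

exits, errors = summarize_schedlog(sample)
for (job, status), count in exits.items():
    print(f"job {job}: {count} shadow exit(s) with status {status}")
for message in errors:
    print("error:", message)
```

To run it against a real log, pass an open file handle for the SchedLog instead of the sample list.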

 

Sónia Liléo
O2 Strandvägen 5B 114 51 Stockholm
Tel: +46 8 559 310 37 Mobile: +46 73 752 95 74

www.o2.se

 

 
