Hi Condor-Users!

I have written about this before, and I am writing again because I am still
stuck on this issue.

I am using a software package that requires Condor to always use the same user
account when running jobs. Right now I have a small Condor pool with 3 machines
running Windows. In the condor_config file of each machine I have set
SLOT1_USER = domain\user_account, the account that Condor should use to run the
jobs. I have also included DEDICATED_EXECUTE_ACCOUNT_REGEXP = True.

The problem occurs when I try to store the credentials on each machine. My
software supplier instructed me to run condor_store_cred add on every machine
of the pool. But when I issue condor_store_cred add, I get the error “Make sure
your ALLOW_WRITE setting includes this host”. And, yes, it does: the
ALLOW_WRITE variable includes this host (it is set to *).

When the software supplier contacted the
Condor Team regarding this issue, they got the following answer:

This is a common problem people encounter when setting up Condor on Windows.
This error indicates that there is a communication problem between the
condor_store_cred tool and the condor_schedd daemon. The first thing you want
to do is verify that the schedd is in fact running on the machine from which
you are executing condor_store_cred. If it is, the SchedLog is the place to
look for details on why the communication is failing. A common reason is
because of a misconfigured security setup, which is why the error message
refers to HOSTALLOW_WRITE. Of course, there may be other problems. Adding the
D_SECURITY flag to the SCHEDD_DEBUG configuration macro will allow you to get
the most information out of your SchedLog.

Hope this helps. Let me know if you need any more help tracking this down.

Thanks,
Greg Quinn
Condor Team

Greg wrote that the first thing to do is to verify that the schedd is in fact
running on the machine from which condor_store_cred is executed. I have checked,
and no, the schedd daemon is not running on the machine on which I am executing
condor_store_cred. It is only running on the central manager, and there the
condor_store_cred command worked fine.

Isn’t it so that
condor_schedd should only run on the machine where the jobs may be submitted
from, which in my case is the central manager?

1. Is it really necessary to execute condor_store_cred add on every machine of
   my pool?
2. If yes, is it necessary that condor_schedd runs on every machine of the
   Condor pool?
3. If yes, how do I set things up so that condor_schedd runs on every machine?

I include below the SchedLog excerpt for the submitted job with ID 29.

10.110.44.12 is the central manager, where the jobs are submitted from;
condor_schedd is running on this machine, and condor_store_cred add worked
fine on it.

10.110.44.19 is the execute machine; condor_store_cred add didn’t work on this
machine, and condor_schedd is not
running on this machine.

Any clue of what is happening?

Cheers,
Sónia

9/24 11:14:33 Starting add_shadow_birthdate(29.0)
9/24 11:14:33 Started shadow for job 29.0 on "<10.110.44.19:1232>", (shadow pid = 4172)
9/24 11:14:34 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:53509>
9/24 11:14:34 DC_AUTHENTICATE: added session id o2f-sth-lap-016:4012:1285319674:43 to cache for 8640000 seconds!
9/24 11:14:34 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51224>.
9/24 11:14:34 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51224>
9/24 11:14:34 DC_AUTHENTICATE: resuming session id o2f-sth-lap-016:4012:1285319674:43 given to <10.110.44.12:53509>:
9/24 11:14:34 DC_AUTHENTICATE: Success.
9/24 11:14:34 STARTCOMMAND: starting 60001 to <10.110.44.12:53238> on UDP port 51225.
9/24 11:14:34 SECMAN: command 60001 to <10.110.44.12:53238> on UDP port 51225.
9/24 11:14:34 SECMAN: Cookie="2E5D02CE7113BE0793647B61D1BB49E8ED987ADCAD1C68D33A6E0DA5F1E9396F856B4060CBB7A93FEC1430B3470CE5D97C2FFB12626DEAD2320F8D124A7FD4C"
9/24 11:14:34 SECMAN: startCommand succeeded.
9/24 11:14:34 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51225>.
9/24 11:14:34 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51225>
9/24 11:14:34 DC_AUTHENTICATE: Success.
9/24 11:14:34 DaemonCore: Command received via UDP from host <10.110.44.12:51225>
9/24 11:14:34 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
9/24 11:14:34 Shadow pid 4172 for job 29.0 exited with status 4
9/24 11:14:34 ERROR: Shadow exited with job exception code!
9/24 11:14:36 ERROR: SetHandleInformation() failed in SetFDInheritFlag(0,0),err=87
9/24 11:14:36 ERROR: SetHandleInformation() failed in SetFDInheritFlag(1,0),err=87
9/24 11:14:36 ERROR: SetHandleInformation() failed in SetFDInheritFlag(2,0),err=87
9/24 11:14:36 Starting add_shadow_birthdate(29.0)
9/24 11:14:36 Started shadow for job 29.0 on "<10.110.44.19:1232>", (shadow pid = 3280)
9/24 11:14:37 STARTCOMMAND: starting 1 to <10.110.44.12:9618> on UDP port 51226.
9/24 11:14:37 SECMAN: command 1 to <10.110.44.12:9618> on UDP port 51226.
9/24 11:14:37 SECMAN: using session o2f-sth-lap-016:2396:1285318979:3 for {<10.110.44.12:9618>,<1>}.
9/24 11:14:37 SECMAN: UDP, have_session == 1, can_neg == 1
9/24 11:14:37 SECMAN: startCommand succeeded.
9/24 11:14:37 Sent ad to central manager for o2f_sonlil@xxxxxxxxxxxxxxxxxxxx
9/24 11:14:37 STARTCOMMAND: starting 11 to <10.110.44.12:9618> on UDP port 51227.
9/24 11:14:37 SECMAN: command 11 to <10.110.44.12:9618> on UDP port 51227.
9/24 11:14:37 SECMAN: using session o2f-sth-lap-016:2396:1285318979:3 for {<10.110.44.12:9618>,<11>}.
9/24 11:14:37 SECMAN: UDP, have_session == 1, can_neg == 1
9/24 11:14:37 SECMAN: startCommand succeeded.
9/24 11:14:37 Sent ad to 1 collectors for o2f_sonlil@xxxxxxxxxxxxxxxxxxxx
9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:53518>
9/24 11:14:38 DC_AUTHENTICATE: added session id o2f-sth-lap-016:4012:1285319678:44 to cache for 8640000 seconds!
9/24 11:14:38 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51228>.
9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51228>
9/24 11:14:38 DC_AUTHENTICATE: resuming session id o2f-sth-lap-016:4012:1285319678:44 given to <10.110.44.12:53518>:
9/24 11:14:38 DC_AUTHENTICATE: Success.
9/24 11:14:38 STARTCOMMAND: starting 60001 to <10.110.44.12:53238> on UDP port 51229.
9/24 11:14:38 SECMAN: command 60001 to <10.110.44.12:53238> on UDP port 51229.
9/24 11:14:38 SECMAN: Cookie="2E5D02CE7113BE0793647B61D1BB49E8ED987ADCAD1C68D33A6E0DA5F1E9396F856B4060CBB7A93FEC1430B3470CE5D97C2FFB12626DEAD2320F8D124A7FD4C"
9/24 11:14:38 SECMAN: startCommand succeeded.
9/24 11:14:38 DC_AUTHENTICATE: received UDP packet from <10.110.44.12:51229>.
9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.12:51229>
9/24 11:14:38 DC_AUTHENTICATE: Success.
9/24 11:14:38 DaemonCore: Command received via UDP from host <10.110.44.12:51229>
9/24 11:14:38 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
9/24 11:14:38 Shadow pid 3280 for job 29.0 exited with status 4
9/24 11:14:38 ERROR: Shadow exited with job exception code!
9/24 11:14:38 Match for cluster 29 has had 5 shadow exceptions, relinquishing.
9/24 11:14:38 STARTCOMMAND: starting 443 to <10.110.44.19:1232> on UDP port 51230.
9/24 11:14:38 SECMAN: command 443 to <10.110.44.19:1232> on UDP port 51230.
9/24 11:14:38 SECMAN: using session O2F-sth-LAP-002:588:1285319520:39 for {<10.110.44.19:1232>,<443>}.
9/24 11:14:38 SECMAN: UDP, have_session == 1, can_neg == 1
9/24 11:14:38 SECMAN: startCommand succeeded.
9/24 11:14:38 STARTCOMMAND: starting 443 to <10.110.44.19:1232> on UDP port 51231.
9/24 11:14:38 SECMAN: command 443 to <10.110.44.19:1232> on UDP port 51231.
9/24 11:14:38 SECMAN: using session O2F-sth-LAP-002:588:1285319520:39 for {<10.110.44.19:1232>,<443>}.
9/24 11:14:38 SECMAN: UDP, have_session == 1, can_neg == 1
9/24 11:14:38 SECMAN: startCommand succeeded.
9/24 11:14:38 Sent RELEASE_CLAIM to startd on <10.110.44.19:1232>
9/24 11:14:38 Match record (<10.110.44.19:1232>, 29, 0) deleted
9/24 11:14:38 DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.110.44.19:1544>
9/24 11:14:38 DC_AUTHENTICATE: resuming session id o2f-sth-lap-016:4012:1285319536:25 given to <10.110.44.19:1526>:
9/24 11:14:38 DC_AUTHENTICATE: Success.
9/24 11:14:38 DaemonCore: Command received via TCP from host <10.110.44.19:1544>
9/24 11:14:38 DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
9/24 11:14:38 Got VACATE_SERVICE from <10.110.44.19:1544>

Sónia Liléo
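
In case it helps anyone reading a SchedLog excerpt like the one above, here is a rough Python sketch that pulls out only the shadow exits and the ERROR entries. It is not part of Condor: the function names split_entries and summarize are made up for illustration, and the timestamp pattern is assumed from the excerpt above (month/day followed by hh:mm:ss). It splits on timestamps rather than on newlines, so it should also cope with a log whose lines were re-wrapped by a mail client.

```python
import re
from collections import Counter

# Timestamp format assumed from the excerpt above, e.g. "9/24 11:14:34 ".
TIMESTAMP = re.compile(r"\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")
SHADOW_EXIT = re.compile(r"Shadow pid (\d+) for job (\S+) exited with status (\d+)")


def split_entries(text):
    """Split raw log text into entries, one per timestamp (hypothetical helper)."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    starts = [m.start() for m in TIMESTAMP.finditer(flat)]
    ends = starts[1:] + [len(flat)]
    return [flat[a:b].strip() for a, b in zip(starts, ends)]


def summarize(text):
    """Return (list of shadow exits, Counter of ERROR messages) for the log text."""
    exits, errors = [], Counter()
    for entry in split_entries(text):
        # Drop the leading timestamp so only the message remains.
        message = TIMESTAMP.sub("", entry, count=1)
        found = SHADOW_EXIT.search(message)
        if found:
            pid, job, status = found.groups()
            exits.append((job, int(pid), int(status)))
        elif message.startswith("ERROR:"):
            errors[message] += 1
    return exits, errors


if __name__ == "__main__":
    # A few entries copied from the excerpt, deliberately left re-wrapped.
    sample = """9/24 11:14:34 Shadow pid 4172 for job 29.0
exited with status 4 9/24 11:14:34 ERROR: Shadow exited with job
exception code! 9/24 11:14:36 ERROR: SetHandleInformation()
failed in SetFDInheritFlag(0,0),err=87"""
    exits, errors = summarize(sample)
    for job, pid, status in exits:
        print("job", job, "shadow pid", pid, "exit status", status)
    for message, count in errors.items():
        print(count, "x", message)
```

To run it against a real log, read the SchedLog file into a string and pass it to summarize; the location of that file depends on the local Condor configuration.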