
Re: [Condor-users] LogonUser(condor-reuse-slot1, ... ) failed with status 1385



I've created a ticket that covers the problem:

http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1486

I also have a patch, but I'll need to test it a little further. I'll put it up for review as soon as it is complete.

Regards,
-B

On 2010-07-01, at 10:58 AM, kschwarz@xxxxxxxxxxxxxx wrote:

> Hi,
> 
> I am losing communication between the SHADOW and STARTER daemons. Their
> log files are pasted below:
> 
> The ShadowLog shows:
> 
> 7/1 11:54:41 ******************************************************
> 7/1 11:54:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 7/1 11:54:41 ** C:\Condor\bin\condor_shadow.exe
> 7/1 11:54:41 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
> 7/1 11:54:41 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
> 7/1 11:54:41 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
> 7/1 11:54:41 ** $CondorPlatform: INTEL-WINNT50 $
> 7/1 11:54:41 ** PID = 1580
> 7/1 11:54:41 ** Log last touched 6/30 13:30:28
> 7/1 11:54:41 ******************************************************
> 7/1 11:54:41 Using config source: C:\condor\condor_config
> 7/1 11:54:41 Using local config sources: 
> 7/1 11:54:41    C:\condor/condor_config.local
> <snip>
> 7/1 11:54:41 DaemonCore: Command Socket at <10.3.28.8:45848>
> 7/1 11:54:41 Initializing a VANILLA shadow for job 9.0
> 7/1 11:54:42 (9.0) (1580): Request to run on slot1@xxxxxxxxxxxxxxxxxxxx <10.11.3.133:10882> was ACCEPTED
> 7/1 11:54:56 (9.0) (1580): condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <10.11.3.133:10882>.
> 7/1 11:54:56 (9.0) (1580): IO: Failed to read packet header
> 7/1 11:54:56 (9.0) (1580): Can no longer talk to condor_starter <10.11.3.133:10882>
> 7/1 11:54:56 (9.0) (1580): Trying to reconnect to disconnected job
> 7/1 11:54:56 (9.0) (1580): LastJobLeaseRenewal: 1277996096 Thu Jul 01 11:54:56 2010
> 7/1 11:54:56 (9.0) (1580): JobLeaseDuration: 1200 seconds
> 7/1 11:54:56 (9.0) (1580): JobLeaseDuration remaining: 1200
> 7/1 11:54:56 (9.0) (1580): Attempting to locate disconnected starter
> 7/1 11:54:56 (9.0) (1580): locateStarter(): ClaimId (<10.11.3.133:10882>#1277995944#1#9a100f3b140949e336ed2e5322947dd25e5971fa) and GlobalJobId ( PC284419.corp.ad.emb#9.0#1277996003 ) not found
> 7/1 11:54:56 (9.0) (1580): Reconnect FAILED: Job not found at execution machine
> 7/1 11:54:56 (9.0) (1580): **** condor_shadow (condor_SHADOW) pid 1580 EXITING WITH STATUS 107
> 
> The StarterLog.slot1 shows:
> 
> 7/1 11:54:55 KEYCACHE: created: 00B87160
> 7/1 11:54:55 ******************************************************
> 7/1 11:54:55 ** condor_starter (CONDOR_STARTER) STARTING UP
> 7/1 11:54:55 ** C:\Condor\bin\condor_starter.exe
> 7/1 11:54:55 ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
> 7/1 11:54:55 ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
> 7/1 11:54:55 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
> 7/1 11:54:55 ** $CondorPlatform: INTEL-WINNT50 $
> 7/1 11:54:55 ** PID = 5276
> 7/1 11:54:55 ** Log last touched time unavailable (No such file or directory)
> 7/1 11:54:55 ******************************************************
> 7/1 11:54:55 Using config source: C:\Condor\condor_config
> 7/1 11:54:55 Using local config sources: 
> 7/1 11:54:55    C:\condor/condor_config.local
> <snip>
> 7/1 11:54:55 DaemonCore: Command Socket at <10.11.3.133:7721>
> 7/1 11:54:55 GLEXEC_JOB not supported on this platform; ignoring
> 7/1 11:54:55 Setting resource limits not implemented!
> 7/1 11:54:55 Communicating with shadow <10.3.28.8:45848>
> 7/1 11:54:55 Submitting machine is "10-3-28-8.sjk.emb"
> 7/1 11:54:55 setting the orig job name in starter
> 7/1 11:54:55 setting the orig job iwd in starter
> 7/1 11:54:56 LogonUser(condor-reuse-slot1, ... ) failed with status 1385
> 7/1 11:54:56 ERROR "Failed to create a user nobody" at line 442 in file ..\src\condor_c++_util\uids.cpp
> 7/1 11:54:56 ERROR "LocalUserLog::logStarterError() called before init()" at line 222 in file ..\src\condor_starter.V6.1\local_user_log.cpp
> 
> The LogonUser status 1385 above is the Windows error
> ERROR_LOGON_TYPE_NOT_GRANTED: "Logon failure: the user has not been
> granted the requested logon type at this computer."
> 
> Our IT administrators are rolling out a new security baseline on the
> machines, and it seems that the condor-reuse-slotN accounts no longer have
> the rights needed to run jobs. Some user rights are no longer being
> granted to local accounts.
> Does anyone have a suggestion for fixing this?
> 
> Regards, Klaus
> 
> This message is intended solely for the use of its addressee and may 
> contain privileged or confidential information. All information contained 
> herein shall be treated as confidential and shall not be disclosed to any 
> third party without Embraer's prior written approval. If you are not the 
> addressee you should not distribute, copy or file this message. In this 
> case, please notify the sender and destroy its contents immediately.
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
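When the same failure has to be tracked down across many execute machines, it can help to scan the starter and shadow logs for the two error patterns that appear in this thread and translate the numeric codes. The Python below is a minimal, illustrative sketch and is not part of Condor: the function name explain_condor_log and both regular expressions are hypothetical, written only against the log lines quoted above, and the lookup table covers just the two Windows codes seen here (1385, ERROR_LOGON_TYPE_NOT_GRANTED, and 10054, WSAECONNRESET). It assumes one log entry per line, as written to disk, rather than the mail-wrapped form.

```python
import re

# Only the two Windows error codes that appear in this thread are mapped;
# any other code is reported as "unknown".
WINDOWS_ERRORS = {
    1385: ("ERROR_LOGON_TYPE_NOT_GRANTED",
           "Logon failure: the user has not been granted the requested "
           "logon type at this computer."),
    10054: ("WSAECONNRESET",
            "An existing connection was forcibly closed by the remote host."),
}

# Hypothetical patterns, based only on the StarterLog/ShadowLog lines quoted
# above; other Condor versions may word these messages differently.
LOGON_RE = re.compile(
    r"LogonUser\((?P<user>[^,]+),.*\) failed with status (?P<code>\d+)")
RECV_RE = re.compile(r"recv\(\) returned -1, errno = (?P<code>\d+)")


def _describe(code):
    """Look up a Windows error code, falling back to 'unknown'."""
    return WINDOWS_ERRORS.get(code, ("unknown", ""))


def explain_condor_log(lines):
    """Return (kind, detail, code, symbol, message) for each recognised failure."""
    findings = []
    for line in lines:
        m = LOGON_RE.search(line)
        if m:
            code = int(m.group("code"))
            symbol, message = _describe(code)
            findings.append(("logon", m.group("user"), code, symbol, message))
            continue
        m = RECV_RE.search(line)
        if m:
            code = int(m.group("code"))
            symbol, message = _describe(code)
            findings.append(("socket", "recv", code, symbol, message))
    return findings


if __name__ == "__main__":
    sample = [
        "7/1 11:54:56 (9.0) (1580): condor_read(): recv() returned -1, "
        "errno = 10054, assuming failure reading 5 bytes from <10.11.3.133:10882>.",
        "7/1 11:54:56 LogonUser(condor-reuse-slot1, ... ) failed with status 1385",
    ]
    for finding in explain_condor_log(sample):
        print(finding)
```

In the logs above, both entries carry the same timestamp (11:54:56), which suggests that the socket reset seen by the shadow is a downstream symptom of the starter exiting after LogonUser failed. A "logon" finding with code 1385 is therefore the one that points at the account rights on the execute machine.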