
Re: [Condor-users] Shadow exception errors



OK, this one is fixed. The user had changed their password and
needed to run condor_store_cred again so that Condor had the new password.
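
For reference, the telling line in the shadow log quoted below is
"init_user_ids: LogonUser failed with NT Status 1326". 1326 is the Win32
error ERROR_LOGON_FAILURE (bad user name or password), which is consistent
with the password Condor had stored no longer matching the account. The
Python sketch here is a hypothetical helper, not part of Condor: it scans
shadow-log text for that line and flags the code. The regular expression and
the WIN32_ERRORS table are my own assumptions, based only on the log format
shown in this thread.

```python
import re

# Hand-written lookup (an assumption for this sketch, not a Condor API).
# 1326 is the Win32 error ERROR_LOGON_FAILURE.
WIN32_ERRORS = {
    1326: "ERROR_LOGON_FAILURE: bad user name or password "
          "(stored credential may be stale; re-run condor_store_cred)",
}

# Matches shadow-log entries such as:
#   (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
LOGON_RE = re.compile(r"\((\d+\.\d+)\) \(\d+\): init_user_ids: "
                      r"LogonUser failed with NT Status (\d+)")


def find_logon_failures(log_text):
    """Return (job_id, code, explanation) for each LogonUser failure found."""
    results = []
    for match in LOGON_RE.finditer(log_text):
        job_id = match.group(1)
        code = int(match.group(2))
        explanation = WIN32_ERRORS.get(code, "unrecognised status code")
        results.append((job_id, code, explanation))
    return results


if __name__ == "__main__":
    sample = (
        "2/14 12:28:12 Initializing a VANILLA shadow\n"
        "2/14 12:28:12 (2.0) (3060): init_user_ids: "
        "LogonUser failed with NT Status 1326\n"
        "2/14 12:28:12 (2.0) (3060): init_user_ids() failed!\n"
    )
    for job_id, code, why in find_logon_failures(sample):
        print(f"job {job_id}: status {code} -> {why}")
```

Running it over a whole ShadowLog would list every job whose shadow
could not log on as the submitting user.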

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of 
> Hitchen, Greg (E&M, Kensington)
> Sent: Tuesday, 14 February 2006 9:34 AM
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [Condor-users] Shadow exception errors
> 
> 
> 
> The shadow log on the submitting machine is also showing these errors:
> 
> 2/14 12:28:10 ******************************************************
> 2/14 12:28:10 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 2/14 12:28:10 ** C:\Condor\bin\condor_shadow.exe
> 2/14 12:28:10 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/14 12:28:10 ** $CondorPlatform: INTEL-WINNT50 $
> 2/14 12:28:10 ** PID = 3060
> 2/14 12:28:10 ******************************************************
> 2/14 12:28:11 Using config file: c:\condor\condor_config
> 2/14 12:28:11 Using local config files: C:\Condor/condor_config.local
> 2/14 12:28:11 DaemonCore: Command Socket at <130.155.67.83:9434>
> 2/14 12:28:12 Initializing a VANILLA shadow
> 2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
> 2/14 12:28:12 (2.0) (3060): init_user_ids() failed!
> 2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
> 2/14 12:28:12 (2.0) (3060): init_user_ids() failed!
> 2/14 12:28:12 (2.0) (3060): ERROR "set_user_priv() failed!" at line 400 in file ..\src\condor_c++_util\uids.C
> 
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of 
> > Hitchen, Greg (E&M, Kensington)
> > Sent: Monday, 13 February 2006 2:02 PM
> > To: condor-users@xxxxxxxxxxx
> > Subject: [Condor-users] Shadow exception errors
> > 
> > 
> > 
> > Hi
> > 
> > We have been setting up and experimenting with Condor for a
> > while and now have some "real" users on board using the system.
> > 
> > This user has submitted a number of jobs that keep trying to
> > start, fail and start again. There are shadow exception
> > problems and eviction problems. Concentrating on the shadow
> > exception problems for now, I have included logs from the
> > submitting machine and from 2 different execute machines.
> > 
> > What problem is likely to cause these types of error messages?
> > 
> > The first example involves flocking to a different pool at a
> > different site. The second involves a job in the same pool,
> > but on machines that are still at a physically different site.
> > In both cases hardware firewalls (PIXes) sit in between, but we
> > have set HIGHPORT and LOWPORT in the configs and enabled TCP/UDP
> > for the 9000-10000 port range.
> > 
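Since every daemon in these logs reports its sockets as <ip:port>, one
quick sanity check is whether all the ports actually in use fall inside the
range the firewalls allow. A minimal Python sketch, assuming the 9000-10000
range above corresponds to LOWPORT = 9000 and HIGHPORT = 10000 in
condor_config and treating both bounds as inclusive; the function name is
hypothetical and not part of Condor:

```python
import re

# Assumed values, taken from the 9000-10000 range opened on the firewalls.
# In condor_config these correspond to the LOWPORT and HIGHPORT settings.
LOWPORT = 9000
HIGHPORT = 10000

# Condor logs endpoints as <ip:port>, e.g. "Command Socket at <130.155.67.83:9434>".
ENDPOINT_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def ports_outside_range(log_text, low=LOWPORT, high=HIGHPORT):
    """Return (ip, port) endpoints whose port falls outside [low, high]."""
    outside = []
    for ip, port_str in ENDPOINT_RE.findall(log_text):
        port = int(port_str)
        if not low <= port <= high:
            outside.append((ip, port))
    return outside


if __name__ == "__main__":
    sample = (
        "DaemonCore: Command Socket at <130.155.67.83:9434>\n"
        "Request to run on <138.194.10.81:9018> was ACCEPTED\n"
        "Starter communicating with condor_shadow <130.155.67.83:9691>\n"
    )
    bad = ports_outside_range(sample)
    print("all ports inside range" if not bad else f"outside range: {bad}")
```

Every <ip:port> endpoint in the logs quoted in this message falls inside
9000-10000.
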
> > Thanks.
> > 
> > Cheers
> > 
> > Greg
> > 
> > SHADOW LOG OF SUBMITTING MACHINE
> > 
> > 2/13 10:54:09 ******************************************************
> > 2/13 10:54:09 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/13 10:54:09 ** C:\Condor\bin\condor_shadow.exe
> > 2/13 10:54:09 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 10:54:09 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 10:54:09 ** PID = 1268
> > 2/13 10:54:09 ******************************************************
> > 2/13 10:54:09 Using config file: c:\condor\condor_config
> > 2/13 10:54:09 Using local config files: C:\Condor/condor_config.local
> > 2/13 10:54:09 DaemonCore: Command Socket at <130.155.67.83:9091>
> > 2/13 10:54:32 Initializing a VANILLA shadow
> > 2/13 10:54:32 (7.0) (1268): Request to run on <130.116.147.52:9590> was ACCEPTED
> > 2/13 10:54:45 (7.0) (1268): ReliSock: put_file: Failed to open file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, errno = 2.
> > 2/13 10:54:45 (7.0) (1268): ERROR "DoUpload: Failed to send file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/13 10:54:46 ******************************************************
> > 2/13 10:54:46 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/13 10:54:46 ** C:\Condor\bin\condor_shadow.exe
> > 2/13 10:54:46 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 10:54:46 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 10:54:46 ** PID = 2676
> > 2/13 10:54:46 ******************************************************
> > 2/13 10:54:47 Using config file: c:\condor\condor_config
> > 2/13 10:54:47 Using local config files: C:\Condor/condor_config.local
> > 2/13 10:54:47 DaemonCore: Command Socket at <130.155.67.83:9741>
> > 2/13 10:55:09 Initializing a VANILLA shadow
> > 2/13 10:55:09 (7.0) (2676): Request to run on <130.116.147.52:9590> was ACCEPTED
> > 2/13 10:55:14 (7.0) (2676): ReliSock: put_file: Failed to open file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, errno = 2.
> > 2/13 10:55:14 (7.0) (2676): ERROR "DoUpload: Failed to send file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/13 11:07:43 (5.0) (1076): Job 5.0 is being evicted
> > 2/13 11:07:43 (5.0) (1076): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
> > 
> > STARTER LOG OF EXECUTE MACHINE
> > 
> > 2/13 06:40:56 ******************************************************
> > 2/13 06:40:56 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/13 06:40:56 ** C:\Condor\bin\condor_starter.exe
> > 2/13 06:40:56 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 06:40:56 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 06:40:56 ** PID = 4048
> > 2/13 06:40:56 ******************************************************
> > 2/13 06:40:56 Using config file: c:\condor\condor_config
> > 2/13 06:40:56 Using local config files: C:\Condor/condor_config.local
> > 2/13 06:40:56 DaemonCore: Command Socket at <130.116.147.52:9448>
> > 2/13 06:40:56 Setting resource limits not implemented!
> > 2/13 06:41:15 Starter communicating with condor_shadow <130.155.67.83:9691>
> > 2/13 06:41:15 Submitting machine is "student3-lu.minerals.csiro.au"
> > 2/13 06:41:33 File transfer completed successfully.
> > 2/13 06:41:33 Starting a VANILLA universe job with ID: 3.0
> > 2/13 06:41:33 IWD: C:\Condor/execute\dir_4048
> > 2/13 06:41:33 Output file: C:\Condor/execute\dir_4048\D7EG9AB.log
> > 2/13 06:41:34 Renice expr "10" evaluated to 10
> > 2/13 06:41:34 About to exec C:\Condor\execute\dir_4048\condor_exec.exe D7EG9AB.egs
> > 2/13 06:41:34 Create_Process succeeded, pid=2932
> > 2/13 07:10:28 Got SIGQUIT.  Performing fast shutdown.
> > 2/13 07:10:28 ShutdownFast all jobs.
> > 2/13 07:10:28 Process exited, pid=2932, status=0
> > 2/13 07:10:28 Last process exited, now Starter is exiting
> > 2/13 07:10:28 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
> > 2/13 07:38:11 ******************************************************
> > 2/13 07:38:11 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/13 07:38:11 ** C:\Condor\bin\condor_starter.exe
> > 2/13 07:38:11 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 07:38:11 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 07:38:11 ** PID = 3688
> > 2/13 07:38:11 ******************************************************
> > 2/13 07:38:11 Using config file: c:\condor\condor_config
> > 2/13 07:38:11 Using local config files: C:\Condor/condor_config.local
> > 2/13 07:38:11 DaemonCore: Command Socket at <130.116.147.52:9413>
> > 2/13 07:38:11 Setting resource limits not implemented!
> > 2/13 07:38:11 Starter communicating with condor_shadow <130.155.67.83:9541>
> > 2/13 07:38:11 Submitting machine is "student3-lu.minerals.csiro.au"
> > 2/13 07:38:29 File transfer completed successfully.
> > 2/13 07:38:29 Starting a VANILLA universe job with ID: 7.0
> > 2/13 07:38:29 IWD: C:\Condor/execute\dir_3688
> > 2/13 07:38:29 Output file: C:\Condor/execute\dir_3688\D78aUAA.log
> > 2/13 07:38:29 Renice expr "10" evaluated to 10
> > 2/13 07:38:29 About to exec C:\Condor\execute\dir_3688\condor_exec.exe D78aUAA.egs
> > 2/13 07:38:29 Create_Process succeeded, pid=2716
> > 2/13 07:44:09 Process exited, pid=2716, status=0
> > 2/13 07:44:10 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3688\D78aUAA.condorlog, errno = 2.
> > 2/13 07:44:10 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3688\D78aUAA.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/13 07:44:10 ShutdownFast all jobs.
> > 2/13 07:44:10 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> > 
> > 
> > SHADOW LOG OF SUBMITTING MACHINE
> > 
> > 2/12 16:55:49 ******************************************************
> > 2/12 16:55:49 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/12 16:55:49 ** C:\Condor\bin\condor_shadow.exe
> > 2/12 16:55:49 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/12 16:55:49 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/12 16:55:49 ** PID = 1068
> > 2/12 16:55:49 ******************************************************
> > 2/12 16:55:49 Using config file: c:\condor\condor_config
> > 2/12 16:55:49 Using local config files: C:\Condor/condor_config.local
> > 2/12 16:55:50 DaemonCore: Command Socket at <130.155.67.83:9698>
> > 2/12 16:56:12 Initializing a VANILLA shadow
> > 2/12 16:56:12 (5.0) (1068): Request to run on <138.194.10.81:9018> was ACCEPTED
> > 2/12 16:56:40 (5.0) (1068): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:56:40 (5.0) (1068): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:56:41 (5.0) (1068): ERROR "Can no longer talk to condor_starter on execute machine (138.194.10.81)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
> > 2/12 16:56:42 ******************************************************
> > 2/12 16:56:42 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/12 16:56:42 ** C:\Condor\bin\condor_shadow.exe
> > 2/12 16:56:42 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/12 16:56:42 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/12 16:56:42 ** PID = 492
> > 2/12 16:56:42 ******************************************************
> > 2/12 16:56:42 Using config file: c:\condor\condor_config
> > 2/12 16:56:42 Using local config files: C:\Condor/condor_config.local
> > 2/12 16:56:42 DaemonCore: Command Socket at <130.155.67.83:9289>
> > 2/12 16:57:04 Initializing a VANILLA shadow
> > 2/12 16:57:04 (5.0) (492): Request to run on <138.194.10.81:9018> was ACCEPTED
> > 2/12 16:57:12 (5.0) (492): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:57:12 (5.0) (492): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:57:12 (5.0) (492): ERROR "Can no longer talk to condor_starter on execute machine (138.194.10.81)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
> > 
> > STARTER LOG OF EXECUTING MACHINE
> > 
> > 2/10 23:44:22 ******************************************************
> > 2/10 23:44:22 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/10 23:44:22 ** C:\Condor\bin\condor_starter.exe
> > 2/10 23:44:22 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/10 23:44:22 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/10 23:44:22 ** PID = 3508
> > 2/10 23:44:22 ******************************************************
> > 2/10 23:44:22 Using config file: C:\Condor\condor_config
> > 2/10 23:44:22 Using local config files: C:\Condor/condor_config.local
> > 2/10 23:44:22 DaemonCore: Command Socket at <138.194.10.81:9790>
> > 2/10 23:44:22 Setting resource limits not implemented!
> > 2/10 23:44:41 Starter communicating with condor_shadow <130.155.67.83:9344>
> > 2/10 23:44:41 Submitting machine is "student3-lu.minerals.CSIRO.AU"
> > 2/10 23:44:47 File transfer completed successfully.
> > 2/10 23:44:47 Starting a VANILLA universe job with ID: 4.0
> > 2/10 23:44:47 IWD: C:\Condor/execute\dir_3508
> > 2/10 23:44:47 Output file: C:\Condor/execute\dir_3508\D7EG9AC.log
> > 2/10 23:44:47 Renice expr "10" evaluated to 10
> > 2/10 23:44:47 About to exec C:\Condor\execute\dir_3508\condor_exec.exe D7EG9AC.egs
> > 2/10 23:44:47 Create_Process succeeded, pid=3860
> > 2/10 23:45:08 Process exited, pid=3860, status=-1
> > 2/10 23:45:09 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3508\D7EG9AC.condorlog, errno = 2.
> > 2/10 23:45:09 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3508\D7EG9AC.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/10 23:45:09 ShutdownFast all jobs.
> > 2/10 23:45:09 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> > 2/10 23:45:32 ******************************************************
> > 2/10 23:45:32 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/10 23:45:32 ** C:\Condor\bin\condor_starter.exe
> > 2/10 23:45:32 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/10 23:45:32 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/10 23:45:32 ** PID = 3624
> > 2/10 23:45:32 ******************************************************
> > 2/10 23:45:32 Using config file: C:\Condor\condor_config
> > 2/10 23:45:32 Using local config files: C:\Condor/condor_config.local
> > 2/10 23:45:32 DaemonCore: Command Socket at <138.194.10.81:9438>
> > 2/10 23:45:32 Setting resource limits not implemented!
> > 2/10 23:45:33 Starter communicating with condor_shadow <130.155.67.83:9216>
> > 2/10 23:45:33 Submitting machine is "student3-lu.minerals.CSIRO.AU"
> > 2/10 23:45:39 File transfer completed successfully.
> > 2/10 23:45:39 Starting a VANILLA universe job with ID: 4.0
> > 2/10 23:45:39 IWD: C:\Condor/execute\dir_3624
> > 2/10 23:45:39 Output file: C:\Condor/execute\dir_3624\D7EG9AC.log
> > 2/10 23:45:39 Renice expr "10" evaluated to 10
> > 2/10 23:45:39 About to exec C:\Condor\execute\dir_3624\condor_exec.exe D7EG9AC.egs
> > 2/10 23:45:39 Create_Process succeeded, pid=4092
> > 2/10 23:45:39 Process exited, pid=4092, status=-1
> > 2/10 23:45:40 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3624\D7EG9AC.condorlog, errno = 2.
> > 2/10 23:45:40 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3624\D7EG9AC.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/10 23:45:40 ShutdownFast all jobs.
> > 2/10 23:45:40 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> > 
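Two other codes recur in these logs. "errno = 2" in the put_file failures
is ENOENT (the file, for example the job's .condorlog, did not exist when
the transfer was attempted), and "errno = 10054" in condor_read() is the
Winsock code WSAECONNRESET (connection reset by the remote side). A small
Python sketch that tallies and names them; it is a hypothetical helper, not
a Condor tool, and the Winsock table is hand-written because Python's errno
module only lists Winsock codes on Windows:

```python
import errno
import re
from collections import Counter

# Winsock codes are not in Python's errno table on non-Windows platforms,
# so the one seen in these logs is listed by hand (assumption for this sketch).
WINSOCK_CODES = {10054: "WSAECONNRESET (connection reset by peer)"}

# Matches "errno = 2." and "errno = 10054," as they appear in Condor logs.
ERRNO_RE = re.compile(r"errno = (\d+)")


def describe_errno(code):
    """Return a short symbolic name for an errno value from a Condor log."""
    if code in WINSOCK_CODES:
        return WINSOCK_CODES[code]
    name = errno.errorcode.get(code)
    return name if name is not None else f"unknown ({code})"


def summarise_errnos(log_text):
    """Map each errno value in the log to (occurrence count, description)."""
    counts = Counter(int(c) for c in ERRNO_RE.findall(log_text))
    return {code: (n, describe_errno(code)) for code, n in counts.items()}


if __name__ == "__main__":
    sample = (
        "ReliSock: put_file: Failed to open file D7EG9AC.condorlog, errno = 2.\n"
        "condor_read(): recv() returned -1, errno = 10054, assuming failure.\n"
        "condor_read(): recv() returned -1, errno = 10054, assuming failure.\n"
    )
    for code, (count, desc) in sorted(summarise_errnos(sample).items()):
        print(f"errno {code}: {count}x {desc}")
```
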
> > -----------------------------------------------------------------------
> > Greg Hitchen
> > greg.hitchen@xxxxxxxx
> > CSIRO Exploration and Mining                      phone: +61 8 6436 8663
> > Australian Resources Research Centre (ARRC)       fax:   +61 8 6436 8555
> > Postal address:                                   mob:   0407 952 748
> > PO Box 1130, Bentley WA 6102, Australia
> > Street Address:
> > 26 Dick Perry Avenue, Kensington WA 6151
> > -----------------------------------------------------------------------
> > 
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > 