
Re: [Condor-users] Shadow exception errors



The shadow log on the submit machine is also showing these errors:

2/14 12:28:10 ******************************************************
2/14 12:28:10 ** condor_shadow (CONDOR_SHADOW) STARTING UP
2/14 12:28:10 ** C:\Condor\bin\condor_shadow.exe
2/14 12:28:10 ** $CondorVersion: 6.6.10 Jun 22 2005 $
2/14 12:28:10 ** $CondorPlatform: INTEL-WINNT50 $
2/14 12:28:10 ** PID = 3060
2/14 12:28:10 ******************************************************
2/14 12:28:11 Using config file: c:\condor\condor_config
2/14 12:28:11 Using local config files: C:\Condor/condor_config.local
2/14 12:28:11 DaemonCore: Command Socket at <130.155.67.83:9434>
2/14 12:28:12 Initializing a VANILLA shadow
2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
2/14 12:28:12 (2.0) (3060): init_user_ids() failed!
2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
2/14 12:28:12 (2.0) (3060): init_user_ids() failed!
2/14 12:28:12 (2.0) (3060): ERROR "set_user_priv() failed!" at line 400 in file ..\src\condor_c++_util\uids.C
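
A note on the numeric codes in these logs, since they point at three different kinds of failure. Read as a Win32 error code, 1326 is ERROR_LOGON_FAILURE (the user name or password is incorrect), which suggests the shadow could not log on as the submitting user. In the quoted logs below, errno 10054 is the Winsock code WSAECONNRESET (connection reset by peer) and errno 2 is ENOENT (no such file or directory). The Python sketch below just pulls those codes out of a log line and labels them. The `explain` helper and the lookup table are hypothetical names for illustration, not part of Condor, and the table covers only the codes that appear in this thread.

```python
import re

# Meanings of the codes seen in this thread only (not a complete table).
CODES = {
    ("nt_status", 1326): "ERROR_LOGON_FAILURE: the user name or password is incorrect",
    ("errno", 10054): "WSAECONNRESET: connection reset by peer",
    ("errno", 2): "ENOENT: no such file or directory",
}

# How each kind of code is written in the shadow and starter logs above.
PATTERNS = [
    ("nt_status", re.compile(r"NT\s+Status\s+(\d+)")),
    ("errno", re.compile(r"errno\s*=\s*(\d+)")),
]


def explain(line):
    """Return (kind, code, meaning) for every known code pattern in a log line."""
    found = []
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(line):
            code = int(match.group(1))
            found.append((kind, code, CODES.get((kind, code), "unknown")))
    return found


if __name__ == "__main__":
    sample = "2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326"
    for kind, code, meaning in explain(sample):
        print(kind, code, meaning)
```

Feeding each log line through `explain` separates the logon failure from the connection resets and the missing-file errors, which may well have different causes.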

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of 
> Hitchen, Greg (E&M, Kensington)
> Sent: Monday, 13 February 2006 2:02 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] Shadow exception errors
> 
> 
> 
> Hi 
> 
> We have been setting up and experimenting with Condor for a 
> while and now have some "real" users on board using the system.
> 
> This user has submitted a number of jobs that keep trying to 
> start, fail and start again. There are shadow exception 
> problems and eviction problems. Just concentrating on the 
> shadow exception problems for now, I have included logs from 
> the submitting machine and from 2 different execute machines. 
> 
> What problem is likely to cause these types of error messages?
> 
> The first example involves flocking to a different pool at a 
> different site. The second involves a job in the same pool, 
> but on machines still at a physically different site. In both 
> cases hardware firewalls (PIXes) sit between the sites, but we 
> have set HIGHPORT and LOWPORT in the configs and enabled TCP/UDP 
> for the 9000-10000 port range.
> 
> Thanks.
> 
> Cheers
> 
> Greg
> 
> SHADOW LOG OF SUBMITTING MACHINE
> 
> 2/13 10:54:09 ******************************************************
> 2/13 10:54:09 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 2/13 10:54:09 ** C:\Condor\bin\condor_shadow.exe
> 2/13 10:54:09 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/13 10:54:09 ** $CondorPlatform: INTEL-WINNT50 $
> 2/13 10:54:09 ** PID = 1268
> 2/13 10:54:09 ******************************************************
> 2/13 10:54:09 Using config file: c:\condor\condor_config
> 2/13 10:54:09 Using local config files: C:\Condor/condor_config.local
> 2/13 10:54:09 DaemonCore: Command Socket at <130.155.67.83:9091>
> 2/13 10:54:32 Initializing a VANILLA shadow
> 2/13 10:54:32 (7.0) (1268): Request to run on <130.116.147.52:9590> was ACCEPTED
> 2/13 10:54:45 (7.0) (1268): ReliSock: put_file: Failed to open file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, errno = 2.
> 2/13 10:54:45 (7.0) (1268): ERROR "DoUpload: Failed to send file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> 2/13 10:54:46 ******************************************************
> 2/13 10:54:46 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 2/13 10:54:46 ** C:\Condor\bin\condor_shadow.exe
> 2/13 10:54:46 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/13 10:54:46 ** $CondorPlatform: INTEL-WINNT50 $
> 2/13 10:54:46 ** PID = 2676
> 2/13 10:54:46 ******************************************************
> 2/13 10:54:47 Using config file: c:\condor\condor_config
> 2/13 10:54:47 Using local config files: C:\Condor/condor_config.local
> 2/13 10:54:47 DaemonCore: Command Socket at <130.155.67.83:9741>
> 2/13 10:55:09 Initializing a VANILLA shadow
> 2/13 10:55:09 (7.0) (2676): Request to run on <130.116.147.52:9590> was ACCEPTED
> 2/13 10:55:14 (7.0) (2676): ReliSock: put_file: Failed to open file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, errno = 2.
> 2/13 10:55:14 (7.0) (2676): ERROR "DoUpload: Failed to send file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> 2/13 11:07:43 (5.0) (1076): Job 5.0 is being evicted
> 2/13 11:07:43 (5.0) (1076): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
> 
> STARTER LOG OF EXECUTE MACHINE
> 
> 2/13 06:40:56 ******************************************************
> 2/13 06:40:56 ** condor_starter (CONDOR_STARTER) STARTING UP
> 2/13 06:40:56 ** C:\Condor\bin\condor_starter.exe
> 2/13 06:40:56 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/13 06:40:56 ** $CondorPlatform: INTEL-WINNT50 $
> 2/13 06:40:56 ** PID = 4048
> 2/13 06:40:56 ******************************************************
> 2/13 06:40:56 Using config file: c:\condor\condor_config
> 2/13 06:40:56 Using local config files: C:\Condor/condor_config.local
> 2/13 06:40:56 DaemonCore: Command Socket at <130.116.147.52:9448>
> 2/13 06:40:56 Setting resource limits not implemented!
> 2/13 06:41:15 Starter communicating with condor_shadow <130.155.67.83:9691>
> 2/13 06:41:15 Submitting machine is "student3-lu.minerals.csiro.au"
> 2/13 06:41:33 File transfer completed successfully.
> 2/13 06:41:33 Starting a VANILLA universe job with ID: 3.0
> 2/13 06:41:33 IWD: C:\Condor/execute\dir_4048
> 2/13 06:41:33 Output file: C:\Condor/execute\dir_4048\D7EG9AB.log
> 2/13 06:41:34 Renice expr "10" evaluated to 10
> 2/13 06:41:34 About to exec C:\Condor\execute\dir_4048\condor_exec.exe D7EG9AB.egs
> 2/13 06:41:34 Create_Process succeeded, pid=2932
> 2/13 07:10:28 Got SIGQUIT.  Performing fast shutdown.
> 2/13 07:10:28 ShutdownFast all jobs.
> 2/13 07:10:28 Process exited, pid=2932, status=0
> 2/13 07:10:28 Last process exited, now Starter is exiting
> 2/13 07:10:28 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
> 2/13 07:38:11 ******************************************************
> 2/13 07:38:11 ** condor_starter (CONDOR_STARTER) STARTING UP
> 2/13 07:38:11 ** C:\Condor\bin\condor_starter.exe
> 2/13 07:38:11 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/13 07:38:11 ** $CondorPlatform: INTEL-WINNT50 $
> 2/13 07:38:11 ** PID = 3688
> 2/13 07:38:11 ******************************************************
> 2/13 07:38:11 Using config file: c:\condor\condor_config
> 2/13 07:38:11 Using local config files: C:\Condor/condor_config.local
> 2/13 07:38:11 DaemonCore: Command Socket at <130.116.147.52:9413>
> 2/13 07:38:11 Setting resource limits not implemented!
> 2/13 07:38:11 Starter communicating with condor_shadow <130.155.67.83:9541>
> 2/13 07:38:11 Submitting machine is "student3-lu.minerals.csiro.au"
> 2/13 07:38:29 File transfer completed successfully.
> 2/13 07:38:29 Starting a VANILLA universe job with ID: 7.0
> 2/13 07:38:29 IWD: C:\Condor/execute\dir_3688
> 2/13 07:38:29 Output file: C:\Condor/execute\dir_3688\D78aUAA.log
> 2/13 07:38:29 Renice expr "10" evaluated to 10
> 2/13 07:38:29 About to exec C:\Condor\execute\dir_3688\condor_exec.exe D78aUAA.egs
> 2/13 07:38:29 Create_Process succeeded, pid=2716
> 2/13 07:44:09 Process exited, pid=2716, status=0
> 2/13 07:44:10 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3688\D78aUAA.condorlog, errno = 2.
> 2/13 07:44:10 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3688\D78aUAA.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> 2/13 07:44:10 ShutdownFast all jobs.
> 2/13 07:44:10 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> 
> 
> SHADOW LOG OF SUBMITTING MACHINE
> 
> 2/12 16:55:49 ******************************************************
> 2/12 16:55:49 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 2/12 16:55:49 ** C:\Condor\bin\condor_shadow.exe
> 2/12 16:55:49 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/12 16:55:49 ** $CondorPlatform: INTEL-WINNT50 $
> 2/12 16:55:49 ** PID = 1068
> 2/12 16:55:49 ******************************************************
> 2/12 16:55:49 Using config file: c:\condor\condor_config
> 2/12 16:55:49 Using local config files: C:\Condor/condor_config.local
> 2/12 16:55:50 DaemonCore: Command Socket at <130.155.67.83:9698>
> 2/12 16:56:12 Initializing a VANILLA shadow
> 2/12 16:56:12 (5.0) (1068): Request to run on <138.194.10.81:9018> was ACCEPTED
> 2/12 16:56:40 (5.0) (1068): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> 2/12 16:56:40 (5.0) (1068): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> 2/12 16:56:41 (5.0) (1068): ERROR "Can no longer talk to condor_starter on execute machine (138.194.10.81)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
> 2/12 16:56:42 ******************************************************
> 2/12 16:56:42 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 2/12 16:56:42 ** C:\Condor\bin\condor_shadow.exe
> 2/12 16:56:42 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/12 16:56:42 ** $CondorPlatform: INTEL-WINNT50 $
> 2/12 16:56:42 ** PID = 492
> 2/12 16:56:42 ******************************************************
> 2/12 16:56:42 Using config file: c:\condor\condor_config
> 2/12 16:56:42 Using local config files: C:\Condor/condor_config.local
> 2/12 16:56:42 DaemonCore: Command Socket at <130.155.67.83:9289>
> 2/12 16:57:04 Initializing a VANILLA shadow
> 2/12 16:57:04 (5.0) (492): Request to run on <138.194.10.81:9018> was ACCEPTED
> 2/12 16:57:12 (5.0) (492): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> 2/12 16:57:12 (5.0) (492): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> 2/12 16:57:12 (5.0) (492): ERROR "Can no longer talk to condor_starter on execute machine (138.194.10.81)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
> 
> STARTER LOG OF EXECUTING MACHINE
> 
> 2/10 23:44:22 ******************************************************
> 2/10 23:44:22 ** condor_starter (CONDOR_STARTER) STARTING UP
> 2/10 23:44:22 ** C:\Condor\bin\condor_starter.exe
> 2/10 23:44:22 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/10 23:44:22 ** $CondorPlatform: INTEL-WINNT50 $
> 2/10 23:44:22 ** PID = 3508
> 2/10 23:44:22 ******************************************************
> 2/10 23:44:22 Using config file: C:\Condor\condor_config
> 2/10 23:44:22 Using local config files: C:\Condor/condor_config.local
> 2/10 23:44:22 DaemonCore: Command Socket at <138.194.10.81:9790>
> 2/10 23:44:22 Setting resource limits not implemented!
> 2/10 23:44:41 Starter communicating with condor_shadow <130.155.67.83:9344>
> 2/10 23:44:41 Submitting machine is "student3-lu.minerals.CSIRO.AU"
> 2/10 23:44:47 File transfer completed successfully.
> 2/10 23:44:47 Starting a VANILLA universe job with ID: 4.0
> 2/10 23:44:47 IWD: C:\Condor/execute\dir_3508
> 2/10 23:44:47 Output file: C:\Condor/execute\dir_3508\D7EG9AC.log
> 2/10 23:44:47 Renice expr "10" evaluated to 10
> 2/10 23:44:47 About to exec C:\Condor\execute\dir_3508\condor_exec.exe D7EG9AC.egs
> 2/10 23:44:47 Create_Process succeeded, pid=3860
> 2/10 23:45:08 Process exited, pid=3860, status=-1
> 2/10 23:45:09 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3508\D7EG9AC.condorlog, errno = 2.
> 2/10 23:45:09 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3508\D7EG9AC.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> 2/10 23:45:09 ShutdownFast all jobs.
> 2/10 23:45:09 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> 2/10 23:45:32 ******************************************************
> 2/10 23:45:32 ** condor_starter (CONDOR_STARTER) STARTING UP
> 2/10 23:45:32 ** C:\Condor\bin\condor_starter.exe
> 2/10 23:45:32 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/10 23:45:32 ** $CondorPlatform: INTEL-WINNT50 $
> 2/10 23:45:32 ** PID = 3624
> 2/10 23:45:32 ******************************************************
> 2/10 23:45:32 Using config file: C:\Condor\condor_config
> 2/10 23:45:32 Using local config files: C:\Condor/condor_config.local
> 2/10 23:45:32 DaemonCore: Command Socket at <138.194.10.81:9438>
> 2/10 23:45:32 Setting resource limits not implemented!
> 2/10 23:45:33 Starter communicating with condor_shadow <130.155.67.83:9216>
> 2/10 23:45:33 Submitting machine is "student3-lu.minerals.CSIRO.AU"
> 2/10 23:45:39 File transfer completed successfully.
> 2/10 23:45:39 Starting a VANILLA universe job with ID: 4.0
> 2/10 23:45:39 IWD: C:\Condor/execute\dir_3624
> 2/10 23:45:39 Output file: C:\Condor/execute\dir_3624\D7EG9AC.log
> 2/10 23:45:39 Renice expr "10" evaluated to 10
> 2/10 23:45:39 About to exec C:\Condor\execute\dir_3624\condor_exec.exe D7EG9AC.egs
> 2/10 23:45:39 Create_Process succeeded, pid=4092
> 2/10 23:45:39 Process exited, pid=4092, status=-1
> 2/10 23:45:40 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3624\D7EG9AC.condorlog, errno = 2.
> 2/10 23:45:40 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3624\D7EG9AC.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> 2/10 23:45:40 ShutdownFast all jobs.
> 2/10 23:45:40 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> 
> -----------------------------------------------------------------------
> Greg Hitchen
> greg.hitchen@xxxxxxxx
> CSIRO Exploration and Mining                    phone: +61 8 6436 8663
> Australian Resources Research Centre (ARRC)     fax:   +61 8 6436 8555
> Postal address:                                 mob:   0407 952 748
> PO Box 1130, Bentley WA 6102, Australia
> Street Address:
> 26 Dick Perry Avenue, Kensington WA 6151
> -----------------------------------------------------------------------