
[Condor-users] Slave machine produces no output; return value 203

I am attempting to render a Maya scene file.  My pool contains 3 physical computers and 12 virtual machines, and the machines have identical hardware.  I'm using condor_render.exe to generate and submit the jobs to Condor.  If I render more than 4 frames, some of the rendered images never show up; for example, if I render 45 frames, only about 20 images appear.  I have narrowed the problem down to the jobs rendered on the slave computers: these jobs return a value of 203, as shown in the log files below, while jobs rendered on the master return a value of 0.
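
For context, condor_render queues one vanilla-universe job per frame.  I don't have the generated submit file in front of me, but judging from the starter log below it is along these lines (a rough sketch only: render_frame.bat is a placeholder name, and the transfer syntax is from memory of the 6.6 manual):

# per-frame submit description (sketch, not the literal generated file)
# Condor renames the transferred executable to condor_exec.bat on the
# execute machine, which is the name that shows up in the starter log
universe       = vanilla
executable     = render_frame.bat
arguments      = -rd . -im Frame -s 4.0000 -e 4.0000 -b 1.0000 ~Test1.mb
transfer_files = ONEXIT
output         = cr.out
error          = cr.err
log            = cr.log
queue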
 
According to the starter log on the slave machine (included below), everything appears the same as in the master's starter log until we reach this line, fourth from the bottom:
 
5/7 05:27:15 Process exited, pid=2176, status=203
Can anyone tell me what return value 203 means?  What steps should I take to correct the problem?
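
If it helps, the failing invocation can be retried by hand in a console on one of the slaves, along these lines (C:\temp\render_test is a made-up scratch directory; the job's input files have to be copied into it first, because Condor deletes its execute directory, e.g. C:\Condor\execute\dir_1156, as soon as the job ends):

rem stage the scene file and wrapper script into a scratch directory, then:
cd C:\temp\render_test
rem run the exact command line the starter logged
C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat -rd . -im Frame -s 4.0000 -e 4.0000 -b 1.0000 ~Test1.mb
rem print the exit code cmd returned; presumably 203 on the failing slaves
echo %ERRORLEVEL%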
 
Any help greatly appreciated.  Thank you.
 
-Mike
 
 
excerpt from the Condor job log file:
...
005 (002.003.000) 05/07 05:27:16 Job terminated.
 (1) Normal termination (return value 203)
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 33  -  Run Bytes Sent By Job
 63428  -  Run Bytes Received By Job
 33  -  Total Bytes Sent By Job
 63428  -  Total Bytes Received By Job
...
005 (002.000.000) 05/07 05:27:17 Job terminated.
 (1) Normal termination (return value 0)
  Usr 0 00:00:01, Sys 0 00:00:00  -  Run Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
  Usr 0 00:00:01, Sys 0 00:00:00  -  Total Remote Usage
  Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
 20594  -  Run Bytes Sent By Job
 63428  -  Run Bytes Received By Job
 20594  -  Total Bytes Sent By Job
 63428  -  Total Bytes Received By Job
...
 
 
excerpt from StarterLog.vm1 on the slave machine:
5/7 05:27:14 ******************************************************
5/7 05:27:14 ** condor_starter (CONDOR_STARTER) STARTING UP
5/7 05:27:14 ** C:\Condor\bin\condor_starter.exe
5/7 05:27:14 ** $CondorVersion: 6.6.9 Mar 10 2005 $
5/7 05:27:14 ** $CondorPlatform: INTEL-WINNT40 $
5/7 05:27:14 ** PID = 1156
5/7 05:27:14 ******************************************************
5/7 05:27:14 Using config file: C:\Condor\condor_config
5/7 05:27:14 Using local config files: C:\Condor/condor_config.local
5/7 05:27:14 DaemonCore: Command Socket at <10.100.4.8:4247>
5/7 05:27:14 Setting resource limits not implemented!
5/7 05:27:14 Starter communicating with condor_shadow <10.100.4.8:4244>
5/7 05:27:14 Submitting machine is "anim2"
5/7 05:27:14 File transfer completed successfully.
5/7 05:27:15 Starting a VANILLA universe job with ID: 2.3
5/7 05:27:15 IWD: C:\Condor/execute\dir_1156
5/7 05:27:15 Output file: C:\Condor/execute\dir_1156\cr.out
5/7 05:27:15 Error file: C:\Condor/execute\dir_1156\cr.err
5/7 05:27:15 Renice expr "10" evaluated to 10
5/7 05:27:15 About to exec C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat -rd . -im Frame -s 4.0000 -e 4.0000 -b 1.0000 ~Test1.mb
5/7 05:27:15 Create_Process succeeded, pid=2176
5/7 05:27:15 Process exited, pid=2176, status=203
5/7 05:27:16 Got SIGQUIT.  Performing fast shutdown.
5/7 05:27:16 ShutdownFast all jobs.
5/7 05:27:16 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
 
 
excerpt from the SchedLog on the slave:
5/7 05:27:02 DaemonCore: Command received via UDP from host <10.100.4.8:4217>
5/7 05:27:02 DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
5/7 05:27:02 Sent ad to central manager for Anim@xxxxxxxxxx
5/7 05:27:02 Called reschedule_negotiator()
5/7 05:27:02 Activity on stashed negotiator socket
5/7 05:27:02 Negotiating for owner: Anim@xxxxxxxxxx
5/7 05:27:02 Checking consistency running and runnable jobs
5/7 05:27:02 Tables are consistent
5/7 05:27:04 Out of jobs - 5 jobs matched, 0 jobs idle, flock level = 0
5/7 05:27:07 Started shadow for job 2.0 on "<10.100.4.6:4472>", (shadow pid = 2628)
5/7 05:27:07 Sent ad to central manager for Anim@xxxxxxxxxx
5/7 05:27:09 Started shadow for job 2.1 on "<10.100.4.6:4472>", (shadow pid = 2208)
5/7 05:27:11 Started shadow for job 2.2 on "<10.100.4.6:4472>", (shadow pid = 2920)
5/7 05:27:13 Started shadow for job 2.3 on "<10.100.4.8:2484>", (shadow pid = 3816)
5/7 05:27:15 Started shadow for job 2.4 on "<10.100.4.6:4472>", (shadow pid = 4076)
5/7 05:27:15 Sent ad to central manager for Anim@xxxxxxxxxx
5/7 05:27:16 DaemonCore: Command received via UDP from host <10.100.4.8:4264>
5/7 05:27:16 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
5/7 05:27:16 Shadow pid 3816 for job 2.3 exited with status 100
5/7 05:27:16 match (<10.100.4.8:2484>#2627232347) out of jobs (cluster id 2); relinquishing
5/7 05:27:16 Sent RELEASE_CLAIM to startd on <10.100.4.8:2484>
5/7 05:27:16 Match record (<10.100.4.8:2484>, 2, -1) deleted
5/7 05:27:16 DaemonCore: Command received via TCP from host <10.100.4.8:4267>
5/7 05:27:16 DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
5/7 05:27:16 Got VACATE_SERVICE from <10.100.4.8:4267>
5/7 05:27:17 DaemonCore: Command received via UDP from host <10.100.4.8:4274>
5/7 05:27:17 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
5/7 05:27:17 Shadow pid 2628 for job 2.0 exited with status 100
5/7 05:27:17 match (<10.100.4.6:4472>#2785286360) out of jobs (cluster id 2); relinquishing
5/7 05:27:17 Sent RELEASE_CLAIM to startd on <10.100.4.6:4472>
5/7 05:27:17 Match record (<10.100.4.6:4472>, 2, -1) deleted