
[Condor-users] Request to run on <...:...> was REFUSED



Hi,

I found strange error messages in the logs while investigating some mysterious job evictions that seem to happen right when the job starts. In the shadow log I found the following:

9/9 15:21:54 Initializing a VANILLA shadow for job 165980.0
9/9 15:21:54 (165980.0) (5720): Request to run on <192.168.0.105:1040> was REFUSED
9/9 15:21:54 (165980.0) (5720): Job 165980.0 is being evicted
9/9 15:21:54 (165980.0) (5720): logEvictEvent with unknown reason (108), aborting
9/9 15:21:54 (165980.0) (5720): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
9/9 15:22:09 ******************************************************
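
In case it helps anyone check their own pool for the same thing, here is a minimal Python sketch for pulling the REFUSED lines out of a shadow log and counting them per execute machine. The regular expression is only an assumption based on the line format quoted above, and the names `find_refusals` and `refusals_per_host` are made up for this example; other Condor versions may format the line differently.

```python
import re
from collections import Counter

# Pattern modelled on the shadow log line quoted above (an assumption;
# adjust it if your log lines look different).
REFUSED_RE = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) "
    r"\((?P<job>\d+\.\d+)\) \((?P<pid>\d+)\): "
    r"Request to run on <(?P<host>[\d.]+):(?P<port>\d+)> was REFUSED"
)


def find_refusals(lines):
    """Return (timestamp, job id, host) for every REFUSED line."""
    hits = []
    for line in lines:
        match = REFUSED_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("stamp"), match.group("job"), match.group("host"))
            )
    return hits


def refusals_per_host(lines):
    """Count REFUSED lines per execute machine address."""
    return Counter(host for _, _, host in find_refusals(lines))


# Sample input copied from the shadow log excerpt above.
sample = [
    "9/9 15:21:54 Initializing a VANILLA shadow for job 165980.0",
    "9/9 15:21:54 (165980.0) (5720): Request to run on <192.168.0.105:1040> was REFUSED",
    "9/9 15:21:54 (165980.0) (5720): Job 165980.0 is being evicted",
]

if __name__ == "__main__":
    for stamp, job, host in find_refusals(sample):
        print(stamp, job, host)
```

To run it against a real log, pass an open file object instead of `sample`, since both functions only iterate over lines.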


And in the starter log I found this:

9/9 15:22:09 ******************************************************
9/9 15:22:09 Using config source: C:\Condor\condor_config
9/9 15:22:09 Using local config sources:
9/9 15:22:09    C:\Condor/condor_config.local
9/9 15:22:09 DaemonCore: Command Socket at <192.168.0.105:1125>
9/9 15:22:09 Setting resource limits not implemented!
9/9 15:22:09 Communicating with shadow <192.168.0.50:4640>
9/9 15:22:09 Submitting machine is "gadget.digicpictures.local"
9/9 15:22:09 Job has WantIOProxy=true
9/9 15:22:09 Initialized IO Proxy.
9/9 15:22:09 File transfer completed successfully.
9/9 15:22:10 Starting a VANILLA universe job with ID: 165980.0
9/9 15:22:10 IWD: x:/work/condor\dir_255476
9/9 15:22:10 Output file: x:/work/condor\dir_255476\_condor_stdout
9/9 15:22:10 Error file: x:/work/condor\dir_255476\_condor_stderr
9/9 15:22:11 Renice expr "0" evaluated to 0
9/9 15:22:11 About to exec c:\tcl\bin\tclsh.exe //Sv_project1/projects/_extensions/Condor/render_mentalray_greedy.tcl 3.4 x:/temp/56/movie_56/shots_3d/shots/ke_020/KE_020_tomeg_block_00/1157804549/mi/tomeg_block_00.0082.mi R:/56/movie_56/shots_3d/shots/ke_020/frames/KE_020_tomeg_block_00/tomeg_block_00.0082.rgb render0027.digicpictures.local
9/9 15:22:11 Create_Process succeeded, pid=513524
9/9 15:26:31 IOProxy: accepting connection from 192.168.0.105
9/9 15:26:31 condor_read(): recv() returned -1, errno = 10054, assuming failure.
9/9 15:26:31 IOProxyHandler: closing connection to 192.168.0.105
9/9 15:26:51 IOProxy: accepting connection from 192.168.0.105
9/9 15:26:51 condor_read(): recv() returned -1, errno = 10054, assuming failure.
9/9 15:55:31 Process exited, pid=513524, status=0
9/9 15:55:35 Got SIGQUIT.  Performing fast shutdown.
9/9 15:55:35 ShutdownFast all jobs.
9/9 15:55:35 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
9/9 15:55:48 ******************************************************


I also found some information about this error in the archives of this list
(https://lists.cs.wisc.edu/archive/condor-users/2005-February/msg00260.shtml)
but could find neither a solution nor the source of the problem.

It's a bit confusing that the eviction happens instantly, yet the process looks as if it ran to completion, although the periodically called chirp process could not connect to the scheduler.

WinXP, Condor 6.8.0, but 6.7.x showed the same behaviour.

Cheers,
Szabolcs