
Re: [Condor-users] Jobs not Running Problem



Natarajan, Senthil wrote:
> Hi,
> I am trying to submit a job to an AIX52 (Power3 II) machine. It is a compute node running Condor version 7.0.5, so only condor_master and condor_startd are running.
> The submit node (also the Central Manager) runs Linux RHEL 4.0.
> 
> The job does not run properly: after running for a few seconds it becomes idle.
> 
> Here is the ShadowLog
> ******************
> 6/2 09:15:38 Initializing a VANILLA shadow for job 251780.0
> 6/2 09:16:02 (251780.0) (31549): Request to run on <111.111.11.111:9608> was REFUSED
> 6/2 09:16:02 (251780.0) (31549): Job 251780.0 is being evicted
> 6/2 09:16:13 (251780.0) (31549): logEvictEvent with unknown reason (108), aborting
> 6/2 09:16:13 (251780.0) (31549): ZKM: setting default map to (null)
> 6/2 09:16:13 (251780.0) (31549): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
> 6/2 09:20:37 Initializing a VANILLA shadow for job 251780.0
> 6/2 09:21:11 (251780.0) (31612): condor_read(): timeout reading 5 bytes from <111.111.11.111:9608>.
> 6/2 09:21:11 (251780.0) (31612): IO: Failed to read packet header
> 6/2 09:21:11 (251780.0) (31612): DCStartd::activateClaim: Failed to receive reply from <111.111.11.111:9608>
> 6/2 09:21:11 (251780.0) (31612): Job 251780.0 is being evicted
> 6/2 09:21:21 (251780.0) (31612): logEvictEvent with unknown reason (108), aborting
> 6/2 09:21:21 (251780.0) (31612): ZKM: setting default map to (null)
> 6/2 09:21:21 (251780.0) (31612): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 108
> 
> 
> In the compute node condor_config file, I have
> ALL_DEBUG               = D_ALL
> STARTD_DEBUG       = D_FULLDEBUG D_COMMAND D_FULLDEBUG
> 
> Even so, I don't see any error messages in the StartLog.
> 
> Could you please let me know what the problem might be?
> 
> Thanks,
> Senthil

http://www.cs.wisc.edu/condor/manual/v7.3/Appendix_B_Magic.html

condor_shadow Exit Code 108 = cannot connect to the condor_startd, or
the request was refused

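Whether or not the StartLog shows errors, you should try to correlate
timestamps between the ShadowLog and the StartLog. In particular, look
at what the startd logged around 09:16:02 (request refused) and
09:21:11 (read timeout).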
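
If it helps, here is a rough sketch of one way to line the two logs up by
time. It is only an illustration, not a Condor tool: the function names
(parse_log, merge_logs) are made up, it assumes the StartLog uses the same
"M/D HH:MM:SS" prefix as the ShadowLog excerpt above, and it assumes the
clocks on the two machines are reasonably in sync. The logs carry no year,
so the year argument is just a placeholder used for ordering.

```python
import re
from datetime import datetime

# Matches the "M/D HH:MM:SS " prefix seen in the ShadowLog excerpt.
# Assumption: the StartLog on the compute node uses the same prefix.
STAMP = re.compile(r"^(\d{1,2})/(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) ")


def parse_log(lines, source, year=2000):
    """Return (timestamp, source, text) tuples for lines that begin with a timestamp."""
    events = []
    for line in lines:
        m = STAMP.match(line)
        if m is None:
            continue  # skip banners and continuation lines without a timestamp
        month, day, hh, mm, ss = map(int, m.groups())
        stamp = datetime(year, month, day, hh, mm, ss)
        events.append((stamp, source, line.rstrip("\n")))
    return events


def merge_logs(shadow_lines, start_lines, year=2000):
    """Interleave both logs in time order so related events sit side by side."""
    events = parse_log(shadow_lines, "SHADOW", year)
    events += parse_log(start_lines, "STARTD", year)
    # sorted() is stable, so lines sharing a timestamp keep their original order.
    return sorted(events, key=lambda event: event[0])


def print_merged(shadow_path, start_path, year=2000):
    """Read both log files (paths supplied by the caller) and print the merged view."""
    with open(shadow_path) as shadow, open(start_path) as start:
        for stamp, source, text in merge_logs(shadow, start, year):
            print(f"{source:7s} {text}")
```

You would copy the StartLog over from the compute node and call
print_merged() with the paths to the two files, then read the STARTD lines
that fall next to the REFUSED and timeout lines from the shadow.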

Best,


matt