
[Condor-users] problems running jobs



Hello,

I have been having problems with Condor not accepting jobs, or taking
several minutes before it will run one. I am only running the "hello
Condor" example binary.

condor_q -analyze reports the following (my requirements restrict the job
to a single machine, which is why only 4 processors match):

012.007:  Run analysis summary.  Of 124 machines,
    120 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      4 match, but prefer another specific job despite its worse user-priority
      0 match, but will not currently preempt their existing job
      0 are available to run your job

16 jobs; 16 idle, 0 running, 0 held

When I submit the jobs, I get lines like these in the SchedLog:

9/24 10:18:17 QMGR Connection closed
9/24 10:18:18 DaemonCore: Command received via TCP from host <128.101.222.203:56732>
9/24 10:18:18 DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
9/24 10:18:18 condor_read(): Socket closed when trying to read buffer
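
To pull these entries out of a longer SchedLog, a small script can help. The sketch below assumes the log lines look exactly like the excerpt above. It attributes each "Socket closed" line to the host named in the most recent "Command received via TCP" line. That pairing is a heuristic of my own, not something Condor guarantees, and the function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed line formats, taken from the SchedLog excerpt above.
CMD_RE = re.compile(r"Command received via TCP from host <([\d.]+):\d+>")
CLOSED_RE = re.compile(r"condor_read\(\): Socket closed")


def socket_closed_by_host(lines):
    """Count 'Socket closed' lines, attributing each one to the host of the
    most recent 'Command received' line (heuristic; None if no host seen yet)."""
    counts = Counter()
    last_host = None
    for line in lines:
        m = CMD_RE.search(line)
        if m:
            last_host = m.group(1)
        elif CLOSED_RE.search(line):
            counts[last_host] += 1
    return counts


def report(path):
    """Print per-host counts for a SchedLog file, most frequent first."""
    with open(path, errors="replace") as f:
        for host, n in socket_closed_by_host(f).most_common():
            print(f"{host}: {n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])  # e.g. python scan_schedlog.py /path/to/SchedLog
```

If the count for a single submit host grows with every condor_submit, that points at the submit-side connection rather than at the execute machines.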

These are the only errors I can find for the jobs.

Usually after 10+ minutes the job finally runs, though.