
Re: [Condor-users] Why does machine reject job for unknown reasons



Alex,

On Tue, May 15, 2007 at 04:22:52PM +0200, Johan Bengtsson wrote:
> On tis, 2007-05-15 at 14:53 +0100, Alexander Dietz wrote:
> > Hi,
> > 
> > sorry to bother you again with my question, but this problem still
> > persists. So far I have received no suggestions on how to find out why
> > condor-jobs are rejected ...
> 
> Hi Alex,
> Have you checked that both forward and reverse name resolution work for
> the machines in your cluster? I think that every time this problem has
> occurred in my pool, name resolution has been the cause.
> 
> 	/ Johan

In any case, look into the logs of the local startd (on one of the 150
machines that would match the disk requirement). You may have to raise
the log level to see something useful, though.
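
A rough sketch of what raising the startd log level could look like, assuming
a stock setup where the execute node reads a local config file (the file name
and location are an assumption, yours may differ):

```
# Local Condor config on the execute node (e.g. condor_config.local; the
# exact file is an assumption, check LOCAL_CONFIG_FILE on your machines).
STARTD_DEBUG = D_FULLDEBUG

# Afterwards, on that machine:
#   condor_reconfig                  (make the startd re-read its config)
#   condor_config_val STARTD_LOG     (shows where the startd log lives)
```

Remember to set the level back once you have what you need, since full debug
output makes the log grow quickly.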

I suspect the behaviour you're seeing is related to resolution issues,
not necessarily of host names only, but of domains as well. Did you already
post the details of your submit/DAG file? I don't know about the dependencies,
but since *nothing* is run at all, the first stage would be enough...
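
To test the resolution theory, something along these lines might help. It is
only a minimal sketch in Python, not a Condor tool: check_host is a
hypothetical helper name, and treating any mismatch between the name you pass
in and the reverse-resolved name as a "problem" is my own assumption about
what is worth flagging (for instance a short name versus a fully qualified one
with a domain part).

```python
import socket


def check_host(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check forward (name -> IP) and reverse (IP -> name) resolution.

    The resolver functions are parameters so they can be swapped out for
    testing; by default the system resolver is used.
    Returns a dict with the IP, the reverse name and a list of problems.
    """
    result = {"host": name, "ip": None, "reverse": None, "problems": []}

    # Forward lookup: host name -> IPv4 address.
    try:
        result["ip"] = forward(name)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        result["problems"].append(f"forward lookup failed: {err}")
        return result

    # Reverse lookup: IP address -> (canonical name, aliases, addresses).
    try:
        canonical, aliases, _addresses = reverse(result["ip"])
    except OSError as err:  # socket.herror is a subclass of OSError
        result["problems"].append(f"reverse lookup failed: {err}")
        return result
    result["reverse"] = canonical

    # Flag a mismatch, e.g. short name vs. name with a domain part.
    known = {n.lower() for n in [canonical, *aliases]}
    if name.lower() not in known:
        result["problems"].append(
            f"reverse name {canonical!r} does not match {name!r}"
        )
    return result
```

Run it on the submit machine and on a few execute nodes, passing the names
exactly as they appear in condor_status, e.g. print(check_host("node01")) with
"node01" replaced by one of your machines. An empty "problems" list means
forward and reverse lookups agree for that name.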

Cheers
 Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html