Re: [Condor-users] schedd problems?
- Date: Fri, 25 Feb 2005 10:44:10 -0600 (CST)
- From: Paul Armor <parmor@xxxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] schedd problems?
Thanks, but I don't think it's a naming problem. I'll try to put the
appropriate logs somewhere viewable; if you're interested, ask me and
I'll tell you where they are. The schedd seems to be dying intermittently
(every few hours), and condor_master keeps restarting it...
So in summary, the schedd is repeatedly dying on the box, and once we
saw condor_master report that it had restarted the schedd when the schedd
wasn't actually running. Also, one user is running a DAG, and he's just
reported that the DAG "went away" but the jobs are still running; I'm
working out what his jobs are doing and checking whether he's filled up the
filesystem he was writing output to...
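The naming theory quoted below is that the host's IP may not map to the correct hostname. One way to test it is to compare a forward lookup of the submit host's name with a reverse lookup of each address it resolves to. This is a minimal sketch, assuming Python 3 and the system resolver, which typically consults /etc/hosts as well as DNS depending on the resolver configuration. `names_match` and `check_host` are hypothetical helper names.

```python
import socket
from typing import Sequence


def names_match(hostname: str, reverse_name: str, reverse_aliases: Sequence[str]) -> bool:
    """Return True if the reverse-lookup name, or one of its aliases, equals
    hostname, ignoring case and any trailing dot."""
    def norm(name: str) -> str:
        return name.lower().rstrip(".")

    wanted = norm(hostname)
    return any(norm(c) == wanted for c in [reverse_name, *reverse_aliases])


def check_host(hostname: str) -> None:
    """Resolve hostname forward, reverse-resolve each address it maps to,
    and print whether the two directions agree."""
    try:
        # Forward lookup: name -> (canonical name, aliases, IPv4 addresses).
        canonical, _aliases, addrs = socket.gethostbyname_ex(hostname)
    except OSError as exc:
        print(f"{hostname}: forward lookup failed ({exc})")
        return
    print(f"{hostname} -> {addrs} (canonical: {canonical})")
    for addr in addrs:
        try:
            # Reverse lookup: address -> (primary name, aliases, addresses).
            rname, raliases, _ = socket.gethostbyaddr(addr)
        except (socket.herror, socket.gaierror) as exc:
            print(f"  {addr}: no reverse entry ({exc})")
            continue
        ok = names_match(hostname, rname, raliases) or names_match(canonical, rname, raliases)
        print(f"  {addr} -> {rname} [{'OK' if ok else 'MISMATCH'}]")
```

For example, running `check_host("hydra.phys.uwm.edu")` on the submit machine would list each address and flag any reverse entry that doesn't name the host. A MISMATCH would support the naming theory. Agreement in both directions would point back at the schedd itself.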
On 25 Feb 2005, krishnaprasad wrote:
> Hi Paul,
> Your IP may not be mapped to the correct hostname. Edit the /etc/hosts
> file and then reconfigure Condor.
> Best Wishes
> On Thu, 2005-02-24 at 22:00, Paul Armor wrote:
> > Hi,
> > I've got a strange problem (aren't they all?), and could use guidance on
> > how to figure out what's wrong. I have a submit machine that can no
> > longer tell what jobs are in its own queue. I upgraded Condor to 6.7.3
> > (from 6.6.7) on Feb 10; yesterday (Feb 23), it was noticed that condor_q
> > would return:
> > -- Failed to fetch ads from: <18.104.22.168:38456> : hydra.phys.uwm.edu
> > SchedLog doesn't seem to show anything interesting...
> > How can I debug what's failing?
> > Thanks!
> > Paul Armor
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
+ UWM-LSC Group Systems Administrator parmor@xxxxxxxxxxxxxxxxxxxx +
+ Physics 462 +
+ U. of W. - Milwaukee +
+ PO Box 413 414-229-2677 +
+ Milwaukee, WI 53201 fax 414-229-5589 +