Re: [Condor-users] schedd problems?

On Fri, Feb 25, 2005 at 10:44:10AM -0600, Paul Armor wrote:
> Hi,
> thanks, but I don't think it's a naming problem.  I'll try to put the 
> appropriate logs somewhere viewable, so if you're interested ask me and 
> I'll tell you where they are, but schedd seems to be dying intermittently 
> (every few hours) and condor_master is restarting...
> So in summary, schedd is repeatedly dying on the box, and we've once 
> witnessed where condor_master thought it'd restarted schedd (but schedd 
> wasn't really started).  Also, one user is running a dag, and he's just 
> reported that the dag "went away", but the jobs are still running; I'm 
> working out what his jobs are doing, and seeing if he's filled up the 
> filesystem he was writing output to...


Did you try setting the debug level somewhat higher?
Some problems apparently only show up in the logs if the debug
level is high enough, even if the problem is more or less obvious...
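
As a rough sketch (the values here are only an assumption, adjust them
to your setup), something like this in the local config file on the
submit machine should make the schedd and the master log more verbosely:

    # more verbose logging for the two daemons in question
    SCHEDD_DEBUG = D_FULLDEBUG
    MASTER_DEBUG = D_FULLDEBUG
    # arbitrary larger size limit (bytes) so SchedLog does not rotate too soon
    MAX_SCHEDD_LOG = 10000000

After that, a condor_reconfig on that machine should make the daemons
pick up the new settings.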


Steffen Grunewald * * * Merlin cluster admin (http://pandora.aei.mpg.de)
Albert-Einstein-Institut (MPI Gravitationsphysik, http://www.aei.mpg.de)
       Science Park Golm, Am Mühlenberg 1, 14476 Potsdam, Germany
e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}