Re: [Condor-users] Strange schedd crash (exit status 44)
- Date: Mon, 29 Nov 2004 10:00:52 +0000
- From: matthew hope <matthew.hope@xxxxxxxxx>
- Subject: Re: [Condor-users] Strange schedd crash (exit status 44)
On Sun, 28 Nov 2004 23:55:21 -0600, Derek Wright <wright@xxxxxxxxxxx> wrote:
> 1 general comment:
> whenever a condor daemon exits with status 44, it means it failed to
> write to its log file.
Right ho, a useful piece of info, thanks.
> so, i'm guessing the disk is filling up on your submit machine. at
> least, the partition that the SPOOL + LOG directories are on.
12 GB free :) though there may be some other nasty aspect of Windows causing it.
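For anyone wanting to rule out the disk-full theory quickly, here is a minimal Python sketch. It only reports free space on the partition holding each directory; the paths are placeholders (an assumption on my part) and should be swapped for whatever `condor_config_val SPOOL` and `condor_config_val LOG` report on the submit machine.

```python
import shutil

def free_gb(path):
    """Return free space, in GiB, on the partition that holds `path`."""
    return shutil.disk_usage(path).free / 2**30

# Placeholder paths (assumptions): substitute the real SPOOL and LOG directories.
for label, path in {"SPOOL": ".", "LOG": "."}.items():
    print(f"{label}: {free_gb(path):.1f} GiB free")
```

Plenty of free space here would not rule out a quota, a permissions problem, or a file lock as the reason a log write fails.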
Assuming Windows daemons, a job log, input, output and error log, what would
the total number of open files be?
master = 1 (master log)
schedd = 3 (scheddlog, history, job_queue)
in, out, error ? 3 (guessing not unless you are streaming)
ShadowLog ? one file but many writers, does the schedd handle this for
them or do they gain a lock each time?
job log 1 (or is it only as required?)
Well under any Windows limits, I should think; will have a think about it.
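As a rough worst-case tally of the list above (a sketch only: the counts are my guesses from that list, and the entries I marked with "?" are unconfirmed and simply counted as if they were always open):

```python
# Open-file guesses per item, taken from the list above. These are
# unconfirmed estimates, not figures from the Condor documentation.
open_files = {
    "master (master log)": 1,
    "schedd (scheddlog, history, job_queue)": 3,
    "job in/out/error (maybe only when streaming)": 3,
    "ShadowLog (one file, many writers)": 1,
    "job log (maybe only as required)": 1,
}

total = sum(open_files.values())
print(total)  # 9
```

That is per job at most for the last three entries, so the real figure scales with the number of running jobs.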
> > > <runs condor_q repeatedly>
> yes, that's evil and wrong. we're sorry. the schedd should be
> multi-threaded in some way.
What dev releases are for :¬)
> to some extent, that's what the "MPI" universe already does (and it'll
> get much better in the near future with a generic, more usable
> "parallel" universe). but, point well taken. it's something we've
> been arguing about for years. ;) in theory, there's already pluggable
> negotiation, in that each schedd does its own decentralized
> scheduling. you're more than welcome to write your own schedd and
> have it talk to an existing condor pool. ;) (yeah, right).
Pop the source and over-the-wire specs out and I'd consider it.
Really. I'm sure I could convince my bosses to allow folding back into
the main branch (if you could handle my code :¬)
> > > that said I'll go for stability over features every time at
> > > the moment!
> right. then use the stable release. ;)
Fair point, have rolled back the submitters' machines already.