Re: [Condor-users] dual-processor machine with 2 jobs assigned - but one processor is down?



On 7/7/05, Kelly Sorensen <kjs@xxxxxxxxxxx> wrote:
> 
> I am still basically a newbie to Condor, but have been using it about 6
> months in my Optimization research.
> 
> Recently I did a series of timing runs on a set of isolated, dedicated
> machines (i.e. I was the only user). All machines had dual processors and
> identical specs other than memory - but my jobs were not memory
> intensive.
> 
> From reviewing the log, the first and third jobs were always matched to
> the same machine. These two jobs always took twice as much time as the
> others to finish. All machines were assigned two jobs, but this particular
> machine was the only one to take double-time to finish.
> 
> My interpretation was that one processor was likely down, but the condor
> matchmaker was unaware and continued to submit 2 jobs to the machine
> anyway. Is this a reasonable interpretation? Is there another
> possibility?

1) The number of reported CPUs (and thence VMs) was incorrectly set
to twice what it should have been.

2) The two CPUs are logical rather than physical (i.e. hyperthreading).

3) The OS scheduler is rubbish and decided to allocate both processes
to the same CPU (Windows 2000 will actually do this sometimes if the
process name is the same).

4) Something else was happening to said machine at the same time...
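
To tell (1) and (2) apart from a genuinely dead processor, it can help to
compare how many logical CPUs the OS reports with how many physical ones sit
behind them. Here is a rough sketch, assuming a Linux box whose /proc/cpuinfo
carries the usual "processor", "physical id" and (on newer kernels) "core id"
fields. The function name count_cpus and the SAMPLE text are made up for
illustration; they are not part of Condor.

```python
def count_cpus(cpuinfo_text):
    """Return (logical, physical) CPU counts from /proc/cpuinfo-style text.

    Assumption: one blank-line-separated block per logical processor, each
    with a "processor" field and, where the kernel provides them,
    "physical id" and "core id" fields.
    """
    logical = 0
    packages = set()   # distinct physical sockets seen
    cores = set()      # distinct (socket, core) pairs seen
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        phys = fields.get("physical id")
        core = fields.get("core id")
        if phys is not None:
            packages.add(phys)
            if core is not None:
                cores.add((phys, core))
    if cores:
        physical = len(cores)      # best information: count real cores
    elif packages:
        physical = len(packages)   # older kernels: count sockets only
    else:
        physical = logical         # no topology info, assume no sharing
    return logical, physical


# Hypothetical excerpt from a hyperthreaded single-CPU machine.
SAMPLE = """processor : 0
physical id : 0
siblings : 2

processor : 1
physical id : 0
siblings : 2
"""

logical, physical = count_cpus(SAMPLE)
print("logical CPUs:", logical, "physical CPUs:", physical)
```

On the suspect machine you would feed it the contents of /proc/cpuinfo instead
of SAMPLE. If it reports more logical than physical CPUs, two jobs are sharing
one real processor, which would fit the doubled run time.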

If one CPU is 'down', how does your OS handle this? (I'm surprised you
don't just crash, or at least terminate gracefully.)

Matt