Re: [condor-users] multiple cpu's

On Fri, 02 Jan 2004 02:41:39 -0800  Ratul Mahajan wrote:

> my job is a perl script that spawns another process using the
> "system" command. its quite likely that the spawned process will run
> on the second CPU. this "fools" condor into thinking that the new
> process is NonCondorLoad, with the result that the second CPU is
> never allocated (the first one is idle, since the original script
> is waiting for the spawned process to finish).

no, it's not the fork() that's fooling condor.  instead, the problem
lies with the way we compute the so-called "CondorLoadAvg".  it's one
of the oldest bugs in Condor. :(  it's a long story, but the end result
is that the CondorLoadAvg (and therefore the NonCondorLoad, since
that's just the real system load minus the CondorLoadAvg) are not
accurate, especially during the first minute a job starts running and
the minute after it completes. :(
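
to make the arithmetic concrete, here is a small python sketch of the
subtraction described above.  everything in it is illustrative: the
function names, the sample numbers, and the 0.3 threshold are
assumptions made up for this example, not values read out of Condor,
and it simply assumes the CondorLoadAvg lags behind the real load
right after a job starts.

```python
# Illustrative sketch only: names, numbers, and threshold are assumptions.

def non_condor_load(system_load, condor_load_avg):
    """NonCondorLoad as described above: real system load minus CondorLoadAvg."""
    return system_load - condor_load_avg


def looks_busy(system_load, condor_load_avg, threshold=0.3):
    """True if the load NOT attributed to Condor exceeds the (assumed) threshold."""
    return non_condor_load(system_load, condor_load_avg) > threshold


# (label, real system load, CondorLoadAvg) -- hypothetical samples
samples = [
    ("job just started, CondorLoadAvg lagging", 1.0, 0.2),
    ("job in steady state", 1.0, 1.0),
]

for label, system_load, condor_load_avg in samples:
    leftover = non_condor_load(system_load, condor_load_avg)
    verdict = "looks busy" if looks_busy(system_load, condor_load_avg) else "looks idle"
    print(f"{label}: NonCondorLoad = {leftover:.2f} -> machine {verdict}")
```

in the first sample the job's own load is mostly charged to
"NonCondorLoad", so a policy that compares that number against a
threshold would treat the machine as busy with someone else's work,
even though the only thing running is the Condor job.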

for a more detailed explanation, check out the FAQ, "Why do my
vanilla jobs keep cycling between suspended and unsuspended?" (that's
another common symptom of the underlying problem)


> is the above behavior expected with condor, or i am missing
> something?

it's expected, only in the sense that this is a known bug...  see

> whats the best way to deal with it (run multiple virtual machines
> per CPU?)?

that FAQ entry has some good suggestions for ways to cope with the
problem.  the more drastic approach is to not use the load average at
all in your policy expressions.  again, look at the manual for

good luck,

p.s. don't go the route Mark Silberstein suggested of trying to "tie"
your jobs to a specific virtual machine, either via Condor or perl.
that's not going to solve your problem at all, so don't bother...