Re: [Condor-users] Procd behaving badly in a multi-startd setup



On Thursday, 8 September, 2011 at 11:43 AM, Dan Bradley wrote:
> The problem is likely that both startds are creating their own procd, but these two procds are using the same named pipe for communication, so wires are getting crossed.  You could configure PROCD_PIPE differently for the two startds.  Or you could just configure the startds to share a single procd.  One way to achieve that is this:
>
> MASTER.USE_PROCD = TRUE
>
> That causes the master to create a procd, which is then shared by all of its children.  Depending on the answer to your puzzling performance problems, having a single procd may be better than two.  Then again, it could be worse.  It would be interesting to find out!

This worked. Two startds happily running jobs with a procd in place for process monitoring now.
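
For reference, the two options described above might look roughly like this in a condor_config. This is a sketch under assumptions, not a tested configuration: the second startd's name (STARTD2), its -local-name value, and the pipe path are hypothetical, and the pipe knob is written as PROCD_ADDRESS, which is the name I know from the manual for the procd's named pipe (the quoted message calls it PROCD_PIPE). Check the knob name and the per-daemon prefix against your Condor version.

```
# Option 1 (knob name PROCD_ADDRESS is an assumption): give the second
# startd its own named pipe so the two procds no longer collide.
# STARTD2 and the pipe path are hypothetical names for illustration.
STARTD2      = $(STARTD)
STARTD2_ARGS = -f -local-name startd2
DAEMON_LIST  = MASTER, STARTD, STARTD2
STARTD.STARTD2.PROCD_ADDRESS = $(LOCK)/procd_pipe.startd2

# Option 2 (from the quoted message): have the master start one procd
# that all of its children share. Use this instead of Option 1, not with it.
# MASTER.USE_PROCD = TRUE
```

A real second startd also needs its own name, log file, and execute directory; those settings are left out here to keep the focus on the procd.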

I won't know until Monday if this has any impact on scalability or not. With USE_PROCD=True and a single startd, I was able to get a 40-core machine up to ~25 running jobs, but beyond that point things start to fall apart with internal process communication issues. It's unclear whether the communication issues are startd <-> starter or procd <-> starter, though. Certainly the one procd is running at or near 100% CPU when this many jobs are on the box.

I was hoping multiple startds would mean multiple procds. But it may be okay.

> The startd launches the procd on-demand.  This likely means it won't start one until it runs its first job.

Good to know. The condor_procd process gets less mysterious every day. :)

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing