[Condor-users] Questions about GCB
- Date: Wed, 04 Jul 2007 09:41:32 -0600
- From: Dave Schulz <dschulz@xxxxxxxxxxx>
- Subject: [Condor-users] Questions about GCB
I have a few questions about GCB.
Here at UofC we are trying to implement a homogeneous Condor cluster of
Linux compute machines on top of Windows hosts using virtualization.
The virtual machines communicate with the outside world through a NAT
network of one machine between the host and the guest OS.
The problem we are having is with the GCB machine. It seems to simply
drop all of the connections between the execute-only machines and the
collector. The odd thing is that TCP and UDP connections to both the
nodes and the collector still appear open when viewed from the GCB
machine using netstat. The number of connections per execute-only
machine is in the 10-30 range, and there are only ~20 machines at the
moment (we're working toward >2000 over the next year or so). The only
way to get the nodes to reconnect is to kill all GCB processes and
restart them. Then the nodes gradually find the collector again.
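
As an illustration of how the per-node connection counts mentioned above could be tallied, here is a minimal sketch. It assumes the column layout of Linux `netstat -tun` output (protocol first, foreign address fifth); the function name is hypothetical and is not part of Condor or GCB.

```python
from collections import Counter

def count_connections_per_host(netstat_output):
    """Count TCP/UDP sockets per remote host in `netstat -tun` style text.

    Assumption: each socket line has the protocol in the first column and
    the foreign address (host:port) in the fifth column.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner and header lines, and anything that is not a socket row.
        if len(fields) < 5 or fields[0] not in ("tcp", "udp", "tcp6", "udp6"):
            continue
        # Split on the last colon so IPv6 addresses keep their own colons.
        host, _, _port = fields[4].rpartition(":")
        counts[host] += 1
    return counts

if __name__ == "__main__":
    import subprocess
    # Run on the GCB machine; requires the net-tools `netstat` binary.
    out = subprocess.run(["netstat", "-tun"], capture_output=True, text=True).stdout
    for host, n in count_connections_per_host(out).most_common():
        print(host, n)
```

Sampling this periodically before and after a stall would show whether the per-node counts grow, stay flat, or vanish when the nodes lose the collector.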
This happens under moderately high job turnaround, but the number of
connections being created on the GCB machine is considerably lower than
the Linux kernel maximums in /proc.
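
For comparison against the kernel limits, a sketch like the following could be used. The post does not say which /proc entries were checked, so the two files read here (`/proc/sys/fs/file-max` and `/proc/sys/net/ipv4/ip_local_port_range`) are assumptions, and the helper names are hypothetical.

```python
from pathlib import Path

def parse_proc_ints(text):
    """Parse the whitespace-separated integers found in a /proc sysctl file."""
    return [int(token) for token in text.split()]

def ephemeral_port_count(port_range_text):
    """Number of local ports available for outgoing connections.

    Expects the contents of ip_local_port_range: "<low> <high>".
    """
    low, high = parse_proc_ints(port_range_text)
    return high - low + 1

if __name__ == "__main__":
    # Assumed limits of interest; other /proc entries may matter as well.
    file_max = Path("/proc/sys/fs/file-max")
    port_range = Path("/proc/sys/net/ipv4/ip_local_port_range")
    if file_max.exists():
        print("system-wide open file limit:", parse_proc_ints(file_max.read_text())[0])
    if port_range.exists():
        print("ephemeral ports:", ephemeral_port_count(port_range.read_text()))
```

Per-process limits such as the open file descriptor limit of the GCB processes themselves are not covered by these system-wide values and may be worth checking separately.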
Finally, my questions: Are we using GCB incorrectly? The execute
machines make no connections to the Collector that I can see in
netstat. Is GCB designed for only a few NAT networks of more than 1
Research Computing Services
University of Calgary