[Condor-users] Condor affecting TCP performance in vanilla universe



Hello,

I have an application that reads some data from one host, does
CPU-intensive processing, and sends small amounts (in relative terms)
of data to another host. All communication is done over TCP. It is a
stand-alone tool with no knowledge of Condor, and as such I am running
it in the vanilla universe. The tool is generally very heavily used and
has not shown any strange behavior in the past.

The first thing I noticed was that when running under Condor, the
process consumed almost no CPU (whereas running the exact same command
at the shell prompt gave normal behavior). Running strace on it showed
that it kept blocking in poll(), waiting for the socket to become
writable, for on the order of an entire second at a time. Between
these one-second blocks, it would get off around 10-30 writes (of about
150 bytes each). The size of the individual writes is completely
expected and normal; the periodic blocking in poll() is not.
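
For anyone who wants to reproduce the measurement without strace, here
is a minimal sketch in Python of the write pattern described above: wait
in poll() for the socket to become writable, record how long that took,
then send a small payload. The function name timed_writes and the use of
a local socketpair() are assumptions made for illustration; the real
tool is a separate program and writes to a remote host.

```python
import select
import socket
import time


def timed_writes(sock, payload, count):
    """Send `payload` `count` times; return seconds spent in poll() before each send."""
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    waits = []
    for _ in range(count):
        start = time.monotonic()
        poller.poll()  # blocks until the socket is reported writable
        waits.append(time.monotonic() - start)
        sock.sendall(payload)
    return waits


if __name__ == "__main__":
    # Illustration only: a local socket pair stands in for the real TCP peer.
    a, b = socket.socketpair()
    waits = timed_writes(a, b"x" * 150, 20)
    print("longest wait in poll(): %.6f s" % max(waits))
```

Running the same script from the shell and as a vanilla job, against the
real receiver instead of the socket pair, should show whether the
one-second stalls follow the execution environment rather than the tool.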

This is in spite of the fact that the receiving end is most definitely
not saturated in any way.

I did some tcpdumping, and the most glaring difference between the
traffic from the Condor-initiated process and the shell-initiated
process is that in the former case (1) the PSH flag is never set on
outgoing data packets, and (2) the receiving end is sending ACKs with
a zero window size, just as if the receiving end's buffer were full.

This in and of itself is weird to me; even lacking PSH it seems pretty
strange that I could not even stream data (never mind latency) at
network speed. It almost looks like the receiving end kernel (Linux
2.6.x) is flushing input buffers once per second, rather than
on-demand, when the PSH flag is not set (pure speculation, but the
observations seem to be consistent with that behavior).

But primarily I am not clear on why running the process in Condor - in
the vanilla universe - would have this effect. In fact, I have to
admit that I have not even been able to find out how to set a socket
option that would affect the use of the PSH flag to begin with, and
the application in question never even invokes setsockopt(). That is,
unless there is something in libc being affected by the environment.
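
One way to check whether the socket ends up configured differently in
the two environments is to read the options back with getsockopt()
rather than looking for setsockopt() calls. The sketch below is only an
illustration: the helper name describe_socket and the choice of options
are assumptions, and none of these options is known to control the PSH
flag directly.

```python
import socket


def describe_socket(sock):
    """Return a few send/receive related options as currently set on `sock`."""
    return {
        "TCP_NODELAY": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY),
        "SO_SNDBUF": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        "SO_RCVBUF": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    }


if __name__ == "__main__":
    # A fresh, unconnected TCP socket shows the defaults this process gets.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    for name, value in sorted(describe_socket(s).items()):
        print(name, value)
    s.close()
```

If the values printed at the shell and inside a vanilla job are
identical, socket options can probably be ruled out as the cause.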

Does anyone have any ideas as to what I may be running into? What
magic does Condor do for normalizing/modifying the execution
environment of vanilla processes, that may affect this?
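
To narrow this down, the process-level state could be dumped in both
environments and the two outputs diffed. The following is a rough
sketch under the assumption that the difference is visible in the
environment variables, resource limits, or scheduling priority; the
function name snapshot and the list of limits are made up for this
example, and whether Condor changes any of these is not established.

```python
import json
import os
import resource

# Limits picked for illustration only; extend the tuple as needed.
LIMITS = ("RLIMIT_NOFILE", "RLIMIT_CPU", "RLIMIT_AS", "RLIMIT_STACK")


def snapshot():
    """Collect environment variables, selected rlimits, niceness, and cwd."""
    return {
        "env": dict(os.environ),
        "rlimits": {
            name: list(resource.getrlimit(getattr(resource, name)))
            for name in LIMITS
        },
        "nice": os.getpriority(os.PRIO_PROCESS, 0),
        "cwd": os.getcwd(),
    }


if __name__ == "__main__":
    # Sorted keys keep the output stable so two runs can be compared with diff.
    print(json.dumps(snapshot(), indent=2, sort_keys=True))
```

Saving the output once from the shell and once from a vanilla job, then
running diff on the two files, would show what differs at this level.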

All hosts involved run Linux 2.6.x, and the version of Condor is
6.9.1.

Thank you,

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@xxxxxxxxxxxx>'
Key retrieval: Send an E-Mail to getpgpkey@xxxxxxxxx
E-Mail: peter.schuller@xxxxxxxxxxxx Web: http://www.scode.org