
[Condor-users] Network Bandwidth Problem



Hi,

we are running a small ~50 CPU Condor cluster (Linux) using version 6.7.6. 
The cluster runs just great, but we now have a user who uses the 
standard universe and whose jobs need about 800MB of RAM. As a result, the 
checkpoints are large and the network connection becomes saturated 
whenever his jobs write a checkpoint to the submit machine. The 
compute machine and the submit machine then hang for about two minutes until 
the whole checkpoint is written. We have tried adjusting the 
PERIODIC_CHECKPOINT expression to keep the dumps from happening 
at the same time, but we have not seen any real improvement.
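A rough back-of-the-envelope check of why the link saturates, as a minimal sketch. The link speed is an assumption (the post does not state it): Fast Ethernet at about 12.5 MB/s of raw capacity, with protocol overhead and compression ignored, and the link assumed to be shared evenly when several checkpoints go out at once. The function name transfer_seconds is hypothetical.

```python
def transfer_seconds(size_mb: float, rate_mb_per_s: float, concurrent: int = 1) -> float:
    """Seconds needed to move `concurrent` checkpoints of `size_mb` each over one
    shared link of `rate_mb_per_s`. Assumes the link is the only bottleneck,
    is shared evenly, and ignores protocol overhead (hypothetical helper)."""
    if size_mb < 0 or rate_mb_per_s <= 0 or concurrent < 1:
        raise ValueError("invalid arguments")
    return size_mb * concurrent / rate_mb_per_s

# Assumed figures: 800 MB checkpoint image, ~12.5 MB/s (100 Mbit/s) link.
print(transfer_seconds(800, 12.5))     # one checkpoint: 64.0 s
print(transfer_seconds(800, 12.5, 2))  # two overlapping checkpoints: 128.0 s
```

Under these assumptions a single 800MB checkpoint already occupies the link for about a minute, and two overlapping ones land in the range of the observed two-minute hang, which is why staggering the dumps would be expected to matter.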

Is there any way to limit the bandwidth used by Condor to a fixed rate, 
for example 10MB/s? Or would upgrading the cluster to a more recent 
version of Condor solve our problem?
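To illustrate what a fixed-rate cap means in practice, here is a minimal sketch of a generic token-bucket throttle, one common way to cap a transfer at a given rate. It is not a Condor feature or API; the class RateLimiter, its parameters, and the injectable clock are hypothetical and exist only for illustration.

```python
import time

class RateLimiter:
    """Generic token-bucket throttle (hypothetical, not part of Condor).
    Allows `rate` bytes/s on average, with bursts of up to `burst` bytes."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst          # start with a full bucket
        self.last = clock()

    def delay_for(self, nbytes: int) -> float:
        """Reserve nbytes and return how many seconds the caller should wait
        before sending them. A negative balance is debt repaid at `rate`."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        return max(0.0, -self.tokens / self.rate)


class FakeClock:
    """Manually advanced clock so the demo runs instantly instead of sleeping."""
    def __init__(self):
        self.t = 0.0
    def __call__(self):
        return self.t


if __name__ == "__main__":
    clk = FakeClock()
    # Assumed cap: 10 MB/s with a 1 MB burst; 800 MB sent in 1 MB chunks.
    rl = RateLimiter(rate=10_000_000, burst=1_000_000, clock=clk)
    total = 0.0
    for _ in range(800):
        d = rl.delay_for(1_000_000)
        clk.t += d                   # a real sender would time.sleep(d) here
        total += d
    print(round(total, 1))           # 79.9
```

The trade-off this shows: capping at 10MB/s keeps the link from saturating, but an 800MB checkpoint then takes roughly 80 seconds to write instead of finishing as fast as the link allows.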

Thanks for your help,

Jens