
Re: [Condor-users] Checkpointing Errors



Hello,

On Tue, May 08, 2007 at 11:45:12AM -0500, Todd Tannenbaum wrote:
> Simon David Hammond wrote:
> >>Is Condor configured to send the checkpoint back to the condor_shadow 
> >>process, or have you configured a checkpoint server?
> >
> >We have configured a checkpoint server, it runs on what we identify as a 
> >server. So it has a HIGHPORT of 9500 and a LOWPORT of 9000.
> 
> Ugh.  Our apologies, and also our thanks....  I think you have found a bug.
> 
> Looking at the source code, it looks like HIGHPORT/LOWPORT is not 
> honored by the checkpoint server.  We will try to remedy this in the 
> next release.

Ok, here's the deal. :)

The checkpoint server still requires the well-known ports 5651, 5652,
5653, and 5654 to be opened through the firewalls. However, the
checkpoint server's child processes, which perform the actual transfers
and whose ephemeral ports are given to the user jobs, are now bound
within LOWPORT/HIGHPORT if available.
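
To illustrate what binding within a port range means in general, here is
a minimal sketch in Python. It is not the checkpoint server's code; the
function name bind_in_range and the loopback host are assumptions made
up for the example.

```python
import socket

def bind_in_range(low, high, host="127.0.0.1"):
    """Return a listening TCP socket on the first free port in [low, high].

    Hypothetical helper for illustration only; not Condor's implementation.
    """
    for port in range(low, high + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
        except OSError:
            # Port is in use (or otherwise unavailable); try the next one.
            s.close()
            continue
        s.listen(1)
        return s
    # Every port in the range was taken.
    raise OSError("no free port in %d-%d" % (low, high))
```

Once every port in the range is held by a transfer, the next call fails,
which is where the cap on concurrent requests comes from.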

Note that this limits the number of concurrent store/restore requests to
(HIGHPORT - LOWPORT) + 1.
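
As a quick worked example using the LOWPORT of 9000 and HIGHPORT of 9500
quoted above (the function name is just something made up for the
illustration):

```python
def max_concurrent_transfers(lowport, highport):
    # The range is inclusive on both ends, hence the + 1.
    return (highport - lowport) + 1

print(max_concurrent_transfers(9000, 9500))  # prints 501
```

So that configuration allows up to 501 simultaneous store/restore
requests.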

This bug fix will appear in 6.8.6 and 6.9.3.

Thank you.

-pete