
RE: [Condor-users] Restarting comms with XP nodes (SP2 UDP bugette)



Bruce,

Thanks for the info.  It looks like KB 824838 concerns broadcast packets,
so I'm not sure it has any bearing on Condor.  Also, the bug doesn't take
effect unless the packet is larger than 1500 bytes on an Ethernet II
network.  I did some packet captures using Ethereal.  The results aren't
definitive, but the UDP transactions appear to be unicast, I detected no
fragmented packets, and none of the packets exceeded 1100 bytes, well
within the maximum.
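The 1500-byte threshold is the Ethernet II MTU, i.e. the largest IP datagram one frame can carry.  A minimal sketch of the arithmetic, assuming IPv4 with a 20-byte header (no options) and an 8-byte UDP header; the function names are illustrative only and not part of Condor or any Windows API:

```python
import math

IPV4_HEADER = 20        # minimum IPv4 header, assuming no IP options
UDP_HEADER = 8          # fixed-size UDP header
ETHERNET_II_MTU = 1500  # largest IP datagram carried in one Ethernet II frame


def max_unfragmented_udp_payload(mtu: int = ETHERNET_II_MTU) -> int:
    """Largest UDP payload that still fits in a single IPv4 datagram."""
    return mtu - IPV4_HEADER - UDP_HEADER


def ipv4_fragment_count(udp_payload_len: int, mtu: int = ETHERNET_II_MTU) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Everything after the IP header: the UDP header plus the application payload.
    ip_data = UDP_HEADER + udp_payload_len
    # Each fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_data / per_fragment)


if __name__ == "__main__":
    print(max_unfragmented_udp_payload())
    for size in (1100, 1472, 1473):
        print(size, ipv4_fragment_count(size))
```

Under these assumptions a UDP payload of up to 1472 bytes travels in one frame, so traffic in the 1100-byte range should never be fragmented.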

What I really need to do is run the capture long enough to catch one of
the nodes doing a disappearing act.  It happens at least once every 24
hours for every one of my nodes.  If I get a chance to do the
experiment, I'll be sure to post the results.

-Bryan

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Bruce Beckles
Sent: Wednesday, June 22, 2005 12:47 PM
To: Condor-Users Mail List
Subject: RE: [Condor-users] Restarting comms with XP nodes (SP2 UDP
bugette)

On Wed, 22 Jun 2005, Bryan S. Maher wrote:

> Can you provide any details on this MS patch?  As you point out, the
> problem occurs on non-SP2 machines as well, so I'm curious what the root
> cause of the problem is.

I don't know if this is the patch in question, but there is currently a
hotfix for a problem in which Windows XP *and* Windows Server 2003
incorrectly calculate the IP checksum on large UDP broadcast packets.
This issue affects Windows XP SP1 and Windows XP SP2 as well as Windows
Server 2003 (I don't know whether it affects Windows XP with no Service
Pack applied).  See Knowledge Base Article 824838:

 	http://support.microsoft.com/kb/824838/
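For reference when inspecting captures, the IPv4 header checksum is the ones'-complement "Internet checksum" of RFC 791 / RFC 1071.  A minimal sketch of how a correct value is computed, so it can be compared against what appears on the wire; it does not reproduce the Windows behaviour described in the KB article, and the function name is illustrative:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))  # 0xb861
```

A receiver validates a header by running the same function over it with the checksum field filled in; a correct header yields 0.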


Last week Microsoft released an update to a Security Bulletin MS05-019 
(see:

 	http://www.microsoft.com/technet/security/bulletin/MS05-019.mspx

) which fixes a problem in the TCP/IP stack introduced by an earlier
version of the security update supplied with that Security Bulletin and
also by Windows Server 2003 Service Pack 1.  This may or may not be
relevant to the problems people have been experiencing with Condor under
Windows; I don't know.  Whether this version of the security update
incorporates the hotfix mentioned above is anybody's guess.  For a
description of the problem caused by the earlier version of the security
update, see Knowledge Base Article 898060:

 	http://support.microsoft.com/kb/898060/

...and for a description of issues with the just-released version of the
security update, see Microsoft Knowledge Base Article 893066:

 	http://support.microsoft.com/kb/893066/


Hope that is of some use to someone....

 	-- Bruce

--
Bruce Beckles,
e-Science Specialist,
University of Cambridge Computing Service.
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users