Bogdan Varlamov (a.k.a. Phantom Stranger)

Detecting a physical network disconnection with .NET Sockets



I recently ran into an issue with an application that was not detecting a physical network problem and was consequently losing data while attempting to write it to the socket.

I know what you are thinking: this sounds like poor programming, and surely the .NET Socket class has a clean, object-oriented way of handling network failures.

That is what I thought at first. The .NET Framework is filled with goodies: namespaces full of classes that make coding life easier. I am usually fairly satisfied with what is available in .NET (especially in .NET 3.5), but the System.Net.Sockets namespace leaves something to be desired.

At first glance, the Socket class seems to provide exactly what you need to get the job done. It even exposes a "Connected" property, which seems like a natural thing to check before trying to send data over a connection.

The way this application was designed, however, made it a moot point what the Connected property returns. (If you check MSDN, you will see that Connected actually reflects whether the last I/O operation on the socket succeeded, not whether a connection currently exists.)
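To illustrate that MSDN behavior, here is a minimal C# sketch (not from the original application) that uses a loopback TcpListener as a stand-in for the remote machine. The peer closes its end immediately, but because the client performs no I/O after connecting, Connected still reports true.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Loopback stand-in for the remote machine (an assumption of this demo).
var listener = new TcpListener(IPAddress.Loopback, 0); // port 0 = any free port
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(IPAddress.Loopback, port);
listener.AcceptSocket().Close(); // the other side goes away right after accepting
listener.Stop();

// No I/O has happened on the client since Connect succeeded, so the
// property still describes that last (successful) operation.
bool reported = client.Connected;
Console.WriteLine($"Connected = {reported}"); // Connected = True
```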

This application was coded to handle failures to write to the Socket. The logic flow for this process looked something like this:

  1. Add all outgoing messages into an outgoing Queue for storage in case of problems
  2. Peek at the Queue to get the next message (not Dequeueing/Requeueing since the order of the messages matters in this application)
  3. Attempt to write the bytes of the message to the socket
  4. If all bytes were written successfully, Dequeue the message as we no longer need to store it for sending
  5. If the entire message was not transmitted, reset the connection and retry sending upon reconnection
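The steps above can be sketched in C# roughly as follows. This is a hypothetical reconstruction, not the application's actual code: the socket write is abstracted as a `trySend` delegate (assumed to return true only when every byte was written) and the reset as a `reconnect` delegate, so the logic can run without a network. Keep in mind that in the real application "all bytes written" came from Socket.Send returning the full count, which only means the local TCP stack accepted the bytes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical send loop. Returns true once the queue is empty; returns false
// (leaving the current message at the head) after maxAttempts consecutive failures.
static bool DrainQueue(Queue<byte[]> outgoing, Func<byte[], bool> trySend,
                       Action reconnect, int maxAttempts)
{
    int failures = 0;
    while (outgoing.Count > 0)
    {
        byte[] next = outgoing.Peek();  // step 2: look at the head, keep it queued
        if (trySend(next))              // step 3: attempt to write every byte
        {
            outgoing.Dequeue();         // step 4: fully written, stop storing it
            failures = 0;
        }
        else
        {
            if (++failures >= maxAttempts) return false;
            reconnect();                // step 5: reset, then retry the same message
        }
    }
    return true;
}

// Usage: a fake transport that fails until the first reconnect.
var queue = new Queue<byte[]>();        // step 1: every outgoing message is queued
queue.Enqueue(new byte[] { 1 });
queue.Enqueue(new byte[] { 2 });
queue.Enqueue(new byte[] { 3 });

var received = new List<byte>();
bool linkUp = false;
bool drained = DrainQueue(queue,
    msg => { if (!linkUp) return false; received.AddRange(msg); return true; },
    () => linkUp = true,
    maxAttempts: 3);
Console.WriteLine($"{drained}: {string.Join(",", received)}"); // True: 1,2,3
```

Peeking rather than dequeueing keeps a message at the head of the queue until its write is reported as complete, which preserves ordering across reconnects.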

I looked at the code and could not tell where the failure was occurring. The application was logging successful socket writes, yet the application on the other side never got the bytes! How was this nightmare possible?

Well... it was all thanks to the magic of TCP/IP. The "magic of TCP/IP" is a topic for another article (or even a book), so I won't go into great detail here. Basically, once a TCP/IP socket connection is "established", it is considered valid even if the physical connection is broken. A successful write only means that the bytes were copied into the local TCP send buffer, not that they reached the other machine. In fact, if you unplug the OTHER computer, your computer won't know the connection is gone until you try sending data and the TCP/IP stack gives up on delivering it after its retransmission timeout. TCP/IP was designed to be robust and to recover from intermittent physical connectivity failures, and it does a great job at what it was designed to do...
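The following C# sketch shows a related effect on a single machine. It cannot simulate an unplugged cable; instead a loopback peer (a TcpListener, an assumption of the demo) accepts the connection and closes it at once. The client's next Send still reports every byte as sent, because Send returns as soon as the bytes are in the local TCP send buffer.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

var listener = new TcpListener(IPAddress.Loopback, 0); // port 0 = any free port
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(IPAddress.Loopback, port);
listener.AcceptSocket().Close(); // the receiving application is gone
listener.Stop();

// The local stack accepts the data into its send buffer, so Send "succeeds"
// even though nobody will ever read these bytes.
byte[] payload = new byte[100];
int sent = client.Send(payload);
Console.WriteLine($"Send reported {sent} of {payload.Length} bytes"); // Send reported 100 of 100 bytes
```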

However, in my scenario I needed to know instantly (or almost instantly) whether the message was delivered to the other machine, not 30 seconds, 2 minutes, or 2 hours later when we tried sending another message.

How could I know BEFORE a TCP/IP retry timeout whether or not the connection was still there? The answer: TCP keep-alives!

These are short probes, basically just empty packets that need to be ACKed. When one computer sends a keep-alive, the TCP/IP stack on the other computer replies with an ACK. The best thing about keep-alives is that the remote side needs no special support: any conforming TCP/IP stack ACKs the probe as part of normal TCP processing. Keep-alives are off by default, so they have to be turned on for the socket. Once they are on, if the connection between the two TCP/IP stacks is physically broken and a keep-alive is not ACKed within a certain time, the sending stack knows that the connection is broken.
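As a minimal C# sketch of turning keep-alives on, the portable SO_KEEPALIVE option (SocketOptionName.KeepAlive) flips the switch, but it leaves the probe timing at the operating system default, typically two hours of idle time before the first probe. That is far too slow for this scenario, which is why the timing itself also has to be configured.

```csharp
using System;
using System.Net.Sockets;

// Enable keep-alives with the portable SO_KEEPALIVE option; probe timing
// stays at the operating system default.
using var s = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
s.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);

// Read the option back to confirm it took effect (any nonzero value means on).
int enabled = (int)s.GetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive);
Console.WriteLine(enabled != 0 ? "keep-alive on" : "keep-alive off"); // keep-alive on
```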

I searched for how to set TCP keep-alives from managed .NET sockets and eventually came across Larry Cleeton's blog entry, which shares this method:

 
```csharp
using System;
using System.Net.Sockets;

static void SetKeepAlive(Socket s, bool on, uint time, uint interval)
{
    /* the native structure (ULONG is 32 bits on both 32- and 64-bit Windows)
    struct tcp_keepalive {
        ULONG onoff;
        ULONG keepalivetime;      // ms of idle time before the first probe
        ULONG keepaliveinterval;  // ms between probes when one goes unanswered
    };
    */

    // marshal the equivalent of the native structure into a byte array
    // (BitConverter uses the machine's byte order, little-endian on Windows)
    const int UlongSize = sizeof(uint);
    byte[] inOptionValues = new byte[UlongSize * 3];
    BitConverter.GetBytes(on ? 1u : 0u).CopyTo(inOptionValues, 0);
    BitConverter.GetBytes(time).CopyTo(inOptionValues, UlongSize);
    BitConverter.GetBytes(interval).CopyTo(inOptionValues, UlongSize * 2);
    // of course there are other ways to marshal up this byte array, this is just one way

    // call WSAIoctl(SIO_KEEPALIVE_VALS) via IOControl; this is a Winsock
    // (Windows-only) control code
    s.IOControl(IOControlCode.KeepAliveValues, inOptionValues, null);
}
```
 

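Both values are in milliseconds. The "time" parameter (keepalivetime in the native structure) is how long the connection may sit idle before the first keep-alive is sent; as long as the other side ACKs, the next probe goes out after another idle period of the same length. The "interval" parameter (keepaliveinterval) is how long the TCP/IP stack waits for an ACK before resending a probe that went unanswered.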
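To get a feel for the numbers, a rough worst-case time to notice a dead peer is the idle time plus one interval per unanswered probe. The sketch below assumes a probe count of 10, which is what the Windows documentation for SIO_KEEPALIVE_VALS states for Windows Vista and later; treat that count as an assumption elsewhere. WorstCaseDetection is a hypothetical helper, not part of any API.

```csharp
using System;

// Rough worst case for noticing a dead peer once keep-alives are on:
// idle "time" before the first probe plus one "interval" per unanswered probe.
// probeCount is an assumption (documented as 10 on Windows Vista and later).
static TimeSpan WorstCaseDetection(uint timeMs, uint intervalMs, int probeCount) =>
    TimeSpan.FromMilliseconds(timeMs + (double)intervalMs * probeCount);

TimeSpan defaults = WorstCaseDetection(7_200_000, 1_000, 10); // 2 h idle, 1 s interval
TimeSpan tuned = WorstCaseDetection(1_000, 1_000, 10);        // 1 s idle, 1 s interval
Console.WriteLine(defaults); // 02:00:10
Console.WriteLine(tuned);    // 00:00:11
```

With the Windows defaults a silent peer can go unnoticed for over two hours; with both values set to 1000 ms the window shrinks to roughly 11 seconds.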

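With keep-alives turned on, the TCP/IP stack notices a broken physical connection within roughly the time you configured instead of waiting on TCP's much longer retransmission timeout. The failure then surfaces as an error on the socket: a call in progress (such as a pending Receive) or the next Send fails, so the application can reset the connection and resend.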
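One way to make sure the application actually hears about the failure is to keep a Receive pending on the socket. The C# sketch below is hypothetical (WaitForDisconnect is not part of the original code) and assumes the peer sends nothing the application needs on this connection. In the demo, a graceful close on a loopback connection stands in for the peer disappearing; a keep-alive failure would instead arrive as a SocketException, which Winsock documents as WSAENETRESET (SocketError.NetworkReset) for calls in progress, while other platforms may report a different code.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical watcher: block in Receive so the application learns about a
// broken connection as soon as the stack does. Anything received is discarded.
static string WaitForDisconnect(Socket s)
{
    var buffer = new byte[256];
    try
    {
        while (s.Receive(buffer) > 0) { } // 0 bytes means an orderly close
        return "closed by peer";
    }
    catch (SocketException ex)
    {
        // a keep-alive failure lands here, e.g. SocketError.NetworkReset on Windows
        return "broken: " + ex.SocketErrorCode;
    }
}

var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
client.Connect(IPAddress.Loopback, ((IPEndPoint)listener.LocalEndpoint).Port);
// On Windows, the article's SetKeepAlive(client, true, 1000, 1000) would be called here.
Task<string> watcher = Task.Run(() => WaitForDisconnect(client));
listener.AcceptSocket().Close(); // graceful close stands in for the peer vanishing
listener.Stop();
string outcome = watcher.Result;
Console.WriteLine(outcome); // closed by peer
```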

