Reliably detecting clients that have hung up after waitinglist

Thu Feb 7 17:21:27 CET 2013

Hi,

We have an open ticket #1252, that I am in the process of creating a patch
to fix. The problem here is with regard to waitinglists and failure mode.
When a slow backend times out, the requesting client will get its 503 page
and go away, scheduling the waitinglist in the process. One of these parked
sessions will then retry, and the rest go back on the waitinglist. The
failing backend doesn't recover, so the same happens on every backend
attempt. As there isn't any client communication going on here, Varnish
never notices if the client (very likely after n * first_byte_timeout) has
given up and gone away. For a popular page the number of sessions coming
into the waitinglist (the page's hitrate) is going to be much higher than
the number leaving (only one every first_byte_timeout). The end result is
running out of file descriptors as observed in ticket #1252, or hitting
session_max as we have observed in a $customer case.

The solution to this was first thought to be simple. When a session comes
off the waitinglist, check to see if the connection has been closed. If it
has, drop the session. Only this turned out to be quite hard to do. The
normal TCP EOF checking can't really be done due to HTTP pipelining. If we
try to read data from the socket, we have to store the data away somewhere
for reuse, and if the next request is an HTTP POST with a large body, we'll
run out of httpconn pipeline buffer space. Doing recv(2) with MSG_PEEK
won't do for the same reason, as we'll only check to see if the OS buffer
has anything there, and if there's pipelining in play we'll get a false
positive.
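
A minimal sketch of why the MSG_PEEK check is ambiguous. The helper and
demo names are hypothetical (this is not Varnish code), and an AF_UNIX
socketpair stands in for the client TCP connection. When the client has
pipelined a request and then hung up, the peek sees the queued bytes and
the EOF behind them stays hidden:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Varnish code: peek at one byte without
 * consuming it. Returns 1 if bytes are queued, 0 on a clean EOF,
 * and -1 if nothing is readable right now (or on error).
 */
static int
peek_client(int fd)
{
	char c;
	ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

	if (n > 0)
		return (1);
	if (n == 0)
		return (0);
	return (-1);
}

/*
 * Demo: the "client" optionally pipelines a request, then goes away.
 * Returns what peek_client() reports on the server side, or -2 if the
 * setup fails.
 */
static int
demo_peek_after_hangup(int pipelined)
{
	static const char req[] = "GET /next HTTP/1.1\r\n\r\n";
	int sv[2], r;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-2);
	if (pipelined && write(sv[1], req, sizeof req - 1) < 0) {
		close(sv[0]);
		close(sv[1]);
		return (-2);
	}
	close(sv[1]);			/* client hangs up */
	r = peek_client(sv[0]);
	close(sv[0]);
	return (r);
}
```

Without pipelined data the peek reports EOF, so the check only works for
clients that never pipeline.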

Doing a poll() on the socket was attempted next, with POLLHUP
tantalizingly seeming the perfect state. Unfortunately POLLHUP will only be
signaled when both ends agree on the connection being closed, and in this
case it's only the client that has closed the connection (FIN received by
the server, but no FIN is sent the other way until the socket is closed on
the server side). The poll() calls thus returned only POLLIN and POLLOUT.
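
The behaviour can be reproduced with a small sketch. The function names
are hypothetical, and an AF_UNIX socketpair with shutdown(SHUT_WR) on the
"client" end is assumed to be a fair stand-in for a TCP peer that has
sent its FIN. The result described here is what Linux reports; other
platforms may differ:

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking poll(), returns revents or -1. */
static int
poll_revents(int fd)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = POLLIN | POLLOUT;
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) < 0)
		return (-1);
	return (pfd.revents);
}

/*
 * Demo: the client closes only its sending direction. Returns the
 * revents seen on the server side, or -1 if the setup fails.
 */
static int
demo_poll_after_half_close(void)
{
	int sv[2], r;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	if (shutdown(sv[1], SHUT_WR) != 0)
		r = -1;
	else
		r = poll_revents(sv[0]);
	close(sv[0]);
	close(sv[1]);
	return (r);
}
```

On Linux the returned mask has POLLIN set (a read would return EOF) but
not POLLHUP, because the server side has not closed its direction.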

The Linux specific POLLRDHUP could be used to detect the FIN from the
client, giving reliable results that the client has indeed closed its end
of the duplex connection. But on closer thought and a lot of googling, I
can't find anything about HTTP clients not being allowed to half-close the
TCP connection after sending the request and still expect to read the
reply. On the contrary, exotic clients did turn up that do exactly that.
So this also turned out not to be a solution.
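
A sketch of that dilemma, assuming Linux (POLLRDHUP needs _GNU_SOURCE)
and using a hypothetical demo function on an AF_UNIX socketpair rather
than a real TCP connection: the server sees POLLRDHUP after the client's
half-close, yet the client can still read a reply written afterwards.

```c
#define _GNU_SOURCE		/* POLLRDHUP is a Linux/glibc extension */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Demo: returns 1 if the server sees POLLRDHUP after the client's
 * half-close AND the client still receives a reply sent afterwards,
 * 0 if either step fails, -1 if the socketpair cannot be created.
 */
static int
demo_rdhup_but_still_reading(void)
{
	int sv[2], ok = 0;
	struct pollfd pfd;
	char buf[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	/* Client: done sending, but still waiting for the response. */
	if (shutdown(sv[1], SHUT_WR) == 0) {
		pfd.fd = sv[0];
		pfd.events = POLLRDHUP;
		pfd.revents = 0;
		if (poll(&pfd, 1, 0) == 1 &&
		    (pfd.revents & POLLRDHUP) != 0 &&
		    write(sv[0], "OK", 2) == 2 &&
		    read(sv[1], buf, sizeof buf) == 2)
			ok = 1;
	}
	close(sv[0]);
	close(sv[1]);
	return (ok);
}
```

Dropping the session on POLLRDHUP would cut off exactly this kind of
client.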

What we really want to know in this case, is not if the client will be
sending us any more data, but if the client is still going to accept our
response when(/if) we are able to deliver it. So it is writing data that we
need to test. But the HTTP protocol isn't going to allow us to write
something just to test if that'll result in TCP RSTs. So the next thing I
tested is the SO_KEEPALIVE option on the socket. This will send periodic
probes to the client, which it will have to ACK, which should eventually
allow the server TCP stack to learn that there is no client on the other
end. On FreeBSD SO_KEEPALIVE is on by default I believe, but on Linux it
isn't. So with SO_KEEPALIVE on, the client would send a RST after the
socket left FIN_WAIT2 state (60s), which closed it on the server side too,
allowing Varnish to get POLLHUP state and kill off the session!

Reading up on SO_KEEPALIVE, there are some caveats though. By default both
Linux and FreeBSD won't start sending keep-alives until the connection has
been idle for 7200 seconds (2 hours?!). And then only after 9 unsuccessful
probes spaced 75 seconds apart will it kill the connection. So it will by
default take 2 hours 11 minutes for this to trigger. For a webserver these
defaults seem too large and should be lowered. But for the problem at hand
I believe the default values will still benefit, as this problem doesn't
happen instantly but builds over time when dealing with unresponsive
backends. On Linux it is possible to specify the keep-alive idle timer and
the probe count/interval per connection using the TCP-level socket options
TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL.
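
The arithmetic and the per-socket tuning can be sketched as follows. The
helper names are hypothetical, and TCP_KEEPIDLE, TCP_KEEPINTVL and
TCP_KEEPCNT are the Linux per-socket options, so this assumes Linux:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Worst-case seconds before keep-alive declares the peer dead. */
static long
keepalive_detect_secs(long idle, long intvl, long cnt)
{
	return (idle + intvl * cnt);
}

/*
 * Hypothetical helper: enable keep-alives on one TCP socket and
 * override the timers for that socket only. Times are in seconds.
 * Returns 0 on success, -1 if any setsockopt() call fails.
 */
static int
set_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof idle) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof intvl) != 0)
		return (-1);
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
	    sizeof cnt) != 0)
		return (-1);
	return (0);
}
```

With the defaults quoted above, keepalive_detect_secs(7200, 75, 9) gives
7875 seconds, i.e. 2 hours 11 minutes 15 seconds. Illustrative values
such as set_keepalive(fd, 60, 10, 3) would bring that down to 90 seconds;
those numbers are only an example, not a proposed default.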

So as a final solution to this problem, I suggest always setting
SO_KEEPALIVE on our sockets. Then add parameter settings for the keep-alive
idle timer, the count and the interval, with better defaults for a
webserver (or maybe just setting the idle time to sess_timeout / 2?),
plus a doc fix for FreeBSD / other platforms that can't set the keep-alive
values per socket. And finally, make Varnish poll() the client socket on
return from the waitinglist and kill the session on POLLHUP.
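
The last step could look roughly like this. It is a sketch with
hypothetical names, not the actual patch. The demo uses an AF_UNIX
socketpair, where a peer close() shows up as POLLHUP right away; on a
TCP socket the same state would only appear once the connection has
been torn down, e.g. after the keep-alive probes fail or an RST arrives.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical check, run when a session comes off the waitinglist.
 * Returns 1 if the kernel reports the connection as hung up,
 * 0 if it still looks alive, -1 if poll() itself failed.
 */
static int
client_hung_up(int fd)
{
	struct pollfd pfd;

	pfd.fd = fd;
	pfd.events = 0;		/* POLLHUP is reported even if not requested */
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) < 0)
		return (-1);
	return ((pfd.revents & POLLHUP) != 0 ? 1 : 0);
}

/*
 * Demo: returns client_hung_up() for a live peer (peer_closes == 0)
 * or for a peer that has closed (peer_closes != 0); -2 on setup failure.
 */
static int
demo_hangup_check(int peer_closes)
{
	int sv[2], r;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-2);
	if (peer_closes)
		close(sv[1]);
	r = client_hung_up(sv[0]);
	if (!peer_closes)
		close(sv[1]);
	close(sv[0]);
	return (r);
}
```

A caller would drop the session when client_hung_up() returns 1 and
otherwise continue with the retry as before.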

Any comments on this plan of action would be most welcome, as would any
clever ideas that I might have missed for detecting this situation in the
first place.

Regards,
Martin Blix Grydeland

-- 
Martin Blix Grydeland <http://varnish-software.com>
Senior Developer | Varnish Software AS
Cell: +47 21 98 92 60
We Make Websites Fly!