[Varnish] #1252: varnish retries fetch from unresponsive backend indefinitely, leading to FD exhaustion

Varnish varnish-bugs at varnish-cache.org
Mon Jan 21 22:32:09 CET 2013


#1252: varnish retries fetch from unresponsive backend indefinitely, leading to FD
exhaustion
----------------------+--------------------
 Reporter:  askalski  |       Type:  defect
   Status:  new       |   Priority:  normal
Milestone:            |  Component:  build
  Version:  trunk     |   Severity:  normal
 Keywords:            |
----------------------+--------------------
 When a backend request fails (e.g. first_byte_timeout), varnish returns a
 503 error only to one client waiting on that object.  It then retries the
 backend request, and keeps doing so as long as there are sessions on the
 waiting list.  These retries are made even for clients that have already
 timed out and closed their HTTP connection to varnish.  This queueing of
 clients and serialization of backend requests leads to file descriptor
 exhaustion.

 This was originally observed on 2.1.5 with the DNS director, but has been
 reproduced on Trunk with a single backend (no director), and also with the
 random director configured with .retries=1

 === Suggested behavior ===

 After some number of failed retries, varnish should flush out the object's
 entire waitinglist with an immediate 503 error.

 === Steps to reproduce ===

 Start with a fresh build of trunk.  Configure the backend with a short
 .first_byte_timeout such as 5 seconds.

 default.vcl
 {{{
 backend default {
         .host = "127.0.0.1";
         .port = "80";
         .first_byte_timeout = 5s;
 }
 }}}

 On the backend, write a script that delays longer than first_byte_timeout.
 For example, this PHP script sleeps for 10 seconds before returning:

 slow.php
 {{{
 <?php sleep(10); ?>
 }}}


 Start varnish:
 {{{
 $ /tmp/x/sbin/varnishd -d -f /tmp/x/etc/varnish/default.vcl -a
 localhost:6081
 Message from VCC-compiler:
 Not running as root, no priv-sep
 Message from C-compiler:
 Not running as root, no priv-sep
 Message from dlopen:
 Not running as root, no priv-sep
 Platform: Linux,3.2.0-36-generic,x86_64,-sfile,-smalloc,-hcritbit
 200 273
 -----------------------------
 Varnish Cache CLI 1.0
 -----------------------------
 Linux,3.2.0-36-generic,x86_64,-sfile,-smalloc,-hcritbit
 varnish-trunk revision d175906

 Type 'help' for command list.
 Type 'quit' to close CLI session.
 Type 'start' to launch worker process.

 start
 child (9286) Started
 200 0

 Child (9286) said Not running as root, no priv-sep
 Child (9286) said Child starts
 Child (9286) said SMF.s0 mmap'ed 104857600 bytes of 104857600
 }}}

 In one window, use "lsof" to monitor varnish's open file descriptors:
 {{{
 $ watch 'lsof -P -n -p 9286'

 Every 2.0s: lsof -P -n -p 9286 | grep TCP
 Mon Jan 21 16:22:33 2013

 varnishd 9286 askalski    5u  IPv4             680493       0t0     TCP
 127.0.0.1:6081 (LISTEN)
 }}}

 In another window, run curl in a loop to request the delay script, using a
 timeout shorter than first_byte_timeout (for example, 1 second.)
 {{{
 $ while sleep 1; do curl -m1 http://localhost:6081/slow.php; done
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 curl: (28) Operation timed out after 1001 milliseconds with 0 bytes
 received
 ...
 }}}

 Watch as varnish's file descriptor list grows:
 {{{
 Every 2.0s: lsof -P -n -p 9286 | grep TCP
 Mon Jan 21 16:25:22 2013

 varnishd 9286 askalski    5u  IPv4             680493       0t0     TCP
 127.0.0.1:6081 (LISTEN)
 varnishd 9286 askalski   13u  IPv4             682462       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57800 (CLOSE_WAIT)
 varnishd 9286 askalski   14u  IPv4             680904       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57793 (CLOSE_WAIT)
 varnishd 9286 askalski   15u  IPv4             683250       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57774 (CLOSE_WAIT)
 varnishd 9286 askalski   16u  IPv4             684243       0t0     TCP
 127.0.0.1:40133->127.0.0.1:80 (ESTABLISHED)
 varnishd 9286 askalski   17u  IPv4             682412       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57788 (CLOSE_WAIT)
 varnishd 9286 askalski   18u  IPv4             685075       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57809 (CLOSE_WAIT)
 varnishd 9286 askalski   19u  IPv4             683301       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57780 (CLOSE_WAIT)
 varnishd 9286 askalski   20u  IPv4             680907       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57795 (CLOSE_WAIT)
 varnishd 9286 askalski   21u  IPv4             682466       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57802 (CLOSE_WAIT)
 varnishd 9286 askalski   22u  IPv4             680883       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57787 (CLOSE_WAIT)
 varnishd 9286 askalski   23u  IPv4             680888       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57790 (CLOSE_WAIT)
 varnishd 9286 askalski   24u  IPv4             685068       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57807 (CLOSE_WAIT)
 varnishd 9286 askalski   26u  IPv4             682457       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57797 (CLOSE_WAIT)
 varnishd 9286 askalski   27u  IPv4             680924       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57798 (CLOSE_WAIT)
 varnishd 9286 askalski   28u  IPv4             680959       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57801 (CLOSE_WAIT)
 varnishd 9286 askalski   29u  IPv4             685059       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57804 (CLOSE_WAIT)
 varnishd 9286 askalski   30u  IPv4             682499       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57805 (CLOSE_WAIT)
 varnishd 9286 askalski   31u  IPv4             682536       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57808 (CLOSE_WAIT)
 varnishd 9286 askalski   32u  IPv4             682555       0t0     TCP
 127.0.0.1:6081->127.0.0.1:57811 (CLOSE_WAIT)
 }}}

-- 
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1252>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator




More information about the varnish-bugs mailing list