Varnish 503ing on ~1/100 POSTs

Ronan Mullally ronan at iol.ie
Thu Mar 10 14:29:23 CET 2011


Hej Tollef,

On Thu, 10 Mar 2011, Tollef Fog Heen wrote:

> |   33 FetchError   c http first read error: -1 0 (Success)
>
> This just means the backend closed the connection on us.
>
> |   33 FetchError   c backend write error: 11 (Resource temporarily unavailable)
>
> This is a timeout, however:
>
> |    33 ReqEnd       c 657185708 1299604110.559967279 1299604113.447372913 0.000037670 2.887368441 0.000037193
>
> That 2.89s backend response time doesn't add up with your timeouts.  Can
> you see if you can get a tcpdump of what's going on?

I'll see what I can do.  Varnish is serving an average of about 20 objects
per second so there'll be a lot of data to gather / sift through.

The following numbers might prove useful - they're hourly counts of GETs
and POSTs that returned 200 or 503 since 17:00 yesterday.

             GET               POST
  Hour   200      503       200     503
 ------------------------------------------
 17:00  72885   0 (0.00%)   841   0 (0.00%)
 18:00  69266   0 (0.00%)   858   6 (0.70%)
 19:00  65030   0 (0.00%)   866   3 (0.35%)
 20:00  70289   0 (0.00%)   975   8 (0.82%)
 21:00 105767   0 (0.00%)  1214   5 (0.41%)
 22:00  86236   0 (0.00%)   834   3 (0.36%)
 23:00  67078   0 (0.00%)   893   2 (0.22%)
 00:00  48042   0 (0.00%)   669   4 (0.60%)
 01:00  35966   0 (0.00%)   479   0 (0.00%)
 02:00  29598   0 (0.00%)   395   3 (0.76%)
 03:00  25819   0 (0.00%)   359   0 (0.00%)
 04:00  22835   0 (0.00%)   366   4 (1.09%)
 05:00  24487   0 (0.00%)   315   1 (0.32%)
 06:00  26583   0 (0.00%)   353   4 (1.13%)
 07:00  30433   0 (0.00%)   398   2 (0.50%)
 08:00  37394   0 (0.00%)   363   9 (2.48%)
 09:00  44462   1 (0.00%)   526   4 (0.76%)
 10:00  49891   2 (0.00%)   611   4 (0.65%)
 11:00  54826   1 (0.00%)   599   7 (1.17%)
 12:00  60765   6 (0.01%)   615   1 (0.16%)
 13:00  18941   0 (0.00%)   190   0 (0.00%)
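
A rough sketch of how hourly counts like these could be tallied from an
access log. It assumes NCSA common/combined style lines (as varnishncsa
would emit); the regular expression, the tally() helper and the sample
lines are illustrative assumptions, not the script actually used to
produce the table.

```python
import re
from collections import Counter

# Assumed log layout (NCSA common/combined style, e.g. from varnishncsa):
#   host - - [10/Mar/2011:08:15:02 +0100] "POST /path HTTP/1.1" 503 419 ...
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]*\] '
    r'"(?P<method>[A-Z]+) [^"]*" (?P<status>\d{3})'
)


def tally(lines):
    """Count GET/POST requests answered with 200 or 503, keyed by
    (day, hour, method, status). Other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a recognisable log line
        method, status = m.group("method"), m.group("status")
        if method in ("GET", "POST") and status in ("200", "503"):
            counts[(m.group("day"), m.group("hour"), method, status)] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [10/Mar/2011:08:15:02 +0100] "POST /submit HTTP/1.1" 503 419 "-" "-"',
    '10.0.0.2 - - [10/Mar/2011:08:15:03 +0100] "GET /index HTTP/1.1" 200 5120 "-" "-"',
    '10.0.0.3 - - [10/Mar/2011:08:47:10 +0100] "GET /a.png HTTP/1.1" 200 900 "-" "-"',
    '10.0.0.4 - - [10/Mar/2011:09:01:44 +0100] "GET /missing HTTP/1.1" 404 0 "-" "-"',
    'not a log line',
]

for key, n in sorted(tally(SAMPLE_LINES).items()):
    print(key, n)
```

Against a real log the lines would come from a file object rather than a
list, since tally() accepts any iterable of strings.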

Apart from a handful of 503s on GET requests this morning (which I've not
had a chance to investigate), the problem almost exclusively affects POSTs.
The frequency of the problem does not appear to be related to load -
the hours with the highest incidence are not the busiest ones.
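
The load comparison can be checked directly from the table. The sketch
below copies the GET 200, POST 200 and POST 503 columns and compares the
busiest hour with the hour showing the highest POST 503 rate. It assumes
the GET 200 count is a fair proxy for load and, as the table's percentages
appear to do, expresses 503s relative to the 200 count.

```python
# (hour, GET 200, POST 200, POST 503), copied from the table above.
ROWS = [
    ("17:00", 72885, 841, 0), ("18:00", 69266, 858, 6),
    ("19:00", 65030, 866, 3), ("20:00", 70289, 975, 8),
    ("21:00", 105767, 1214, 5), ("22:00", 86236, 834, 3),
    ("23:00", 67078, 893, 2), ("00:00", 48042, 669, 4),
    ("01:00", 35966, 479, 0), ("02:00", 29598, 395, 3),
    ("03:00", 25819, 359, 0), ("04:00", 22835, 366, 4),
    ("05:00", 24487, 315, 1), ("06:00", 26583, 353, 4),
    ("07:00", 30433, 398, 2), ("08:00", 37394, 363, 9),
    ("09:00", 44462, 526, 4), ("10:00", 49891, 611, 4),
    ("11:00", 54826, 599, 7), ("12:00", 60765, 615, 1),
    ("13:00", 18941, 190, 0),
]


def post_503_rate(row):
    """POST 503s as a percentage of POST 200s for one hourly row."""
    _, _, post_ok, post_503 = row
    return 100.0 * post_503 / post_ok


busiest = max(ROWS, key=lambda row: row[1])  # most GET 200s in an hour
worst = max(ROWS, key=post_503_rate)         # highest POST 503 rate

for label, row in (("busiest", busiest), ("worst", worst)):
    print(f"{label}: {row[0]} ({row[1]} GETs), "
          f"POST 503 rate {post_503_rate(row):.2f}%")
# busiest: 21:00 (105767 GETs), POST 503 rate 0.41%
# worst: 08:00 (37394 GETs), POST 503 rate 2.48%
```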

I'll get back to you when I have a few packet traces.  It will most likely
be next week.  FWIW, I forgot to mention in my previous posts that I'm
running Varnish 2.1.5 on a Debian Lenny VM.


-Ronan



