Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Guillaume Quintard guillaume.quintard at gmail.com
Thu Dec 16 22:42:34 UTC 2021


> Ah, I remember hearing about that earlier, and made a mental note to read
up on that. But I forgot all about it. Now I did just that, and boy was
that a cold shower for me! I definitely need to unset that header. But why,
for the love of all that is holy, does Varnish not include the vary-data in
the hash? Why isn't the hash the _single_ key used when storing and looking
up data in the cache? Why does Varnish hide this stuff from us?

Easy one: you build the key from the request, while the vary header is a
response property. Therefore, Varnish can't know to put the varied headers
into the hash, because it doesn't have that information yet.
Basically, the vary header is backend information that allows Varnish to
properly differentiate objects with the same hash, and the "hash collision"
happens because you, the VCL writer, were not specific enough in vcl_hash
(not necessarily your fault, though).
(technically, and to avoid the wrath of Varnish veterans: it's not a hash
collision because the resources technically represent the same resource and
the varied headers are just secondary keys).

> However, when checking the Vary header from the backend, it is set to
"Accept-Encoding". And since I haven't changed anything in my browser, it
should send the same "Accept-Encoding" request header whenever I surf the
website. And since I have visited the startpage multiple times the last 10
days, it should have a cached version of it matching my "Accept-Encoding".

Don't touch that guy! Varnish will ignore "accept-encoding" in "vary"
because it handles compression internally, and always forces
"accept-encoding: gzip" before entering vcl_backend_fetch. If your VCL
mentions accept-encoding, it's almost always wrong.

> Well, that gives me nothing that is relevant here, sadly. The last time
this happened was a few days ago, and the buffer doesn't seem to be big
enough to keep data that far back.
> But maybe you could describe what you would look for? I would love to
learn how to troubleshoot this.

The default VSL space is 80MB, which is "only" worth a few (tens of)
thousands requests, so yeah, it can be a short backlog. You can instead
start logging into a file:

> varnishlog -g request -q 'RespStatus eq 500' -w log.bin

once the file grows, you can start looking at it using "varnishlog -r
log.bin"

> Thanks, although most of that stuff I already knew. And it doesn't really
give any more advanced examples. Like the problem I mentioned earlier. I
really would like to know if it is possible to find the first request where
it served the 500 page for the "/" url, as well as the request just before
that, for the same url. Do you know how to construct a query that gives me
that?

The problem is that you have to be recording when the initial request goes
through. But, if you have then, cache hits will show the VXID of that first
request in their "x-varnish" header, and you can find it this way
("varnishlog -r log.bin -q 'vxid == THE_VXID'")

Hope that helps

-- 
Guillaume Quintard


On Thu, Dec 16, 2021 at 1:33 PM Batanun B <batanun at hotmail.com> wrote:

> > Could be a vary issue
>
> Ah, I remember hearing about that earlier, and made a mental note to read
> up on that. But I forgot all about it. Now I did just that, and boy was
> that a cold shower for me! I definitely need to unset that header. But why,
> for the love of all that is holy, does Varnish not include the vary-data in
> the hash? Why isn't the hash the _single_ key used when storing and looking
> up data in the cache? Why does Varnish hide this stuff from us?
>
> However, when checking the Vary header from the backend, it is set to
> "Accept-Encoding". And since I haven't changed anything in my browser, it
> should send the same "Accept-Encoding" request header whenever I surf the
> website. And since I have visited the startpage multiple times the last 10
> days, it should have a cached version of it matching my "Accept-Encoding".
>
> > can you post the output of `varnishlog -d -g request -q 'RespStaus eq
> 500'?
>
> Well, that gives me nothing that is relevant here, sadly. The last time
> this happened was a few days ago, and the buffer doesn't seem to be big
> enough to keep data that far back.
>
> But maybe you could describe what you would look for? I would love to
> learn how to troubleshoot this.
>
> > In the meantime, here's a cheat sheet to get started with varnishlog:
> > https://docs.varnish-software.com/tutorials/vsl-query/
>
> Thanks, although most of that stuff I already knew. And it doesn't really
> give any more advanced examples. Like the problem I mentioned earlier. I
> really would like to know if it is possible to find the first request where
> it served the 500 page for the "/" url, as well as the request just before
> that, for the same url. Do you know how to construct a query that gives me
> that?
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20211216/5f6c6b1d/attachment.html>


More information about the varnish-misc mailing list