Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Fri Dec 17 16:57:54 UTC 2021

-- 
Guillaume Quintard

On Fri, Dec 17, 2021 at 5:18 AM Batanun B <batanun at hotmail.com> wrote:

> Is there even an official word for this final "cache key"? "Hash" clearly
> isn't specific enough. I'm talking about a word that refers to the unique
> key that always corresponds to only a _single_ version of a cached object.
>

I don't think there is one at the moment.

> Sorry, I'm confused now... Don't touch _which_ guy? Our VCL doesn't contain
> anything regarding "Accept-Encoding". All I said was that the Vary header
> in the response from the backend is "Accept-Encoding". And the way I see
> it, this shouldn't be the cause of the strange problems we are seeing,
> since even when factoring in this, there should exist a matching cached
> object for me, and it should be served regardless of TTL or backend health
> as long as the grace hasn't expired (which it hasn't). Or is my reasoning
> flawed here, based on the VCL snippet in my original post? Can you think of
> a scenario where our VCL would return the synthetic 500 page even when
> there exists a cached object matching the hash and vary logic?
>

As you mentioned that your VCL was simplified, I didn't want to assume
anything. So yes, I meant: do not worry about "accept-encoding", either as
a header, or as an entry in the vary header, Varnish will handle that
properly.

So, the vary hypothesis doesn't pan out. Could it be that your cache size
is too small instead and that the churn is pushing the object out?
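
One way to check that hypothesis is the MAIN.n_lru_nuked counter, which counts objects evicted to make room for new ones. Below is a minimal sketch; the helper name lru_nuked_count is made up for illustration, and it only parses the output of "varnishstat -1":

```shell
#!/bin/sh
# Hypothetical helper: read `varnishstat -1` output on stdin and print the
# current value of MAIN.n_lru_nuked (second column of that counter's line).
lru_nuked_count() {
    awk '$1 == "MAIN.n_lru_nuked" { print $2 }'
}

# Usage sketch, on the Varnish host:
#   varnishstat -1 -f MAIN.n_lru_nuked | lru_nuked_count
```

If that number climbs during the traffic bursts, objects are being pushed out of the cache before their grace runs out.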

> Yeah, I think 80MB is a bit too small for us. Ideally we should be able to
> sit down on a Monday and troubleshoot problems that occurred Friday evening,
> but that might require a way too big VSL space. But a few hundred MB should
> be fine.
>

I would just log on disk, and rotate every few days. You mentioned that
the traffic is fairly low, so the disk usage shouldn't be bad, especially
if the backend is on another server. Varnish won't trip itself up by choking
the disk; the worst case scenario is that varnishlog will not write the
file fast enough and will drop a few transactions.
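
A rough sketch of what that could look like, assuming placeholder paths for the log and PID files (neither comes from this thread) and a date-stamped naming scheme picked only for illustration:

```shell
#!/bin/sh
# Placeholder paths (assumptions); adjust to your setup.
LOG_FILE="${LOG_FILE:-/var/log/varnish/log.bin}"
PID_FILE="${PID_FILE:-/run/varnishlog.pid}"

# Start a background varnishlog that records every transaction to disk.
# -w writes binary records to a file, -a appends instead of overwriting,
# -D daemonizes, -P writes the daemon's PID to a file.
start_recording() {
    varnishlog -a -w "$LOG_FILE" -D -P "$PID_FILE"
}

# Move the current file aside, then send SIGHUP so that the daemonized
# varnishlog reopens its -w file and starts a fresh one.
rotate_log() {
    mv "$LOG_FILE" "$LOG_FILE.$(date +%Y%m%d)"
    kill -HUP "$(cat "$PID_FILE")"
}
```

Calling rotate_log from cron every few days, and deleting the oldest files, should keep disk usage bounded.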

>
> > The problem is that you have to be recording when the initial request
> > goes through. But if you have, then cache hits will show the VXID of that
> > first request in their "x-varnish" header, and you can find it this way
> > ("varnishlog -r log.bin -q 'vxid == THE_VXID'")
>
> Well, would it really be a cache hit? The main transaction I'm looking for
> is the first transaction for a specific path (in this case, "/") where
> Varnish served the synthetic 500 page. And then I would also like to see
> the closest preceding transaction for that same page, where the hash (and
> the Vary logic) matches the main transaction mentioned above.
>

In that case, you must log all the requests that could match, and once you
have found your offender, walk your way up to find the previous request. I
don't think there's another way here.
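
As a sketch of that workflow against a recorded file: the function names below are made up for illustration, "log.bin" is just the file name used earlier in this thread, and previous_vxid assumes the default (vxid) grouping, where each client transaction starts with a "*   << Request  >> VXID" line.

```shell
#!/bin/sh
# Placeholder file name (assumption); point it at your recording.
LOG_FILE="${LOG_FILE:-log.bin}"

# Build a VSL query matching one exact URL path.
# Assumes the path contains no double quotes.
url_query() {
    printf 'ReqURL eq "%s"' "$1"
}

# Show the full request groups for a path that were answered with a 500.
find_offenders() {
    varnishlog -r "$LOG_FILE" -g request \
        -q "$(url_query "$1") and RespStatus == 500"
}

# Read varnishlog output (default vxid grouping) on stdin and print the VXID
# of the client request listed just before the given one.
previous_vxid() {
    awk -v target="$1" '
        $2 == "<<" && $3 == "Request" {
            if ($NF == target) { print prev; exit }
            prev = $NF
        }'
}

# Usage sketch: find the offender, then the request for "/" recorded before it.
#   find_offenders /
#   varnishlog -r "$LOG_FILE" -c -q "$(url_query /)" | previous_vxid THE_VXID
```

This only narrows things down by URL; you still have to compare the Host and Accept-Encoding headers of the two transactions by hand to confirm they map to the same object.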