Varnish returning synthetic 500 error even though it has stale content it should serve. But only seems to happen during/after a burst of traffic

Dridi Boukelmoune dridi at varni.sh
Mon Dec 20 14:03:54 UTC 2021


On Fri, Dec 17, 2021 at 4:03 PM Marco Dickert - evolver group
<marco.dickert at evolver.de> wrote:
>
> On 2021-12-17 15:25:31, Batanun B wrote:
> > Thanks. I have thought about that too. But I think we might want to include
> > non-error transactions as well. I mean, with the problems this post is about
> > we want to see when the cached version of the start page was generated and
> > when it was last served from cache successfully. But maybe we could have a
> > permanent logging just for the start page, regardless of http status. That
> > should hopefully reduce the logging intensity enough so that logging to disk
> > isn't affecting Varnish performance.
>
> Well, it depends on the performance of your storage and the number of req/sec on
> the front page, but these logs can get very large very quickly. I'd suggest to
> determine the correct delivery of the front page via an external monitoring
> (e.g. icinga2 or a simple script). As far as I understand, you don't need to
> know the exact request, but more of a rough point in time of when the requests
> start failing. So a monitoring script which curls every minute should be
> sufficient and causes a lot less trouble.
>
> > One thing though... If you log all "status: 500+" transactions to disk, isn't
> > there a risk that your logging might exacerbate a situation where your site is
> > overwhelmed with traffic? Where a large load causes your backends to start
> > failing, and that triggers intense logging of those erroneous transactions
> > which might reduce the performance of Varnish, causing more timeouts etc which
> > cause more logging and so on...
>
> Indeed there is a risk of self-reinforcing effects, but it hasn't happened yet. We
> also do not plan to log 500s forever, but only until our problem is solved,
> which is an error in Varnish's memory handling. At the moment, our most
> concerning 500s are caused by varnish itself, stating "Could not get storage",
> when the configured memory limit is reached.

If you get a surge of 5XX responses from either Varnish or the
backend, you can also rate-limit logs to the disk:

https://varnish-cache.org/docs/6.0/reference/varnishlog.html

See the -R option.
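
As a rough sketch of what that could look like (the query, the 10/2m
limit, the function name and the log path are only assumptions for
illustration, so adjust them to your setup), a small wrapper that writes
at most 10 matching transactions per 2 minutes to a text file:

```shell
#!/bin/sh
# Hypothetical helper: log 5XX transactions to disk, rate-limited with -R.
# Everything beyond the limit within the time window is suppressed.
log_5xx() {
    # $1: output file (assumed default path, not taken from this thread)
    out="${1:-/var/log/varnish/5xx.log}"

    # -q  only keep transactions matching the VSL query
    # -R  at most 10 transactions per 2 minutes
    # -A  write text rather than binary output
    # -a  append to the file instead of overwriting it
    # -w  write to the given file
    varnishlog \
        -q 'RespStatus >= 500 or BerespStatus >= 500' \
        -R 10/2m \
        -A -a -w "$out"
}
```

Calling log_5xx with no argument uses the assumed default path. The
limit applies to transactions under the chosen grouping, so a burst of
errors costs a bounded amount of disk I/O.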

Cheers,
Dridi

