Grace and misbehaving servers

Mon Mar 16 08:58:49 UTC 2020

Hi,

On Sun, Mar 15, 2020 at 9:56 PM J X <batanun at hotmail.com> wrote:
>
> Hi,
>
> I'm currently setting up Varnish for a project, and the grace feature together with health checks/probes seems to be a great savior when working with servers that might misbehave. But I'm not really sure I understand how to actually achieve that, since the example doesn't really make sense:
>
> https://varnish-cache.org/docs/trunk/users-guide/vcl-grace.html
>
> See the section "Misbehaving servers". There the example does "set beresp.grace = 24h" in vcl_backend_response, and "set req.grace = 10s" in vcl_recv, if the backend is healthy. But since vcl_recv is run before vcl_backend_response, doesn't that mean that the 10s grace value of vcl_recv is overwritten by the 24h value in vcl_backend_response?

Not really, it's actually the other way around. The beresp.grace
variable defines how long you may serve an object past its TTL once it
enters the cache.

Subsequent requests can then limit grace mode, so think of req.grace
as a req.max_grace variable (which maybe hints that it should have
been called that in the first place).
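
To illustrate, here is a minimal sketch along the lines of the users guide
example. The backend address is a placeholder I made up, and the 24h/10s
values are the ones from your question, not recommendations:

```vcl
vcl 4.1;

import std;

# Placeholder backend, adjust to your setup.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Stored with the object: it may be served up to 24h past its TTL.
    set beresp.grace = 24h;
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # Per-request cap: while the backend is healthy, accept at most
        # 10s of staleness, even though the object allows 24h.
        set req.grace = 10s;
    }
    # When the backend is sick, req.grace is left alone and the full
    # beresp.grace stored with the object applies.
}
```

So the 24h value is an upper bound stored with the object, and the 10s
value only narrows it for requests made while the backend is healthy.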

> Also... There is always a risk of some URLs suddenly returning a 500 error (or a timeout) all while the probe still returns 200. Is it possible to have Varnish behave more or less as if the backend is sick, but just for those URLs? Basically I would like this logic:
>
> If a healthy content exists in the cache:
> 1. Return the cached (and potentially stale) content to the client
> 2. Increase the ttl and/or grace, to keep the healthy content longer
> 3. Only do a bg-fetch if a specified time has passed since the last attempt (let's say 5s), to avoid hammering the backend
>
> If a non-healthy response (i.e. a 500 error) exists in the cache:
> 1. Return the cached 500-content to the client
> 2. Only do a bg-fetch if a specified time has passed since the last attempt (let's say 5s), to avoid hammering the backend

What you are describing is stale-if-error, something we don't support
but that could be approximated with somewhat convoluted VCL. It used to
be easier when Varnish had saint mode built in, because that generally
resulted in less convoluted VCL.

It's not something I would recommend attempting today.

> If no content exists in the cache:
> 1. Perform a synchronous fetch
> 2. If the result is a 500-error, cache it with let's say ttl = 5s
> 3. Otherwise, cache it with a longer ttl
> 4. Return the result to the client
>
> Is this possible with the community edition of Varnish?

You can do that with plain VCL, but even better, teach your backend to
inform Varnish how to handle either case with the Cache-Control
response header.
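
A rough sketch of the plain VCL variant, assuming the 5s value from your
list and leaving every other response to whatever TTL Varnish derives
from the backend headers or default_ttl (untested, adapt it to your own
VCL):

```vcl
sub vcl_backend_response {
    if (beresp.status >= 500) {
        # Cache server errors briefly so that clients are not all sent
        # to a struggling backend for the same URL.
        set beresp.ttl = 5s;
        # No grace for errors: once the 5s are over, the next request
        # triggers a regular fetch instead of getting a stale error.
        set beresp.grace = 0s;
    }
    # No return here, so the built-in vcl_backend_response still runs
    # and may mark the response uncacheable (Set-Cookie and friends).
}
```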

Dridi