Grace and misbehaving servers

Wed Mar 25 22:18:18 UTC 2020

> > A problem with the restart logic is the race it opens since you now
> > have two lookups, but overall, that's the kind of convoluted VCL that
> > should work. The devil might be in the details.
>
> Could you describe this race condition that you mean can happen? What could the worst case scenario be? If it is just a guru meditation for this single request, and it happens very rarely, then that is something I can live with. If it is something that can cause Varnish to crash or hang, then it is not something I can live with :)

In general, by the time you get to the second lookup, the state of the
cache may have changed. The object you hoped to serve may have gone
away in between, in which case the restart only adds processing and
will likely lead to another failing fetch.

Using a combination of saint mode and req.grace to emulate
stale-if-error could in theory lead to something simpler.

At least it would if this change landed one way or the other:

https://github.com/varnishcache/varnish-cache/issues/3259
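
To give an idea of the shape this could take, here is a rough sketch.
It is not a tested recipe: as said above, it relies on that change (or
an equivalent) for the health check to behave as wanted at lookup
time. The backend address, the durations and the threshold are made
up, and depending on the vmod_saintmode version the function is named
blacklist() or denylist().

```vcl
vcl 4.1;

import saintmode;
import std;

# Hypothetical origin, for illustration only.
backend origin {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_init {
    # Wrap the backend in a saintmode director. Past 10 denylisted
    # objects (arbitrary value) the whole backend is considered sick.
    new sm = saintmode.saintmode(origin, 10);
}

sub vcl_recv {
    set req.backend_hint = sm.backend();

    # Healthy backend: only tolerate slightly stale objects.
    # Otherwise req.grace is left alone and the object's full
    # grace period applies.
    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }
}

sub vcl_backend_response {
    # Keep objects around long enough to cover an outage.
    set beresp.grace = 24h;

    if (beresp.status >= 500) {
        # Mark the backend as sick for this object for a while
        # (denylist() in recent vmod_saintmode releases), and do
        # not replace the cached copy with the error.
        saintmode.blacklist(20s);
        return (abandon);
    }
}
```

Note that abandoning a fetch that is not a background fetch still
results in a 503 for that client, so this only helps once a stale
object is served from grace.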

> > In this case you might want to combine your VCL restart logic with
> > vmod_saintmode.
>
> Yes, I have already heard some things about this vmod. I will definitely look into it. Thanks.

It used to be a no-brainer with Varnish 3, where saint mode was part
of VCL itself...

> > And you might solve this problem with vmod_xkey!
>
> We actually already use this vmod. But like I said, it doesn't solve the problem with new content that affects existing pages.

Oh, now I get it! That's an interesting limitation I don't think I
ever considered. I will give it some thought!

> Several pages might for example include information about the latest objects created in the system. If one of these pages were loaded and cached at time T1, and then at T2 a new object O2 was created, an "xkey purge" with the key "O2" will have no effect since that page was not associated with the "O2" key at time T1, because O2 didn't even exist then.
>
> And since there is no way to know beforehand which these pages are, the only bulletproof way I can see of handling this is to purge all pages* any time any content is updated.
>
> * or at least a large subset of all pages, since the vast majority might include something related to newly created objects

You can always use vmod_xkey to broadly tag responses. An example I
like to use to illustrate this is tagging every article response as
"article". If you change the template for articles, you know you can
[soft] purge them all at once.
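
A minimal sketch of broad tagging, modeled on the example from the
vmod_xkey documentation. The backend, the ACL, the URL pattern and the
xkey-purge request header are assumptions for illustration. Normally
the application would emit the xkey header itself; the VCL tagging
below merely stands in for that.

```vcl
vcl 4.1;

import xkey;

# Hypothetical origin, for illustration only.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical list of clients allowed to invalidate.
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            return (synth(403, "Forbidden"));
        }
        if (req.http.xkey-purge) {
            # Soft purge: matching objects become stale but stay
            # available for grace.
            set req.http.n-gone = xkey.softpurge(req.http.xkey-purge);
            return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
        }
        return (purge);
    }
}

sub vcl_backend_response {
    # Add a broad "article" tag next to whatever keys the
    # application already sent (keys are space-separated).
    if (bereq.url ~ "^/articles/") {
        if (beresp.http.xkey) {
            set beresp.http.xkey = beresp.http.xkey + " article";
        } else {
            set beresp.http.xkey = "article";
        }
    }
}
```

With something like this in place, a PURGE request carrying
"xkey-purge: article" from an allowed client would soft purge every
response tagged "article" in one go.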

That doesn't solve invalidation by keys that are not (yet) known to
the cache, but my take is that if the application can tell which
resources a new object affects, it should be able to invalidate those
individual resources itself (I'm aware it's not always that easy).

Dridi