VCL control of Vary processing

Wed May 29 18:02:41 CEST 2013

I'd like to make a case for some additional control of Vary processing by VCL.

As everyone knows, the HTTP standard Vary header is pretty limited in terms of flexibility. It only supports specifying header names, and when used in a naive way, i.e. for headers which clients fill with greatly varying content or format, it can be disastrous to your cache efficiency. Good examples of this are the Cookie, Accept-Language and User-Agent headers.

Fortunately Varnish provides a powerful way to influence this with VCL, and headers can be filtered, normalized or cleaned up before they hit the cache lookup stage. This works well for headers with well known, predefined keys or values: specific strings can be tested for in VCL, and be removed from the header if not relevant for the cache lookup. With a bit more string (list) processing, this could even be easier and more powerful with list filtering, sorting capability for canonical ordering etc., but that's the sort of thing that could be implemented in a VMOD.
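To illustrate the kind of normalization meant here, below is a minimal sketch in Python (not VCL) of filtering and canonically ordering a Cookie header before it is used for cache lookup. The function name normalize_cookie and the allowed-names set are hypothetical, chosen only for illustration; a real setup would do the equivalent in VCL or in a VMOD.

```python
def normalize_cookie(header, allowed):
    """Keep only cookies whose name is in `allowed`, in sorted order.

    `allowed` is a hypothetical, hard-coded set of cookie names that are
    relevant for cache variance; everything else is dropped.
    """
    pairs = []
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        name = name.strip()
        # Skip empty fragments, malformed entries and irrelevant cookies.
        if not sep or name not in allowed:
            continue
        pairs.append((name, value.strip()))
    # Sorting gives a canonical order, so "a=1; b=2" and "b=2; a=1"
    # end up as the same string and therefore the same cache variant.
    pairs.sort()
    return "; ".join(f"{name}={value}" for name, value in pairs)


if __name__ == "__main__":
    print(normalize_cookie("b=2; tracking=xyz; a=1", {"a", "b"}))
```

Two requests that differ only in irrelevant cookies or in cookie order then produce the same header value, and so share one cached variant.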

What currently isn't easy or straightforward to do in Varnish is using information sent by the backend for Vary processing. Sometimes it's hard or even nearly impossible to put the logic for preprocessing/sanitizing the varied headers in VCL, for example because it varies (no pun intended) a lot across all served content. In that case it could be nice to have the backend support the Vary preprocessing logic in VCL, for example by sending, in a response header, a list of acceptable values for a header that is varied on for that particular URL.

Take for example the Accept-Language header. Although this is fortunately not required for the majority of content we serve, we do need to vary on that header for some wikis in the Wikimedia cluster, namely language wikis with multiple variants. Obviously, just adding the header to the Vary list without anything else would destroy the caching: users across the world send wildly different Accept-Language values based on their location and browser settings. Doing some regex filtering and sanitization of the header would help a bit, but it's still far from optimal.

What we've done in the last 5 years with Squid is to have our backend (MediaWiki) send a header specifying what header content should be varied on, and what should be ignored. It's handled by a patch to Squid, and it's described here:

	http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html

This has worked very well for us and has resulted in very good cache efficiency, while still providing some flexibility at the cache level, directed by the backend. This was convenient, as the backend is in control of the cache variance when and where it's needed. It knows its content and its needs best, and Squid doesn't allow for much logic and flexibility in its configuration anyway.

With our migration to Varnish, we can do a large part of the necessary cache varying processing in VCL, as described above. Accept-Encoding (compression) normalization is already better handled by Varnish internally, most cookie names/values are pretty consistent across our content set (in _our_ application, but YMMV!) and with a few hacks in VCL we can mostly get there. But the Accept-Language issue as described above is a tougher one, as the acceptable values depend on which wiki/language is being served, and which variants are available for that particular URL. We've looked at porting (a variant of) the X-Vary-Options header to Varnish, using it in some way with VCL for the logic behind it, but it seems messy and suboptimal without patching Varnish itself, which isn't a very nice solution either.

It seems that the root of the problem is that VCL currently has no way to influence the Vary processing while an object from the cache is available for looking up a response header. What's missing is something like a VCL hook that runs after a cache lookup hit, but before any additional Vary processing/lookup is done. Conceptually, assuming the response headers the backend sends to direct Vary processing (like Vary and X-Vary-Options) are consistent across all variants, a random (or first) variant object at the hash key location could be used for "obj" in VCL in the hook. The VCL hook can learn which headers are varied on (from the Vary response header), possibly get the allowed values for each header (e.g. a list of available languages for this URL, "Available-Languages: sr-ec,sr-el"), and can filter the Accept-Language header in preparation for the Vary processing by Varnish. Even nicer if it could influence the Vary string generation itself, similar to how hash_data() works.
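As a rough illustration of what such a hook could do, here is a minimal sketch in Python (not VCL). Everything in it is an assumption made for the example: the function name pre_vary_hook, the plain-dict representation of request and object headers, the exact-match comparison of language tags, and the "Available-Languages" header taken from the example above. It is not existing Varnish functionality.

```python
def split_list(value):
    """Split a comma-separated header value into lowercase items."""
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def parse_accept_language(header):
    """Return language tags ordered by descending q-value (q=0 dropped)."""
    items = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        tag = fields[0].lower()
        if not tag:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if q > 0:
            # Sort by -q, then by original position to keep client order.
            items.append((-q, position, tag))
    items.sort()
    return [tag for _, _, tag in items]


def pre_vary_hook(req_headers, obj_headers, default_language=""):
    """Hypothetical hook run after a lookup hit, before Vary matching.

    `obj_headers` are the response headers of any one variant stored at
    the hash key; they are assumed to be the same for all variants.
    Returns a copy of the request headers with Accept-Language reduced
    to a single tag the backend declared as available.
    """
    varied = split_list(obj_headers.get("Vary", ""))
    available = split_list(obj_headers.get("Available-Languages", ""))
    filtered = dict(req_headers)
    if "accept-language" in varied and available:
        chosen = default_language
        for tag in parse_accept_language(req_headers.get("Accept-Language", "")):
            if tag in available:
                chosen = tag
                break
        filtered["Accept-Language"] = chosen
    return filtered


if __name__ == "__main__":
    obj = {"Vary": "Accept-Encoding, Accept-Language",
           "Available-Languages": "sr-ec,sr-el"}
    req = {"Accept-Language": "en-US,en;q=0.8,sr-EL;q=0.6"}
    print(pre_vary_hook(req, obj)["Accept-Language"])
```

With this filtering, every request for that URL is mapped onto at most one variant per available language plus one for the default, regardless of how many different Accept-Language strings clients send.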

Right now a hook between the cache lookup hit and the Vary processing step doesn't exist, which makes this clunky. You can probably sort of do it with restart today:

1) In vcl_recv() or vcl_lookup(), guess which headers might be in the Vary header, and rename those out of the way.
2) In vcl_hit(), check the Vary response header to see which headers are actually varied on, sanitize those from the saved values, restore them under their original names, and restart the request.
3) Make sure to skip 1) and 2) in the restarted request. vcl_recv, vcl_lookup and the cache lookup are repeated, although this isn't really necessary.

This probably works, but VCL has limited info in step 1), and the necessary request restart makes things more complex: the rest of the VCL code has to handle the restart, which is not really obvious.

I think a VCL hook between the lookup hit and the Vary processing stage would go a long way toward solving this problem in a clean way. This should allow for a lot of flexibility for optimizing cache variance, while not sacrificing performance. It could easily be used for things like the Cookie header as well, when the "allowed" cookies are not easily hardcoded in VCL. One possible catch/frequent mistake I see with such a VCL hook: it does require people to be aware that the "obj" reference they get in that hook is not necessarily the variant that will eventually be delivered...

Additionally it could be nice to be able to restart just a lookup in vcl_hit/vcl_miss, to be able to modify the request "to make a cache hit work" without restarting everything, for some other use cases as well. But I can also see problems with introducing such a loop. Right now it looks like VCL has no cycles besides the big "restart" of the request, and this would change that.

As far as I can see from phk's flow diagram of last month, this doesn't seem possible with the Varnish 4.0 vcl_lookup() plan[1] either. Since the future of the VCL state machine flow and the VCL hooks is now under discussion, I thought I'd throw this in there. :)

[1] https://www.varnish-cache.org/lists/pipermail/varnish-dev/2013-April/007529.html

-- 
Mark Bergsma <mark at wikimedia.org>
Lead Operations Architect
Wikimedia Foundation