req.hash_control_* variables

Tue Aug 10 14:20:23 CEST 2010

Hey Martin (and -dev)

The idea we want to implement is for Varnish to be able to do
controlled cache misses to refresh content, and to ignore busy
objects, with both behaviours controlled through VCL.

The features are only marginally different.

I already implemented return(refresh), but have backed it out again,
because we've decided to go for a more generalized approach and use a
variable instead of a return type. I attached the patch as a
reference. If nothing else, it should give you a vague idea.

The use case for ignoring the busy object is to avoid a race when you
have two Varnish servers that fetch from each other whenever the
client isn't the other Varnish server. For example:

Time     Action
1.        client->v1   GET /foo
1.        client->v2   GET /foo
2.        v1->v2       GET /foo
2.        v2->v1       GET /foo
3.        v2: Already fetching /foo (from v1). Wait for it.
3.        v1: Already fetching /foo (from v2). Wait for it.

Both servers now wait on each other, so neither fetch can complete.
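
To make the race concrete, here is a toy sketch in C of the lookup
decision with and without an ignore-busy flag. It is not the real hash
code and not the attached patch; every name in it (toy_req, toy_obj,
toy_lookup, the TOY_* results) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache object; not Varnish's real struct. */
struct toy_obj {
    int busy;           /* nonzero while a fetch for it is in flight */
};

/* Hypothetical per-request state carrying the proposed flag. */
struct toy_req {
    int ignore_busy;    /* models req.hash_control_ignore_busy */
};

enum toy_result { TOY_HIT, TOY_MISS, TOY_WAIT };

/*
 * Decide what a lookup does with the object found for this request
 * (obj is NULL when nothing was found).
 */
static enum toy_result
toy_lookup(const struct toy_req *req, const struct toy_obj *obj)
{
    assert(req != NULL);

    if (obj == NULL)
        return TOY_MISS;        /* nothing cached: fetch it */
    if (obj->busy)              /* someone is already fetching it */
        return req->ignore_busy ? TOY_MISS : TOY_WAIT;
    return TOY_HIT;
}
```

With the flag clear, a request that finds a busy object waits, which
is step 3 above. With the flag set for requests coming from the other
Varnish, the same lookup is treated as a miss instead of waiting.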

So far, the naming suggestion is req.hash_control_always_miss = {0,1}
and req.hash_control_ignore_busy = {0,1}. We briefly discussed whether
to use a general bitmap, but I think we ended up going for a separate
uint8_t for each of them for now.
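
As a rough illustration of the two representations, assuming nothing
about where the fields would actually live in the request struct, the
difference looks roughly like this in C. The struct, field and macro
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One uint8_t per knob, as suggested for now (names hypothetical). */
struct toy_req {
    uint8_t hash_always_miss;   /* req.hash_control_always_miss: 0 or 1 */
    uint8_t hash_ignore_busy;   /* req.hash_control_ignore_busy: 0 or 1 */
};

/* The alternative we discussed: one bitmap (flag names hypothetical). */
#define TOY_HC_ALWAYS_MISS  (1u << 0)
#define TOY_HC_IGNORE_BUSY  (1u << 1)

/* Pack the per-field form into the equivalent bitmap form. */
static unsigned
toy_req_to_bitmap(const struct toy_req *r)
{
    unsigned bits = 0;

    assert(r != NULL);
    if (r->hash_always_miss)
        bits |= TOY_HC_ALWAYS_MISS;
    if (r->hash_ignore_busy)
        bits |= TOY_HC_IGNORE_BUSY;
    return bits;
}
```

Separate fields map one-to-one onto the VCL variables. The bitmap
would be more compact if more hash_control knobs show up later.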

For VCL, a snippet from vcl_recv (the flags have to be set before the
hash lookup happens) might look like:

if (client.ip ~ varnish1) {
    set req.hash_control_ignore_busy = 1;
}

or:

if (client.ip ~ purgeservice && req.http.X-purge == "true") {
    set req.hash_control_always_miss = 1;
}

I'll go through VCL in more detail with you and Yves tomorrow, so
you'll have at least a few minutes between VCL 101 and coding it ;)

So take a shot if it seems attainable and interesting.

- Kristian
PS: I CC'ed varnish-dev, as there is no reason not to. I'm going to
default to cc'ing -dev for everything related to development.