Degraded operation

Wed Sep 12 17:33:25 CEST 2007

In message <ujrzlzs6jqh.fsf at false.linpro.no>, Dag-Erling Smørgrav writes:

(I'm in the middle of EuroBSDcon2007, so please bear with me if my
replies are short and delayed the next week.)

>What follows is my design proposal for degraded operation, i.e. the
>ability to serve expired content when a backend is not responding.

I'm not entirely happy with this proposal for a large number of
reasons, but instead of picking it apart, here are my thoughts
on the subject.

The first observation is that the entire concept of degraded mode
is a minefield of decisions and trouble once you have multiple
backends.

There is simply no way we can satisfy the countless and generally
unique configurations with any kind of one-size-fits-all policy
decision.

The second observation is that everything in Varnish should be
controllable from VCL.

Once those two are nailed firmly in place, it follows that we will
not have a "degraded mode" but merely switch between different VCL
programs.

And that is, roughly speaking, the end of any discussion on degraded
mode.

An entirely different, but not unrelated subject, is what facilities
Varnish offers, and one of those most relevant to a degraded mode
is to serve content which is technically expired.

The way things are planned in that area is something like:

	An object gets fetched from the backend.

	A timeout and a TTL get assigned.

	When the timeout expires, vcl_timeout() can decide:
	    - to set a new timeout
	    - to gzip object
	    - to discard object
	    - to prefetch new copy of object
	    (possibly others)

What you might want to do to implement a degraded mode is to
discard only objects for which you have a newer copy. This
could look something like this in VCL code:

	sub vcl_fetch {
		set obj.ttl = $mypolicy;

		/* check for gzip necessary early in lifetime */
		set obj.timeout = obj.ttl / 10;
	}

	sub vcl_timeout {
		/* Only discard if we have a newer version */
		if (obj.has_replacement) {
			discard;
		}

		if (obj.ttl <= 120s) {
			/* Try to prefetch up to four times in the last 2 minutes */
			set obj.timeout = 30s;
			prefetch;
		} else {
			/* Call again two minutes before expiry */
			set obj.timeout = obj.ttl - 2m;
		}

		/* Gzip if we use it a lot */
		if (obj.usage > a_lot) {
			set obj.gzip = true;
		}
	}
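
To make the intended decision order explicit, here is a minimal sketch
in Python rather than VCL.  It assumes the two timeout assignments are
alternatives (30s retries inside the last two minutes, otherwise wake
up two minutes before expiry).  The function name, the action strings
and the return shape are all hypothetical, not Varnish code:

```python
from typing import Optional, Tuple

PREFETCH_WINDOW = 120.0   # seconds before expiry in which we retry prefetch
PREFETCH_RETRY = 30.0     # seconds between prefetch attempts


def on_timeout(ttl_remaining: float,
               has_replacement: bool) -> Tuple[str, Optional[float]]:
    """Return (action, next_timeout_in_seconds) for one timeout event.

    Hypothetical model of the vcl_timeout policy sketched above.
    """
    # Only discard if a newer copy of the object already exists.
    if has_replacement:
        return ("discard", None)

    # Inside the last two minutes: try to prefetch, look again in 30s.
    if ttl_remaining <= PREFETCH_WINDOW:
        return ("prefetch", PREFETCH_RETRY)

    # Otherwise just ask to be called again two minutes before expiry.
    return ("wait", ttl_remaining - PREFETCH_WINDOW)
```

With a remaining TTL of 540s this asks to be woken 420s later, which
is exactly the two-minute mark.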

And then degraded mode, more or less, comes down to ignoring
the TTL check.

The TTL check is currently implemented in C and it may not be suitable
to move it to VCL, but this could be handled with a way for
vcl_hash() to ask for it to be ignored, so the degraded
version of the VCL code would have something like:

	sub vcl_hash {
		set req.hash.ignore_ttl = true;
	}

Which would then return the youngest matching object, even if
it has expired.
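
A minimal sketch of that lookup rule, in Python rather than C or VCL.
The CachedObject type, its fields and lookup() are hypothetical and
only illustrate "youngest match, optionally ignoring the TTL"; the
real cache lookup is not structured like this:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CachedObject:
    key: str            # stands in for the hash of the request
    fetched_at: float   # seconds; a larger value means a younger object
    ttl: float          # lifetime in seconds, counted from fetched_at

    def expired(self, now: float) -> bool:
        return now >= self.fetched_at + self.ttl


def lookup(objects: Iterable[CachedObject], key: str, now: float,
           ignore_ttl: bool = False) -> Optional[CachedObject]:
    """Return the youngest object matching key, or None on a miss.

    With ignore_ttl set, expired objects are still eligible.
    """
    candidates = [o for o in objects
                  if o.key == key and (ignore_ttl or not o.expired(now))]
    return max(candidates, key=lambda o: o.fetched_at, default=None)
```

In normal mode a request for an object whose copies have all expired
is a miss and goes to the backend; with ignore_ttl the same request
gets the most recently fetched copy instead.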

There are a lot of other tricky issues to resolve though, mainly
"what tools do we offer to decide that we should hit degraded mode"
and even worse, "how do we know when to get back into normal mode".

The answer to the first one is probably the backend statistics
and a check somewhere in the VCL program, possibly in the
yet-to-come vcl_error().

The answer to the second one is much more tricky, because how
do we know that the backend is back when we don't ask it?

The first option, which I think will be quite complex and not
anywhere near good, would be to use prefetching to detect when the
backend comes up again.  One would have to decide what to prefetch
and add a level of detection to the results.  All undesirable
overhead in my eyes.

The other, much simpler option is to keep sending a small fraction
of the requests to the backend:

	sub vcl_hash {
		if (random() > .001) {	/* random returns [0...1] */
			set req.hash.ignore_ttl = true;
		}
	}

And then keep monitoring the backend statistics and return to normal
mode once it looks good.
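
The sampling rule can be sketched in Python as follows.  The function
name and the injectable rng parameter are hypothetical and exist only
to make the sketch testable; the comparison mirrors the VCL above:

```python
import random
from typing import Callable


def ignore_ttl_for_request(backend_fraction: float,
                           rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this request should ignore the TTL check.

    Returns True for roughly (1 - backend_fraction) of all requests, so
    about backend_fraction of them still follow the normal TTL rules
    and may reach the backend.
    """
    return rng() > backend_fraction


def count_probes(requests: int, backend_fraction: float, seed: int) -> int:
    """Count how many of `requests` would keep the normal TTL check."""
    r = random.Random(seed)
    return sum(1 for _ in range(requests)
               if not ignore_ttl_for_request(backend_fraction, r.random))
```

At a fraction of .001, something on the order of 100 out of 100000
requests keep the normal TTL check, which should be enough to notice
the backend coming back without loading it.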

The really interesting thing about this concept is that an
administrator could implement multiple levels of degraded mode:

	Normal
	Send only 50% of traffic to backend when needed.
	Send only 25% of traffic to backend when needed.
	Send only 10% of traffic to backend when needed.
	Send only 1% of traffic to backend when needed.
	Send only 0.1% of traffic to backend when needed.
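
One way to picture those levels is as a ladder of backend fractions.
In this Python sketch the fractions come from the list above, but the
policy of moving one step at a time and the backend_healthy input are
assumptions; how health is judged from the backend statistics is left
open:

```python
# Fraction of "needed" traffic sent to the backend at each level.
# Index 0 is normal operation.
LEVELS = [1.0, 0.5, 0.25, 0.10, 0.01, 0.001]


def next_level(level: int, backend_healthy: bool) -> int:
    """Move one step along the ladder based on a health verdict.

    Hypothetical policy: step toward normal while the backend looks
    good, step toward the most degraded level while it does not.
    """
    if backend_healthy:
        return max(level - 1, 0)
    return min(level + 1, len(LEVELS) - 1)
```

Since each level would in practice be its own VCL program, a step on
this ladder corresponds to switching the active VCL.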

Anyway, I have dinner to cook and a conference to organize...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.