[PATCH] Random director tries all backends before giving up

Jack Lindamood jack at facebook.com
Tue Apr 13 00:11:54 CEST 2010


Thanks for the feedback.  While not an issue in my case, a configuration parameter that limits the number of backends to try could be useful for others.  I don't know how most people use varnish, but potentially triggering vcl_error when a single backend shuts down is probably undesirable behavior for most users.

From: varnish-dev-bounces at varnish-cache.org [mailto:varnish-dev-bounces at varnish-cache.org] On Behalf Of Adrian Otto
Sent: Sunday, April 11, 2010 7:51 PM
To: varnish-dev at varnish-cache.org
Subject: Re: [PATCH] Random director tries all backends before giving up

Jack,

This approach is probably not a good idea if you have (a) a large cluster, (b) a heavily loaded cluster, and/or (c) backends that are sensitive to overload. You are likely to trigger a cascading failure. It might be smarter to have a configurable number of backends to try... perhaps 2 or 3. Imagine if you have 50 backends. There is no point in trying 50 times to find a healthy backend. Chances are that if 25% of your backends are down, trying more is just going to exacerbate the problem.
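
To make the suggestion concrete, here is a rough sketch of a capped selection loop. It is written in Python for brevity and is not Varnish code: the names pick_with_cap, try_connect and max_tries are hypothetical, backends are plain strings, and weights are ignored.

```python
import random


def pick_with_cap(healthy, try_connect, max_tries=3, rng=random):
    """Try at most `max_tries` distinct backends, then give up.

    healthy     -- list of backend names currently marked healthy (assumed)
    try_connect -- callable(name) -> bool, a stand-in for obtaining an FD
    max_tries   -- hypothetical configuration parameter capping the attempts
    """
    # Sample without replacement so that no backend is tried twice.
    candidates = rng.sample(healthy, min(max_tries, len(healthy)))
    for name in candidates:
        if try_connect(name):
            return name
    # Cap reached: the caller would fall through to its error path.
    return None
```

With 50 backends and max_tries=3, a request touches at most three backends before failing, so a struggling cluster sees a bounded amount of extra connection attempts per request.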

Adrian

On Apr 11, 2010, at 4:35 PM, Jack Lindamood wrote:


The following is a patch I've made to Varnish that I hope improves the random director, and which anyone is welcome to use (even Varnish trunk?).  My motivation was to reduce the number of vcl_error calls when a director is mostly good.  You can get the entire patch at this link.

http://github.com/cep21/Varnish/commit/6f5e98143ac2636504d9febf574b14c3c1a072fc

Here's the commit message:

Random director tries all backends before giving up

Summary:
The current random director gives up when it fails to get an FD to its chosen backend "retries" times in a row.  Rather than give up and return NULL, which is guaranteed to cause a vcl_error, as a last-ditch effort we try all other healthy backends until we get one that works.  This is mostly useful in the window between a backend server dying and its health check failing often enough to mark that backend unhealthy.

Backwards Compatibility =  Not strictly backwards compatible.  In cases where the old code would have fallen through to vcl_error, this will give a shot at getting a good result.

Performance = In the worst case, this will add extra calls for getting an FD, but only in situations that would otherwise have ended in vcl_error.

Test Plan: New Varnish unit test.  It fails with the old code and passes with the new code.
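
For readers who do not want to open the diff, the behavior described in the summary can be sketched roughly as follows. This is an illustration in Python, not the patch itself: pick_backend and try_connect are hypothetical names, backends are (name, healthy) pairs, and the weighting the real random director applies is left out.

```python
import random


def pick_backend(backends, try_connect, retries, rng=random):
    """Return the name of a backend we could connect to, or None.

    backends    -- list of (name, healthy) pairs (assumed representation)
    try_connect -- callable(name) -> bool, a stand-in for obtaining an FD
    retries     -- number of random attempts before the fallback kicks in
    """
    healthy = [name for name, ok in backends if ok]
    if not healthy:
        return None

    failed = set()
    # Phase 1: the existing behavior, random picks up to `retries` times.
    for _ in range(retries):
        name = rng.choice(healthy)
        if try_connect(name):
            return name
        failed.add(name)

    # Phase 2: last-ditch effort, try every healthy backend that has not
    # already failed, in order, until one works.
    for name in healthy:
        if name not in failed and try_connect(name):
            return name

    # Nothing worked; this corresponds to returning NULL and vcl_error.
    return None
```

The second loop only runs when the first would have returned nothing, which matches the performance note above: the extra connection attempts are paid only on requests that were already headed for vcl_error.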


_______________________________________________
varnish-dev mailing list
varnish-dev at varnish-cache.org
http://lists.varnish-cache.org/mailman/listinfo/varnish-dev
