Strategies for splitting load across varnish instances? And avoiding single-point-of-failure?

Sat Jan 16 03:01:45 CET 2010

On Fri, Jan 15, 2010 at 5:33 PM, pub crawler <pubcrawler.com at gmail.com> wrote:
>> At first glance, this is doing something that you can more cheaply and efficiently do at a higher level, with software dedicated to that purpose.  It's interesting, but I'm not sure it's more than just a restatement of the same solution with its own problems.
>
> Varnish performs very well.  Extending this to have a cluster
> functionality within Varnish I think just makes sense.  The workaround
> solutions so far seem to involve quite a bit of hardware as well as
> having a miss rate of 50% in the example of 2 Varnish instances.  Sure it
> can hot populate fast, but it's two stacks of memory wasted for the
> same data per se.  I suppose a custom solution to hash the inbound
> requests somehow and determine which Varnish should have the data can
> be done, but unsure if anyone now is doing that.
>

I'm doing it on a decent scale.  This makes the most sense.  Let a
load balancer load balance and a cache server cache.

>> F5/NetScaler is quite expensive, but they have significant functionality, too.
>>
>> The hardware required to run LVS/haproxy (for example) can be very cheap -- Small RAM, 1-2 CPU cores per ethernet interface.  When you're already talking about scaling out to lots of big-RAM/disk Varnish
>> boxes, the cost of a second load balancer is tiny, and the benefit of redundancy is huge.
>
> F5 has always made good gear.  The price point limits adoption to deep
> pockets.  I am not convinced that most people need a hardware
> balancing solution.  They have their limited adoption, and the N+1
> purchase amounts - 2 minimum, 3 more optimally = $$$$$$.
>
>> Squid has a peering feature; I think if you had ever tried it you would know why it's not a fabulous idea. :) It scales terribly.  Also, Memcache pooling that I've seen scale involves logic in the app (a higher level).
>
> Squid is a total disaster.  If it wasn't, none of us would be here
> using Varnish now, would we :)  It's amazing Squid even works at this
> point.
>
> The memcached pooling is a simple formula really - it's microsecond
> fast - yes typically done on the client:
> Most standard client hashing within memcache clients uses a simple
> modulus calculation on the value against the number of configured
> memcached servers. You can summarize the process in pseudocode as:
>
> @memcservers = ['a.memc','b.memc','c.memc'];
> $value = hash($key);
> $chosen = $value % length(@memcservers);
>
> Replacing the above with values:
>
> @memcservers = ['a.memc','b.memc','c.memc'];
> $value = hash('myid');
> $chosen = 7009 % 3;
>
> In the above example, the client hashing algorithm will choose the
> server at index 1 (7009 % 3 = 1), and store or retrieve the key and
> value with that server.

yes, and from my experience, each client api has a different hashing
implementation.  you set a key on a pool of servers in php and you
cannot retrieve the value for that key in python.  if you set a key in
python, you can't retrieve it in the nginx module (it follows php
iirc.)
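
A minimal sketch of why that happens, assuming two hypothetical clients that both use the modulo scheme quoted above but hash the key differently (CRC32 in one, the first four bytes of MD5 in the other). These are illustrative choices, not the actual hash functions of the php, python, or nginx memcached clients.

```python
import hashlib
import zlib

# Same server list as the quoted pseudocode.
SERVERS = ["a.memc", "b.memc", "c.memc"]


def pick_crc32(key, servers=SERVERS):
    """Hypothetical client A: CRC32 of the key, modulo the pool size."""
    return zlib.crc32(key.encode("utf-8")) % len(servers)


def pick_md5(key, servers=SERVERS):
    """Hypothetical client B: first 4 bytes of MD5, modulo the pool size."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % len(servers)


def disagreements(keys):
    """Keys that the two clients would store on different servers."""
    return [k for k in keys if pick_crc32(k) != pick_md5(k)]


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]
    bad = disagreements(keys)
    print(f"{len(bad)} of {len(keys)} keys land on different servers")
```

Both clients are internally consistent, but a key written through one is only found by the other when the two hashes happen to agree modulo the pool size.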

>
>> Varnish as a pool/cluster also doesn't provide redundancy to the client interface.
why should it?  it's pretty busy dealing with memory management and io
heavy threaded craziness.

in its current form, it doesn't care where the request is coming
from, and it shouldn't.  it should try to reach into its cache or
fetch from a backend.

>>
>> A distributed Varnish cache (or perhaps a memcache storage option in Varnish?) is really interesting; it might be scalable, but not obviously.  It also doesn't eliminate the need for a higher-level balancer.
>>
>
> Well, in this instance, Varnish can do the modulus math versus the
> other Varnish servers in the config pool. Wouldn't take any sort of time and the
> logic already seems to exist in the VCL config to work around when a
> backend server can't be reached.  Same logic could be adapted to the
> "front side" to try connecting to other Varnish instances and doing
> the failover dance as needed.
>
i love that it doesn't do this.  i can debug haproxy separately, i can
debug varnish separately, i can debug nginx separately.  they're all
great tools.

> I put in a feature request this evening for this functionality.  We'll
> see what the official development folks think.  If it can't be
> included in the core, then perhaps a front-end Varnish proxy is in
> order developmentally.  I'd say this is akin to Moxi in front of
> memcached instances: http://labs.northscale.com/moxi/

i vote for big no on this request.

if this theoretical varnish performed a hash of a url and determined
that it should fetch from a remote *peer* varnish and not from its
own cache file (or backend for that matter), then it would presumably
fetch from a different machine, while a different uri would hash to
the local cache and fetch from itself. so you'd have traffic going in
and out of varnish ports talking to each other before you've even done
a backend fetch.
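
To put a rough number on that peer traffic, here is a small sketch under stated assumptions: requests arrive at any of N varnish nodes regardless of url, and each node picks the owning node with a plain modulo hash of the url (CRC32 here, purely for illustration). The function name and the url list are made up for the example.

```python
import zlib


def peer_hops(urls, node_count):
    """Return (hops, total) over every (receiving node, url) pair.

    A "hop" is a request that arrives at one node but hashes to
    another, so it has to be fetched from a peer first.
    """
    hops = 0
    total = 0
    for url in urls:
        owner = zlib.crc32(url.encode("utf-8")) % node_count
        for receiver in range(node_count):
            total += 1
            if receiver != owner:
                hops += 1
    return hops, total


if __name__ == "__main__":
    urls = [f"/img/{i}.png" for i in range(1000)]
    for n in (2, 3, 10):
        hops, total = peer_hops(urls, n)
        print(f"{n} nodes: {hops}/{total} requests need a peer fetch")
```

Under those assumptions, N-1 out of every N requests cross between varnish instances, so the inter-node chatter grows with the size of the pool.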

the logic of all this seems quite tortured.

what you're describing is a load balancer (or a proxy.)  haproxy is
free and awesome, it does consistent hashing which last i checked a
netscaler won't even do.  you can add and remove servers all day and
your url hashing doesn't clobber itself.  sure you'll need more
servers, but HA doesn't come for free - just close to it with open
source.
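
For anyone who hasn't seen why consistent hashing survives pool changes, here is a minimal hash-ring sketch. It is a generic illustration of the technique, not haproxy's implementation; the server names, the MD5-based hash, and the replica count are all assumptions made for the example.

```python
import bisect
import hashlib


def _h(value):
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring with virtual points per server."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._points, (_h(f"{server}#{i}"), server))

    def lookup(self, key):
        # First point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._points, (_h(key), ""))
        if idx == len(self._points):
            idx = 0
        return self._points[idx][1]


def modulo_lookup(key, servers):
    """Plain modulo selection, for comparison."""
    return servers[_h(key) % len(servers)]


def moved(keys, before, after):
    """Keys whose server changes between two lookup functions."""
    return [k for k in keys if before(k) != after(k)]


if __name__ == "__main__":
    urls = [f"/page/{i}" for i in range(10000)]
    old = ["cache1", "cache2", "cache3"]
    new = old + ["cache4"]
    ring_old, ring_new = HashRing(old), HashRing(new)
    by_ring = moved(urls, ring_old.lookup, ring_new.lookup)
    by_mod = moved(urls, lambda k: modulo_lookup(k, old),
                   lambda k: modulo_lookup(k, new))
    print(f"ring: {len(by_ring)} urls moved, modulo: {len(by_mod)} moved")
```

With the ring, the only urls that change servers when a node is added are the ones the new node takes over; with plain modulo, most of the url space gets reshuffled and every cache starts missing at once.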

>
> I think tying Varnish into Memcached is fairly interesting as it
> appears the market is allocating many resources towards memcached.  At
> some point I believe memcached will become at least an unofficial
> standard for fast memory based storage. There are a number of
> manufacturers making custom higher performance memcached solutions -
> Gear6 and Schooner come to mind foremost.
>
> That's my $1 worth :)

sure, tie varnish, memcached, netscaler, nginx all together and
customize the architecture to fit your needs. i'm currently working on
a way to populate memcache with the hottest of hot files that i want
to serve above the varnish pool.

varnish is a supremely cool server because devels wrote it to perform
a single function very well and left it up to us admins to glue it
together with other systems we need and use.
