Using Varnish to Proxy 1000s of different sites

Michael Alger varnish at mm.quex.org
Fri Aug 6 04:14:54 CEST 2010


On Thu, Aug 05, 2010 at 12:21:07PM -0400, Tony Primerano wrote:
> On Thu, Aug 5, 2010 at 10:50 AM, Per Buer <perbu at varnish-software.com> wrote:
> 
> > On Thu, Aug 5, 2010 at 4:33 PM, Tony Primerano <tony.primerano at gmail.com>
> > wrote:
> >
> > > But what if I have 1000s of backends and I choose them based on
> > > the domain that users hit Varnish with. Is this something
> > > Varnish handles or is it only intended to work with a handful of
> > > backends?
> >
> > It's not built for that. Kristian's DNS director might help you out
> > a bit, but it just entered trunk (will be in 2.1.4).
> 
> http://kristianlyng.wordpress.com/2010/08/02/varnish-backend-selection-through-dns/
> 
> Looks promising but in the end I think I end up with the same amount of
> code. (a vcl_fetch with 1000s of ifs)

I don't think there's any avoiding that unless there's some kind of
logical, programmable pattern for mapping the requested host to the
destination host. The examples you gave sound fairly arbitrary, which
just means you'll have to have a great big list.
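
To illustrate the difference, here is a minimal C sketch. The naming
scheme it assumes (`<name>.proxy.example.com` forwarded to
`<name>.example.com`) and all the host names are hypothetical. When a
rule like that exists, one function covers every domain; when the
mapping is arbitrary, you're left with a table that needs one entry per
domain, which is what a vcl_fetch with 1000s of ifs amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scheme: "<name>.proxy.example.com" -> "<name>.example.com". */
static const char *PROXY_SUFFIX = ".proxy.example.com";
static const char *DEST_SUFFIX = ".example.com";

/* Pattern-based mapping: writes the destination host into out and
 * returns 1, or returns 0 if host doesn't follow the pattern (or the
 * result wouldn't fit in out). */
int map_by_pattern(const char *host, char *out, size_t outlen)
{
    assert(host != NULL && out != NULL);
    size_t hlen = strlen(host);
    size_t slen = strlen(PROXY_SUFFIX);
    if (hlen <= slen || strcmp(host + hlen - slen, PROXY_SUFFIX) != 0)
        return 0;
    int n = snprintf(out, outlen, "%.*s%s", (int)(hlen - slen), host, DEST_SUFFIX);
    return n >= 0 && (size_t)n < outlen;
}

/* Arbitrary mappings (hypothetical host names): nothing derives "to"
 * from "from", so every pair has to be listed explicitly. */
struct mapping { const char *from; const char *to; };
static const struct mapping MAPPINGS[] = {
    { "www.shop-one.com", "origin17.hosting.example" },
    { "blog.other.org",   "origin03.hosting.example" },
    { "news.third.net",   "origin17.hosting.example" },
};

/* List-based mapping: a linear scan, one comparison per entry. */
const char *map_by_list(const char *host)
{
    assert(host != NULL);
    for (size_t i = 0; i < sizeof MAPPINGS / sizeof MAPPINGS[0]; i++)
        if (strcmp(MAPPINGS[i].from, host) == 0)
            return MAPPINGS[i].to;
    return NULL;
}
```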

How many actual servers (as opposed to domains) are serving as
destinations? If it's still in the thousands, then you're probably
better off using a forward proxy, although, as mentioned, you could
still put Varnish in front of it to provide caching.

If the number of servers you connect to is more manageable, then it
might be feasible to use Varnish directly. VCL can change the Host:
header, and that header is independent of the backend definition, so
many domains can share a single backend.
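
A sketch of that separation, again in C with made-up names and
documentation-range addresses: the routing decision produces two
independent results, which real server to connect to and which Host:
header to send it. In VCL terms this roughly corresponds to setting the
backend and req.http.host, but the structures and the route_request()
function below are only an illustration, not anything Varnish provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A handful of real servers (hypothetical names and addresses). */
struct backend { const char *name; const char *addr; unsigned short port; };
static const struct backend BACKENDS[] = {
    { "origin_a", "192.0.2.10", 80 },
    { "origin_b", "192.0.2.20", 80 },
};

/* Many domains, each pointing at one of the few backends plus the
 * Host: header that backend expects for that site. */
struct route { const char *domain; size_t backend; const char *host_header; };
static const struct route ROUTES[] = {
    { "www.shop-one.com", 0, "shop-one.origin-a.example" },
    { "blog.other.org",   1, "blog.origin-b.example" },
    { "news.third.net",   0, "news.origin-a.example" },
};

/* Returns the backend for domain and sets *host_header, or returns
 * NULL (leaving *host_header untouched) if the domain is unknown. */
const struct backend *route_request(const char *domain, const char **host_header)
{
    assert(domain != NULL && host_header != NULL);
    for (size_t i = 0; i < sizeof ROUTES / sizeof ROUTES[0]; i++) {
        if (strcmp(ROUTES[i].domain, domain) == 0) {
            *host_header = ROUTES[i].host_header;
            return &BACKENDS[ROUTES[i].backend];
        }
    }
    return NULL;
}
```

Two of the three domains here share origin_a; only the Host: header
differs, which is why the number of backend definitions can stay small
even when the number of domains doesn't.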

The main performance cost, I think, would come from the sheer number of
rewrite rules. This is especially true if each request has to run a
string comparison or regular expression match against every rule in
turn until one matches. If you control the DNS for the proxied domains,
you can mitigate this a bit by spreading them across multiple IP
addresses, so that each request only has to be checked against the
rules for the address it arrived on.
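
A minimal C sketch of that two-level lookup, assuming (hypothetically)
that DNS points one group of domains at 198.51.100.1 and another at
198.51.100.2. The local address the client connected to selects a
smaller rule list, and only that list is scanned. I believe VCL exposes
this address as server.ip, but the lookup() function and the data below
are illustrations only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mapping { const char *from; const char *to; };

/* Hypothetical split: each listening address carries its own rules. */
static const struct mapping GROUP_1[] = {
    { "alpha.example.com", "origin1.hosting.example" },
    { "kilo.example.org",  "origin2.hosting.example" },
};
static const struct mapping GROUP_2[] = {
    { "november.example.net", "origin1.hosting.example" },
    { "zulu.example.com",     "origin3.hosting.example" },
};

struct group { const char *local_ip; const struct mapping *rules; size_t count; };
static const struct group GROUPS[] = {
    { "198.51.100.1", GROUP_1, sizeof GROUP_1 / sizeof GROUP_1[0] },
    { "198.51.100.2", GROUP_2, sizeof GROUP_2 / sizeof GROUP_2[0] },
};

/* local_ip: the address the request arrived on. Only the matching
 * group's rules are compared against host. */
const char *lookup(const char *local_ip, const char *host)
{
    assert(local_ip != NULL && host != NULL);
    for (size_t g = 0; g < sizeof GROUPS / sizeof GROUPS[0]; g++) {
        if (strcmp(GROUPS[g].local_ip, local_ip) != 0)
            continue;
        for (size_t i = 0; i < GROUPS[g].count; i++)
            if (strcmp(GROUPS[g].rules[i].from, host) == 0)
                return GROUPS[g].rules[i].to;
        return NULL;
    }
    return NULL;
}
```

With N addresses and evenly split domains, each request scans roughly
1/N of the rules; it's still linear, just with a smaller constant.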

For a better solution, I think you'd want something that can look up
the requested domain name in a hash table or similar, so you get
essentially constant-time lookups. I don't know of anything that does
that "out of the box", but you could do it in Varnish using inline C.
It might be worthwhile to first do a small-scale proof of concept, using
regular VCL for the rewrites on a few dozen domains, and then, if you're
happy with the results, look at building a more scalable rewriter.
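
A minimal sketch of such a lookup in plain C, outside Varnish: an
open-addressing hash table (FNV-1a hash, linear probing) from requested
host to a backend name. TABLE_SIZE, the entry layout and the names
table_put()/table_get() are my own assumptions, not a Varnish API, and
wiring it into VCL as inline C is left out. It also assumes host names
have already been lowercased and stripped of any port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Power of two, well above the number of domains so the table stays
 * sparse and probe sequences stay short. Sized for illustration. */
#define TABLE_SIZE 8192u

struct entry { const char *host; const char *backend; };
static struct entry table[TABLE_SIZE]; /* static: all slots start NULL */

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or replace a mapping; the strings must outlive the table.
 * Returns 1 on success, 0 if the table is full. */
int table_put(const char *host, const char *backend)
{
    assert(host != NULL && backend != NULL);
    uint32_t i = fnv1a(host) & (TABLE_SIZE - 1);
    for (uint32_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].host == NULL || strcmp(table[i].host, host) == 0) {
            table[i].host = host;
            table[i].backend = backend;
            return 1;
        }
    }
    return 0;
}

/* Hash, then probe until a match or an empty slot: expected O(1) while
 * the table is sparse, regardless of how many domains it holds. */
const char *table_get(const char *host)
{
    assert(host != NULL);
    uint32_t i = fnv1a(host) & (TABLE_SIZE - 1);
    for (uint32_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].host == NULL)
            return NULL;
        if (strcmp(table[i].host, host) == 0)
            return table[i].backend;
    }
    return NULL;
}
```

The idea would be to fill the table once at startup and call
table_get() per request. If the table is allowed to fill up, lookups of
unknown hosts degrade to a full scan, so TABLE_SIZE should stay well
above the domain count (e.g. about twice it).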




More information about the varnish-misc mailing list