multi-terabyte caching

David Birdsong david.birdsong at
Sat Nov 21 00:05:11 CET 2009

On Fri, Nov 20, 2009 at 2:19 PM, Eric Bowman <ebowman at> wrote:
> Hi,
> Apologies if this has been hashed out before.  I did some googling, and
> read the faq, but I could have been more thorough... ;)
> I'm considering using Varnish to handle caching for a mapping
> application.  After reading
>, it seems like
> Varnish is maybe not a good choice for this.  In short I need to cache
> something like 500,000,000 files that take up about 2TB of storage.
> Using more 1975 technologies, one of the challenges has been how to
> distribute these across the file system without putting too many files
> per directory.  We have a solution we kind of like, and there are others
> out there.
> My impression is that we would start to put a big strain on Varnish and
> the OS using it in the standard way.  But maybe I'm wrong.  Or, is there
> a way to plugin a backend to manage this storage, without getting into
> the vm-thrash from which Squid suffers?
> Thanks for any advice -- Varnish gets such good press I'd really love if
> it were straightforward to use it in this case.
> -Eric

a straightforward way to store an arbitrarily large amount of data is to
find the optimal cache storage capacity per varnish instance, then:

working_set / optimal_size = N   (rounded up)

where N is the number of varnish instances you need to run.
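
As a rough sketch of that arithmetic: the instance count is the total working set divided by what one instance can hold, rounded up. The 2 TB figure comes from the original question; the 200 GB per-instance capacity and the function name are assumptions for illustration only, since the real number has to come from testing.

```python
def instances_needed(working_set_bytes: int, per_instance_bytes: int) -> int:
    """Number of cache instances needed to hold the whole working set."""
    if per_instance_bytes <= 0:
        raise ValueError("per-instance capacity must be positive")
    # Integer ceiling division: a partially filled instance still counts.
    return (working_set_bytes + per_instance_bytes - 1) // per_instance_bytes


GB = 1000 ** 3
TB = 1000 ** 4

# 2 TB working set (from the question); 200 GB per instance is assumed.
print(instances_needed(2 * TB, 200 * GB))  # prints 10
```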

then put a layer 7 switch in front of the pool of varnish instances,
hashing on the request (e.g. the URL) so each object lives on one instance.
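
A minimal sketch of what hashing on the request could look like, assuming the hash key is the request URL and the pool is a fixed list. The instance names and the function name are hypothetical, and a real layer 7 switch would do this internally rather than in Python.

```python
import hashlib
from typing import Sequence


def pick_instance(url: str, instances: Sequence[str]) -> str:
    """Map a request URL to one instance, so the same URL always hits the same cache."""
    if not instances:
        raise ValueError("instance pool is empty")
    # Hash the URL and reduce the first 8 bytes of the digest modulo the pool size.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical pool of three varnish instances.
pool = ["varnish-0", "varnish-1", "varnish-2"]
print(pick_instance("/tiles/12/2048/1361.png", pool))
```

With plain modulo hashing, changing the pool size remaps most URLs to a different instance; consistent hashing is the usual alternative if instances will be added or removed often.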

works like a charm.

finding the optimal storage amount per varnish instance requires turning
the knobs:
 - tuning VM
 - tuning the kernel for high network traffic
 - balancing between big and fast storage media
     random reads will skyrocket, so minimize writing to storage while
     serving if possible (pregenerate your working set, and don't let
     anything expire between generation runs)
 - ...and test

> Eric Bowman