multi-terabyte caching

Tollef Fog Heen tfheen at
Fri Nov 27 10:15:03 CET 2009

]] Eric Bowman 


| I'm considering using Varnish to handle caching for a mapping
| application.  After reading
|, it seems like
| Varnish is maybe not a good choice for this.  In short I need to cache
| something like 500,000,000 files that take up about 2TB of storage.
| Using more 1975 technologies, one of the challenges has been how to
| distribute these across the file system without putting too many files
| per directory.  We have a solution we kind of like, and there are others
| out there.

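Hashing on the file name should solve this easily enough.  Even better,
hash on the hash of the file name: take «somefile», whose file name has
the md5sum c21641b4fc25d6d558bf130659d56811, and use its leading hex
digits as directory names.  Since md5 output is close to uniformly
distributed, each hex digit splits the files 16 ways.  If you want to
end up with no more than about 1000 files per directory, four levels are
not enough (16^4 = 65,536 directories, about 7,600 files each), so you
need five levels of hashing, and that file would live in
c/2/1/6/4/somefile.  Five levels give 16^5 = 1,048,576 directories and
an average of about 477 files per directory.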
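
A minimal sketch of the directory fan-out and the per-directory
arithmetic above, assuming one directory level per hex digit of the md5
digest.  The function names are hypothetical.  Note that fanout_path
hashes the UTF-8 bytes of the name with no trailing newline, so its
digest may differ from one produced by a shell pipeline such as
`echo somefile | md5sum`, which appends a newline.

```python
import hashlib
import posixpath


def path_from_digest(digest: str, name: str, levels: int = 5) -> str:
    # One directory per leading hex digit, so each level fans out 16 ways.
    return posixpath.join(*digest[:levels], name)


def fanout_path(name: str, levels: int = 5) -> str:
    # Hypothetical helper: hash the UTF-8 bytes of the name (no newline).
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return path_from_digest(digest, name, levels)


def avg_files_per_dir(total_files: int, levels: int) -> float:
    # 16 ** levels leaf directories, assuming md5 spreads names evenly.
    return total_files / 16 ** levels


print(path_from_digest("c21641b4fc25d6d558bf130659d56811", "somefile"))
# c/2/1/6/4/somefile
print(round(avg_files_per_dir(500_000_000, 4)))  # 7629
print(round(avg_files_per_dir(500_000_000, 5)))  # 477
```

Using single hex digits keeps every level at a fixed fan-out of 16;
taking two digits per level instead would give 256 entries per level
with fewer levels.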

(You can of course use a hash other than md5.  Using md5 is fine here
since we only need a good distribution; there are no security
requirements involved.)

| My impression is that we would start to put a big strain on Varnish and
| the OS using it in the standard way.  But maybe I'm wrong.  Or, is there
| a way to plugin a backend to manage this storage, without getting into
| the vm-thrash from which Squid suffers?

Varnish already uses a hash table internally, so this should work as
long as you make the hash big enough, for instance 39916801 buckets (a
prime number that's not too far from 1/10 of your total number of
objects), set with -h classic,39916801.  Or you could use -h critbit
instead, which should scale better, but few people use it so far, so it
might well have some bugs.
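
A small sketch of the bucket-count arithmetic, assuming a fixed-size
chained hash table where a lookup walks on average about
objects / buckets entries.  It checks that 39916801 is prime and shows
the resulting average chain length for 500,000,000 objects.
prime_near_tenth is a hypothetical helper for picking a different
prime, not something Varnish provides.

```python
def is_prime(n: int) -> bool:
    # Deterministic trial division; fast enough for numbers around 10**8.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_near_tenth(total_objects: int) -> int:
    # Hypothetical helper: largest prime <= total_objects // 10 (at least 2).
    n = max(total_objects // 10, 2)
    while not is_prime(n):
        n -= 1
    return n


buckets = 39916801
print(is_prime(buckets))                  # True
print(round(500_000_000 / buckets, 1))    # 12.5 objects per bucket on average
```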

Alternatively, use a hashing load balancer in front and have a bunch of
Varnish machines each serve their part of the URL space, as David
Birdsong suggested.
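
One way a hashing load balancer could map URLs onto several Varnish
machines is rendezvous (highest-random-weight) hashing, sketched below.
This is a swapped-in technique for illustration, not what any
particular balancer does, and the backend names are hypothetical.
Compared with hash(url) mod N, removing a cache only remaps the URLs
that cache owned.

```python
import hashlib


def pick_backend(url: str, backends: list[str]) -> str:
    # Rendezvous hashing: score every backend for this URL, highest wins.
    if not backends:
        raise ValueError("no backends configured")

    def score(backend: str) -> int:
        digest = hashlib.md5(f"{backend}\n{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)


# Hypothetical backend names, for illustration only.
caches = ["varnish1:80", "varnish2:80", "varnish3:80"]
print(pick_backend("/tiles/12/2140/1234.png", caches))
```

Each URL always lands on the same cache, so every machine only needs
to hold its share of the 2TB.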

It'd be interesting to hear your experiences once you get this
going. :-)

Tollef Fog Heen 
Redpill Linpro -- Changing the game!
t: +47 21 54 41 73

More information about the varnish-misc mailing list