cache empties itself?

Fri Apr 4 18:38:12 CEST 2008

On Friday 04 April 2008 18:11:23, Michael S. Fischer wrote:
> On Fri, Apr 4, 2008 at 3:20 AM, Sascha Ottolski <ottolski at web.de> wrote:
> >  you are right, _if_ the working set is small. in my case, we're
> > talking 20+ million small images (5-50 KB each), 400+ GB in total
> > size, and it's growing every day. access is very random, but there
> > still is a good amount of "hot" objects. and to be ready for a
> > larger set it cannot reside on the webserver, but lives on a
> > central storage. access performance to the (network) storage is
> > relatively slow, and our experiences with mod_cache from apache
> > were bad, that's why I started testing varnish.
>
> Ah, I see.
>
> The problem is that you're basically trying to compensate for a
> congenital defect in your design: the network storage (I assume NFS)
> backend.  NFS read requests are not cacheable by the kernel because
> another client may have altered the file since the last read took
> place.
>
> If your working set is as large as you say it is, eventually you will
> end up with a low cache hit ratio on your Varnish server(s) and
> you'll be back to square one again.
>
> The way to fix this problem in the long term is to split your file
> library into shards and put them on local storage.
>
> Didn't we discuss this a couple of weeks ago?

exactly :-) what can I say, I did analyze the logfiles, and learned that 
despite the fact that a lot of the accesses are truly random, there is 
still a good amount of the requests concentrated on a smaller set of the 
images. of course, the set is changing over time, but that's what a 
cache can handle perfectly.
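
to illustrate the kind of logfile check I mean, here is a minimal sketch. 
it assumes the requested URLs have already been pulled out of the access 
log into a list; the function name hot_set_share and the top_fraction 
parameter are made up for this example, not part of any tool.

```python
from collections import Counter

def hot_set_share(urls, top_fraction=0.1):
    """Return the fraction of all requests that went to the most
    frequently requested URLs (the top `top_fraction` of distinct URLs)."""
    counts = Counter(urls)
    if not counts:
        return 0.0
    # Size of the "hot" set: at least one URL, otherwise a share of the
    # distinct URLs seen in the log.
    top_n = max(1, int(len(counts) * top_fraction))
    top_hits = sum(hits for _, hits in counts.most_common(top_n))
    return top_hits / sum(counts.values())
```

a value close to 1.0 for a small top_fraction means the requests are 
heavily concentrated, which is the case where a cache pays off.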

and my experience seems to confirm my theory: varnish has now been 
running for about 18 hours *knock on wood*, and the cache hit rate 
is close to 80 %! that takes so much pressure off the backend that 
the overall performance is just awesome.

putting the files on local storage just doesn't scale well. I'm thinking 
more about splitting the proxies as discussed on the list before: 
a loadbalancer could distribute the URLs in such a way that each cache 
holds its own share of the objects.
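
a minimal sketch of how such a URL-based split could work, assuming the 
loadbalancer can hash the request URL. the host names in CACHES and the 
function pick_cache are hypothetical, just for illustration; this is not 
how any particular loadbalancer implements it.

```python
import hashlib

# Hypothetical cache host names, for illustration only.
CACHES = ["cache1", "cache2", "cache3"]

def pick_cache(url, caches=CACHES):
    """Map a URL to one cache so that the same URL always lands on
    the same node and each node holds only its share of the objects."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and reduce it
    # to an index into the list of caches.
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]
```

note that with a plain modulo like this, adding or removing a cache 
remaps most URLs to a different node; a consistent hashing scheme would 
reduce that.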

Cheers, Sascha