cache empties itself?

Fri Apr 4 22:27:34 CEST 2008

Sascha Ottolski wrote:
> On Friday 04 April 2008 18:11:23, Michael S. Fischer wrote:
>   
>> Ah, I see.
>>
>> The problem is that you're basically trying to compensate for a
>> congenital defect in your design: the network storage (I assume NFS)
>> backend.  NFS read requests are not cacheable by the kernel because
>> another client may have altered the file since the last read took
>> place.
>>
>> If your working set is as large as you say it is, eventually you will
>> end up with a low cache hit ratio on your Varnish server(s) and
>> you'll be back to square one again.
>>
>> The way to fix this problem in the long term is to split your file
>> library into shards and put them on local storage.
>>
>> Didn't we discuss this a couple of weeks ago?
>>     
>
> exactly :-) what can I say, I did analyze the logfiles, and learned that 
> despite the fact that a lot of the accesses are truly random, there is 
> still a good amount of the requests concentrated on a smaller set of the 
> images. of course, the set is changing over time, but that's what a 
> cache can handle perfectly.
>
> and my experiences seem to prove my theory: if varnish keeps running 
> like it is now for about 18 hours *knock on wood*, the cache hit rate 
> is close to 80 %! and that takes so much pressure from the backend that 
> the overall performance is just awesome.
>
> putting the files on local storage just doesn't scale well. I'm more 
> thinking about splitting the proxies as discussed on the list before: 
> a loadbalancer could distribute the URLs in a way that each cache holds 
> its own share of the objects.
>   
By putting intermediate caches between the file storage and the client,
you are essentially just spreading the storage locally across the cache
boxes. So if local storage doesn't scale for you, then you are still in
need of a design change, and frankly so am I :)
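
To make the "each cache holds its own share" idea concrete, here is a
rough sketch of how a load balancer might map URLs to cache boxes by
hashing. The cache names and the pick_cache function are made up for
illustration; this is not how Varnish or any particular load balancer
does it.

```python
import hashlib

# Hypothetical cache box names, for illustration only.
CACHES = ["cache1", "cache2", "cache3"]

def pick_cache(url, caches=CACHES):
    """Map a URL to one cache box so the same URL always lands on the same box."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take it
    # modulo the number of caches to choose a box.
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]
```

Note that with plain modulo hashing like this, adding or removing a
cache box remaps most URLs; a consistent hashing scheme would reduce
that reshuffling.
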
What you need to model is the popularity curve for your content. If your
images do not fit an 80/20 rule of popularity, i.e. the most popular 20%
of your images soak up less than 80% of requests, then you will spend
more time thrashing the caches than serving the content. In that case
Michael is right: you would be better served by dedicating web servers
with local storage and sharding your images across them.

If, on the other hand, 80% of your content is rarely viewed, then the
same amount of hardware deployed as caching accelerators will give you
an increase in throughput, because more hardware is serving a smaller
number of images. It all depends on your content and your users'
viewing habits.
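
As a rough way to check the popularity curve, something like the
following could be run over the request URLs pulled from your access
logs. It is only a sketch: the top_share function and the sample data
are made up, and extracting the URLs from your log format is left out.

```python
from collections import Counter

def top_share(urls, top_fraction=0.2):
    """Return the fraction of all requests that hit the most popular
    `top_fraction` of distinct URLs."""
    counts = Counter(urls)
    if not counts:
        return 0.0
    # Request counts per URL, most popular first.
    ranked = sorted(counts.values(), reverse=True)
    # Number of URLs in the "popular" group, at least one.
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Made-up sample: one hot image and four rarely requested ones.
sample = ["/a.jpg"] * 16 + ["/b.jpg", "/c.jpg", "/d.jpg", "/e.jpg"]
share = top_share(sample)
```

If the result for your real logs is well below 0.8, the caches are
likely to thrash; if it is at or above that, caching should pay off.
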

--Dave