Proposal/specs for backend conditional requests, aka "GET If-Modified-Since" (GET IMS)

Mon Sep 27 21:27:14 CEST 2010

For persistent storage, just ignore the TTL and throw away the segment
with the oldest object, refreshed or not.
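
A minimal sketch of that eviction rule, purely illustrative and not
Varnish code. The Segment class, its object_times field and pick_victim()
are hypothetical names, and it assumes "oldest" means the original entry
time, which a refresh does not reset:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    name: str
    # Entry timestamps of the objects stored in this segment (hypothetical).
    object_times: list = field(default_factory=list)


def pick_victim(segments):
    """Return the non-empty segment holding the oldest object.

    TTLs are deliberately not consulted; only object age matters.
    Returns None when every segment is already empty.
    """
    candidates = [s for s in segments if s.object_times]
    if not candidates:
        return None
    return min(candidates, key=lambda s: min(s.object_times))


segments = [
    Segment("seg0", [100.0, 250.0]),
    Segment("seg1", [40.0, 900.0]),  # holds the oldest object (t=40)
    Segment("seg2", []),             # already empty, nothing to free
]
victim = pick_victim(segments)
```

Note that seg1 is chosen even though it also holds the newest object
(t=900), which is the price of freeing whole segments at once.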

I am of the opinion that if a method exists to verify the object
(Last-Modified or ETag), we shouldn't ever expire it. The TTL is just a
setting for when we should refresh it. Of course, standard LRU should
still apply.
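
To make the idea concrete, here is a rough sketch of the lookup decision
under that policy. CachedObject and lookup_action() are made-up names for
illustration, not anything from the Varnish source; LRU eviction is left
out:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedObject:
    body: bytes
    fetched_at: float            # time of the last fetch or refresh
    ttl: float                   # seconds until a refresh is due
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def lookup_action(obj, now):
    """Return (action, conditional request headers) for a cache hit."""
    if now - obj.fetched_at < obj.ttl:
        return "deliver", {}
    # TTL has run out: build validators for a conditional backend GET.
    headers = {}
    if obj.etag:
        headers["If-None-Match"] = obj.etag
    if obj.last_modified:
        headers["If-Modified-Since"] = obj.last_modified
    if headers:
        # The object can be verified, so keep it and just refresh it.
        return "revalidate", headers
    # No validator available: plain TTL expiry, as today.
    return "expire", {}
```

An object with an ETag or Last-Modified header thus never reaches the
"expire" branch; it can only leave the cache through LRU.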

I am also less worried about the reader/writer scenario for the
headers, since by spec you shouldn't update any headers that aren't
Expires/Cache-Control (and, weirdly enough, Vary).
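
As a sketch of what such a restricted header update could look like: the
allow-list below is simply the three headers named above, not a complete
reading of the HTTP spec, and merge_304_headers() is a hypothetical
helper. It returns a new dict instead of mutating the cached one, so a
concurrent reader keeps a consistent view:

```python
# Headers a 304 response is allowed to refresh (assumption: the list above).
REFRESHABLE = {"expires", "cache-control", "vary"}


def merge_304_headers(cached, response_304):
    """Return a copy of `cached` with allow-listed headers refreshed."""
    merged = dict(cached)
    # Map lower-cased names to the spelling already stored in the cache.
    existing = {name.lower(): name for name in merged}
    for name, value in response_304.items():
        lname = name.lower()
        if lname in REFRESHABLE:
            merged[existing.get(lname, name)] = value
    return merged


cached = {"Content-Type": "text/html", "Cache-Control": "max-age=60"}
resp = {"cache-control": "max-age=120", "Content-Type": "text/plain"}
merged = merge_304_headers(cached, resp)
```

Here Cache-Control is refreshed, while the Content-Type sent with the
304 is ignored and the cached value is kept.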

Artur

On Sep 27, 2010, at 6:50 AM, Nils Goroll wrote:

> Hi,
>
> I'd like to add a brief update to the following section summarizing my
> understanding after talking to phk today, who seems to be really busy and
> probably will not find time to respond before the weekend:
>
>> To allow multiple cache objects to share body data, we want to add
>> reference counters to struct storage following the example of the
>> existing implementation for objects (HSH_Ref(), HSH_Unref() etc).
>
> Though I still believe this should be pretty straightforward for all other
> storages, it won't be for -spersistent. After studying the code for an hour
> or so, my understanding is the following:
>
> Persistent storage segments the cache (see
> http://www.varnish-cache.org/trac/wiki/ArchitecturePersistentStorage) and
> won't re-use segments for new objects unless they are completely empty (no
> live objects). Right now, this relies on the LRU and TTL based expiry to
> eventually clean out segments before running out of space. Having multiple
> refs to the same obj in persistent storage (and updating it again and again)
> would effectively lead to more and more segments being kept from becoming
> empty.
>
> I believe what is really needed is additional space management for the
> persistent storage. In a first step, when running short of storage, objects
> could get nuked from the smallest segment. In a second step, the mechanics
> to copy live objects from one segment to another could be implemented.
> Ideally, this could be VCL-controlled ("should we rather nuke the object or
> bother copying it?"). But I see some complications for both, mainly that
> storage would need to know which objects are referencing it in order to
> update those (sounds wrong).
>
> As long as we don't have any of this, I suggest two alternative temporary
> solutions:
>
> a) If an object getting refreshed lives in persistent storage, we'll simply
> copy it. Actually, the existing Rackspace implementation does this. This is
> far from optimal, but won't make much of a difference for small objects and
> is still much more efficient than re-fetching the object from backend like
> today, so we shouldn't see any performance regression.
>
> For other stevedores, we'll use the reference counter.
>
> b) Add reference counters to persistent storage, too, and simply live with
> the cache fragmentation issue. Those using persistent storage would be
> advised not to use cache refresh.
>
> At this point, I'd favor a).
>
>
> Please note that all of this is my personal understanding. I am posting
> these thoughts in the hope that my understanding is correct and I'd really
> appreciate corrections if it's not.
>
> Thank you, Nils
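
For reference, option a) from the quoted message could be sketched roughly
as follows. This is only an illustration in Python, not the actual C code:
the Storage class and both helpers are hypothetical stand-ins for struct
storage and whatever would mirror HSH_Ref()/HSH_Unref() there:

```python
class Storage:
    """Hypothetical stand-in for a chunk of body data in a stevedore."""

    def __init__(self, data, persistent):
        self.data = data              # bytearray holding the body
        self.persistent = persistent  # True if it lives in -spersistent
        self.refcount = 1             # one owning cache object at creation


def body_for_refreshed_object(old):
    """Pick the body storage for the object created by a refresh."""
    if old.persistent:
        # Option a): copy, so the new object lands in a current segment
        # and the old segment can still drain and become empty.
        return Storage(bytearray(old.data), persistent=True)
    # Other stevedores: share the body and bump the reference counter.
    old.refcount += 1
    return old


def release(storage):
    """Drop one reference; return True once the storage can be freed."""
    storage.refcount -= 1
    return storage.refcount == 0


shared = Storage(bytearray(b"body"), persistent=False)
refreshed = body_for_refreshed_object(shared)   # same storage, refcount 2
```

With the copy in the persistent case, no segment is pinned by a reference
from a newer object, which is the fragmentation problem option b) would
accept.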