Proposal/specs for backend conditional requests / aka "GET If-Modified-Since" (GET IMS)

Nils Goroll slink at schokola.de
Mon Sep 27 15:50:22 CEST 2010


Hi,

I'd like to add a brief update to the following section summarizing my
understanding after talking to phk today, who seems to be really busy and
probably will not find time to respond before the weekend:

> To allow multiple cache objects to share body data, we want to add
> reference counters to struct storage following the example of the
> existing implementation for objects (HSH_Ref(), HSH_Unref() etc).

Though I still believe this should be pretty straightforward for all other
storages, it won't be for -spersistent. After studying the code for an hour or
so, my understanding is the following:

Persistent storage segments the cache (see
http://www.varnish-cache.org/trac/wiki/ArchitecturePersistentStorage) and won't
re-use segments for new objects unless they are completely empty (no live
objects). Right now, this relies on LRU- and TTL-based expiry to eventually
clean out segments before running out of space. Having multiple refs to the same
obj in persistent storage (and updating it again and again) would effectively
keep more and more segments from ever becoming empty.
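
To illustrate the concern, here is a minimal sketch in C. It is a toy model
under my own assumptions, not the actual Varnish code: struct seg, struct body
and all function names are hypothetical. A segment is only reusable once no
live body remains in it, and every refresh that shares the body takes a new
reference before the old one is dropped, so the count never reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; these are NOT the real Varnish structs. */
struct seg {
	unsigned live_bodies;	/* bodies in this segment with refcnt > 0 */
};

struct body {
	struct seg *seg;	/* segment holding the body data */
	unsigned refcnt;	/* cache objects sharing this body */
};

/* The first object referencing the body makes it live in its segment. */
static void
body_init(struct body *b, struct seg *s)
{
	b->seg = s;
	b->refcnt = 1;
	s->live_bodies++;
}

/* A refreshed object shares the existing body instead of copying it. */
static void
body_ref(struct body *b)
{
	assert(b->refcnt > 0);
	b->refcnt++;
}

/* Dropping the last reference is what finally frees up the segment. */
static void
body_unref(struct body *b)
{
	assert(b->refcnt > 0);
	if (--b->refcnt == 0)
		b->seg->live_bodies--;
}

/* A segment can only be re-used when nothing in it is live. */
static bool
seg_reusable(const struct seg *s)
{
	return (s->live_bodies == 0);
}
```

In this model, a body_ref() for the refreshed object followed by a body_unref()
for the expired one leaves the segment pinned, however old the segment is.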

I believe what is really needed is additional space management for the
persistent storage. In a first step, when running short of storage, objects
could get nuked from the smallest segment. In a second step, the mechanics to
copy live objects from one segment to another could be implemented. Ideally,
this could be VCL-controlled ("should we rather nuke the object or bother
copying it?"). But I see some complications for both, mainly that storage would
need to know which objects are referencing it in order to update those (sounds
wrong).
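
The first step could be sketched roughly as below. This assumes that "smallest
segment" means the non-empty segment with the fewest live objects, which is my
reading and may not be the only sensible one. struct seg_info and pick_victim()
are hypothetical names, not existing Varnish interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment bookkeeping; not a real Varnish struct. */
struct seg_info {
	unsigned live_objs;	/* live objects currently in the segment */
};

/*
 * Pick the segment to nuke objects from: the non-empty segment with the
 * fewest live objects (assumed meaning of "smallest"). Returns its index,
 * or -1 if every segment is already empty.
 */
static int
pick_victim(const struct seg_info *segs, size_t n)
{
	int best = -1;
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (segs[i].live_objs == 0)
			continue;	/* already empty, nothing to nuke */
		if (best < 0 || segs[i].live_objs < segs[best].live_objs)
			best = (int)i;
	}
	return (best);
}
```

Emptying the segment with the fewest live objects frees a whole segment at the
cost of the fewest nuked objects, which is the reason for this choice here.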

As long as we don't have any of this, I suggest two alternative temporary solutions:

a) If an object getting refreshed lives in persistent storage, we'll simply copy
it. Actually, the existing Rackspace implementation does this. This is far from
optimal, but won't make much of a difference for small objects and is still much
more efficient than re-fetching the object from the backend as today, so we
shouldn't see any performance regression.

For other stevedores, we'll use the reference counter.

b) Add reference counters to persistent storage, too, and simply live with the
cache fragmentation issue. Those using persistent storage would be advised not
to use cache refresh.

At this point, I'd favor a).
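
For illustration, option a) might look roughly like the following sketch. It is
a simplified model under my own assumptions: struct stor and refresh_body() are
hypothetical and do not correspond to the existing struct storage or to any
stevedore interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a piece of body storage. */
struct stor {
	bool persistent;	/* lives in -spersistent? */
	unsigned refcnt;	/* objects sharing this storage */
	size_t len;		/* body length in bytes */
	unsigned char *data;	/* body bytes */
};

/*
 * Return the storage a refreshed object should use:
 *  - non-persistent: share the old storage and bump its refcount
 *  - persistent: a private copy, so the old segment is not pinned
 * Returns NULL if the copy cannot be allocated.
 */
static struct stor *
refresh_body(struct stor *old)
{
	struct stor *n;

	assert(old != NULL);
	if (!old->persistent) {
		old->refcnt++;
		return (old);
	}
	n = malloc(sizeof *n);
	if (n == NULL)
		return (NULL);
	n->data = malloc(old->len > 0 ? old->len : 1);
	if (n->data == NULL) {
		free(n);
		return (NULL);
	}
	memcpy(n->data, old->data, old->len);
	n->len = old->len;
	n->persistent = true;
	n->refcnt = 1;
	return (n);
}
```

The copy costs one allocation and a memcpy per refresh, which is the "far from
optimal" part, but the old storage keeps its own refcount and can expire
normally.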


Please note that all of this is my personal understanding. I am posting these
thoughts in the hope that my understanding is correct and I'd really appreciate
corrections if it's not.

Thank you, Nils



