Is LCI on the radar?

Mon Jun 27 17:45:33 CEST 2011

On 27 June 2011 15:41, Lukas Kahwe Smith <mls at pooteeweet.org> wrote:
>
> On 31.05.2011, at 12:58, Laurence Rowe wrote:
>
>> On 31 May 2011 10:34, Per Buer <perbu at varnish-software.com> wrote:
>> Hi
>>
>> On Tue, May 31, 2011 at 12:23 AM, Lukas Kahwe Smith <mls at pooteeweet.org> wrote:
>> Hi,
>>
>> I assume some of you have stumbled over LCI by now:
>> http://www.ietf.org/id/draft-nottingham-linked-cache-inv-00.txt
>>
>> This is actually quite interesting. For an application we are building we are looking to create an invalidation service to which the various independent frontend server applications can register and which gets notified by the backend. Of course the frontends then have to figure out which pages all need to be invalidated. The original article will be easy. Some of the category overviews will also be easy to delete. What will already get harder is invalidating all articles that reference the given article and worse yet would be if we start caching search results.
>>
>> So I am wondering if you guys are looking at LCI for a future varnish improvement and if someone has already built something like this on top of varnish today that could maybe help us here.
>>
>> I'm pretty sure this can be implemented in VCL. No need to place it on the radar. I have an upcoming blog-post describing something similar. It might get a bit hairy with all the regular expression so it might be cleaner in a module.
>>
>> I experimented with something that sounds similar. Each page set a header recording the content item ids that were used in rendering the page. They could then be purged with a regex including any dependents id. http://dev.plone.org/collective/browser/experimental.depends/trunk/varnish.vcl
>>
>> It works when you update or delete a content item, but it can't help the case where you add a new content item and want that to appear in a listing.
>
> So we are looking to implement this. However I have one question:
> How well does this perform if you start to have 100k, 1M or more objects in your varnish cache?
> Does varnish create some sort of index of all headers? I assume even if it did, it can't really leverage it with a regexp.
>
> Just don't want to kill my varnish servers' CPU/harddrive when I start to purge stuff. If someone could give me some indication of what to expect it would be very good. We will in the end of course have to do our own benchmarks, but it would be good to be able to control expectations :)
>
> I guess in the long run if one would want to properly implement LCI it would be necessary to maybe use an sqlite DB and some inline C magic to parse the relevant headers in there and then use that for lookups when doing a PURGE.
>
> Does anyone have an idea how much effort it would be to properly implement LCI in Varnish and how we could maybe organize funding among all interested parties?

This type of purge (which in Varnish 3 is renamed 'ban') adds the
expression to the ban list. Whenever an object is found in the cache,
the expressions in the ban list are checked to decide whether to call
vcl_hit or vcl_miss. To prevent the ban list from getting too long,
another thread periodically works its way through all objects in the
cache, removing those that have been banned and updating each object's
pointer to the place in the ban list it has already been checked
against, so that on subsequent requests fewer ban expressions need to
be checked.
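
To make the mechanism above concrete, here is a rough Python model of
it. This is only a sketch of the idea, not Varnish's actual data
structures: the class names, the X-Depends header and the lurk()
method are all made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CachedObject:
    headers: dict
    # Position in the ban list up to which this object has been checked.
    checked_upto: int = 0


@dataclass
class Cache:
    bans: list = field(default_factory=list)     # (header, compiled regex), oldest first
    objects: dict = field(default_factory=dict)  # url -> CachedObject

    def insert(self, url, headers):
        # A fresh object only needs testing against bans added after it.
        self.objects[url] = CachedObject(headers, checked_upto=len(self.bans))

    def ban(self, header, pattern):
        # Cheap: just append to the list, the cache itself is not scanned.
        self.bans.append((header, re.compile(pattern)))

    def _is_banned(self, obj):
        # Test only the bans this object has not been checked against yet.
        for header, regex in self.bans[obj.checked_upto:]:
            if regex.search(obj.headers.get(header, "")):
                return True
        obj.checked_upto = len(self.bans)  # remember how far we got
        return False

    def lookup(self, url):
        obj = self.objects.get(url)
        if obj is None:
            return "miss"
        if self._is_banned(obj):
            del self.objects[url]
            return "miss"
        return "hit"

    def lurk(self):
        # Background pass: evict banned objects and advance the pointers
        # of the rest, so later lookups have fewer bans to test.
        for url in list(self.objects):
            if self._is_banned(self.objects[url]):
                del self.objects[url]
```

In this model ban() costs the same no matter how many objects are
cached, and a lookup only pays for the bans added since that object
was last checked.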

I've not done any benchmarking on this, but for me even thousands of
regular expression checks will be much faster than re-requesting the
page from the CMS. So with my Varnish config, a PURGE request is cheap
(it only results in adding an entry to the ban list, not checking
against the entire contents of the cache) and the additional cost of
checking each GET request is not noticeable. So I don't really see the
need for LCI support given the existing support for bans.
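
For the dependency-header approach quoted above, the regex has to
match whole ids only, otherwise banning id 12 would also hit pages
that depend on id 123. A small Python sketch of how the header value
and the ban pattern could be built; the function names and the
space-separated header format are assumptions for illustration, not
what experimental.depends necessarily does.

```python
import re


def depends_header(ids):
    # Hypothetical header value: the content ids used to render a page,
    # space separated.
    return " ".join(str(i) for i in sorted(ids))


def ban_pattern(changed_id):
    # Match the id only as a whole token, bounded by a space or by the
    # start/end of the header value.
    return r"(^| )" + re.escape(str(changed_id)) + r"( |$)"


header = depends_header({40, 12})
print(bool(re.search(ban_pattern(12), header)))       # page depends on 12
print(bool(re.search(ban_pattern(12), "123 40")))     # must not match 123
```

The resulting pattern would then go into a ban expression tested
against that header on each cached object.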

Laurence