Purging. Now what?

Gaute Amundsen gaute at pht.no
Tue Nov 14 09:44:22 CET 2006


I am well aware of what purging is intended for.
I guess at VG, the journalists don't mind if they post a story and it takes 5
minutes to appear on the front page? Or more likely you can predict what will
change, and purge that.

Thing is, we have a whole bunch of CMS sites on different domains, and the
customers are not all so tech-savvy, so telling them "sometimes it may take 5
minutes" would only create confusion.
Furthermore, the sites all differ enough that it is not simple to predict
which pages will be affected by an update. However, few of them would have
very many pages in the cache at any time, so flushing all the pages of one
site is not a problem. Flushing all the pages of all the customers every time
one of them saves the least little thing IS a problem, since that would
happen often enough during "prime time" to make our sites appear
unpredictable and fickle.

Is that any clearer?

In technical terms, as I tried to ask before, I need to:
1 ) be able to purge by domain on the console
2 ) purge by pattern match via HTTP PURGE
Or, if neither of those is possible:
3 ) keep track of the URLs in the cache, with or without the help of Varnish,
so that I can purge them via HTTP PURGE.
Being able to prefetch would be the payoff for the hassle of the last
alternative, I guess :)
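
A rough sketch of what alternative 3 might look like, in Python. Everything
here is an assumption made for illustration: the CacheShadow class, the
purge() and purge_site() helpers, the 300 second TTL (taken from the "5
minutes" figure above), and above all that the cache is configured to accept
an HTTP PURGE request for a single URL with the site's name in the Host
header. It only tracks what was recently requested per host; it does not know
what is really in the cache.

```python
import http.client
import time
from collections import defaultdict


class CacheShadow:
    """Remembers which paths were recently requested, grouped by host."""

    def __init__(self, ttl_seconds=300):
        # Assumed to match the cache TTL; entries older than this are dropped.
        self.ttl = ttl_seconds
        self._seen = defaultdict(dict)  # host -> {path: last-seen timestamp}

    def record(self, host, path, now=None):
        """Note that `path` on `host` was just requested."""
        self._seen[host][path] = time.time() if now is None else now

    def paths_for(self, host, now=None):
        """Return the sorted paths on `host` seen within the TTL."""
        now = time.time() if now is None else now
        fresh = {p: t for p, t in self._seen.get(host, {}).items()
                 if now - t < self.ttl}
        if fresh:
            self._seen[host] = fresh
        else:
            self._seen.pop(host, None)
        return sorted(fresh)

    def forget(self, host):
        """Drop everything recorded for `host`."""
        self._seen.pop(host, None)


def purge(cache_addr, host, path, port=80):
    """Send one PURGE request to the cache and return the HTTP status.

    Assumes the cache accepts the PURGE method; that depends on its config.
    """
    conn = http.client.HTTPConnection(cache_addr, port, timeout=5)
    try:
        conn.request("PURGE", path, headers={"Host": host})
        return conn.getresponse().status
    finally:
        conn.close()


def purge_site(shadow, cache_addr, host, send=purge):
    """Purge every recently seen path of one host, leaving other hosts alone."""
    paths = shadow.paths_for(host)
    for path in paths:
        send(cache_addr, host, path)
    shadow.forget(host)
    return paths
```

The list returned by purge_site() is also the list of pages one would
prefetch afterwards, which is the payoff mentioned above.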

I know the docs say that 2 can't be done.
An authoritative confirmation that 1 can't either would be helpful, as
that would let me concentrate on 3.

A way to list what is being cached at any one time would be really helpful,
both in implementing 3 of course, but also in getting the headers configured
right for all the different pieces of code that live on our servers.

The information is in the logs, I know; I just find them a bit cumbersome to
work with. If I end up building some small log-tailers to assist me in this,
would that be in line with the intentions of the architecture, do you think?
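
For illustration, a small log-tailer of that kind could look roughly like the
Python below. The line layout is hypothetical (virtual host first, followed
by an NCSA-style entry); the real log format would have to be checked and the
pattern adjusted. The names parse_line, follow and index_lines are made up
for this sketch, and treating every logged "GET ... 200" as "probably cached"
is a simplifying assumption.

```python
import re
import time

# Hypothetical layout: "<host> <client> - - [<date>] "<request>" <status> <bytes>"
LINE_RE = re.compile(
    r'^(?P<host>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def parse_line(line):
    """Return (host, path) for a successful GET, or None for anything else."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    if match.group("method") != "GET" or match.group("status") != "200":
        return None
    return match.group("host"), match.group("path")


def follow(handle, poll_seconds=0.5):
    """Yield lines appended to an open text file, similar to `tail -f`.

    This generator never finishes on its own; the caller stops iterating.
    A line still being written may be returned in pieces.
    """
    handle.seek(0, 2)  # start at the current end of the file
    while True:
        line = handle.readline()
        if not line:
            time.sleep(poll_seconds)
            continue
        yield line


def index_lines(lines):
    """Group the paths of successful GETs by host."""
    urls = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            host, path = parsed
            urls.setdefault(host, set()).add(path)
    return urls
```

In use, something like index_lines(follow(open(logfile))) would be replaced
by a loop that feeds each parsed (host, path) pair into whatever keeps the
per-domain list of URLs to purge.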

Gaute.

On Monday 13 November 2006 21:53, Anders Berg wrote:
> Hi Gaute,
>
> and thanks for the script examples in earlier posts.
>
> With regards to your question about purging I must admit that I am
> unsure what you are trying to achieve. Could we be talking corner
> case here?
> I will try to explain why and how purging is used in general, so
> please don't be offended if I state the obvious, can't see your
> challenge, or what I am saying is too trivial.
>
> Purging is not used to control content expiration; HTTP headers
> like Expires and max-age do that. Purging is used as an "oh-shit-
> have-to-delete-now" mechanism. Let's say you have a default cache
> time of 5 min. for your article/page, but 5 min. is too long to wait
> for an update if there is important stuff that needs to replace the
> content in the cache. That's when you would use purging to "force" an
> update on that/those page(s). If it's only one page, go directly for
> that page (no reg-exp); if there is more than one, you have to keep
> track of which pages need to be refreshed, or use a URL scheme such
> that a reg-exp purge will delete them.
>
> Does this answer your question? Please explain a bit deeper, with
> examples, if this does not.
>
> Anders Berg
> Sys.adm
> VG Nett // www.vg.no
>
> > Date: Sun, 12 Nov 2006 16:47:52 +0100
> > From: Gaute Amundsen <gaute at pht.no>
> > Subject: Purging. Now what?
> > To: varnish-misc at projects.linpro.no
> > Message-ID: <200611121647.54547.gaute at pht.no>
> > Content-Type: text/plain;  charset="iso-8859-1"
> >
> > Now that I can trigger a purge when a customer presses "save" in our
> > CMS, the next step is trying to do it somewhat smarter...
> >
> > Purging everything from all hosts in the cache is simple via telnet,
> > but a bit brutish. It could get noticeable as well, with maybe 50
> > customers saving through the day.
> >
> > Purging one URL at a time is more precise, but then I have to keep
> > track of what to purge. Finding all URLs in a site is not very
> > efficient, and 95% of those would not be in the cache anyway.
> >
> > I could build a small daemon to tail the access logs, and keep a
> > running buffer of recently accessed pages. Then I could easily
> > prefetch pages as well, after purging them. But this does not feel
> > quite right either... Sort of building a shadow copy of Varnish's
> > timeout mechanism.
> >
> > Any good suggestions?
> >
> > Regards
> > Gaute Amundsen
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-misc


