Purging. Now what?

Trond Michelsen trondmm-varnish at crusaders.no
Tue Nov 14 11:04:41 CET 2006


On Mon, Nov 13, 2006 at 09:53:39PM +0100, Anders Berg wrote:
> With regards to your question about purging I must admit that I am  
> unsure what you are trying to achieve. Could we be talking corner  
> case here?

It's possible that I have an edge case for you :)

> I will try to explain why and how purging is used in general, so  
> please don't be offended if I state the obvious or can't see your  
> challenge, or if what I am saying is too trivial.
> 
> Purging is not used to control content expiration, the HTTP headers  
> like Expires and max-age etc. do that.

Right. But my problem is that I don't know in advance when something
expires, I only know when it has expired. 

I'm using Varnish to cache maptiles from a WMS-server. We also have two
layers of WMS-servers. First there's mapserver, which is used to draw
general maps; then there's our own WMS-server, which is used to draw
weather maps. All requests to our WMS-server pass through
mapserver. Unfortunately, mapserver does not support any content
expiration headers, like Expires or If-Modified-Since or anything like
that. Besides, even if it did - how should it behave if it needs to fetch
two layers from our WMS-server, and one of them is unchanged?
Mapserver can't be expected to keep a cache of its own.

> Purging is used as an "oh-shit- 
> have-to-delete-now" mechanism. Let's say you have a default cache  
> time for your article/page of 5 min., but 5 min. is too long to wait  
> for an update if there is important stuff that needs to replace the  
> content in the cache. That's when you would use purging to "force" an  
> update on that/those page(s).

> If it's only one page, go directly for that page (no reg-exp); if
> there are more than one, you have to keep control over what pages
> need to be refreshed, or use a URL schema so that a reg-exp purge
> will delete them.

Well, since we're using tiled images, the number of potential URLs
becomes quite large very fast. At any given zoomlevel, there are
4**zoomlevel tiles. So at zoomlevel 12 (which we use for Google
Earth), there are 16.7 million tiles. We have about 20 different
datalayers, and every layer contains data for every hour for 48
hours. That means we may have to purge up to 16 billion tiles when a
datamodel is updated (about every 12 hours)[1]. Since our internal
WMS-server is fairly slow, we really, really don't want to generate
any tile more than once, so it's important for us that Varnish keeps
everything in cache until it's purged.
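For concreteness, the arithmetic above can be checked with a few lines
of Python (the layer and hour counts are the rough figures quoted, not
exact configuration values):

```python
# Each zoom level quadruples the tile count.
zoom = 12
tiles_per_layer_hour = 4 ** zoom          # tiles at zoom 12

layers = 20                               # approximate number of datalayers
hours = 48                                # one timestep per hour, 48 hours

worst_case_purge = layers * hours * tiles_per_layer_hour
print(tiles_per_layer_hour)               # 16777216 (~16.7 million)
print(worst_case_purge)                   # 16106127360 (~16 billion)
```

This counts zoomlevel 12 only; adding the coarser levels on top raises
the total further, but doesn't change the order of magnitude.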

> Does this answer your question? Please explain a bit deeper, with  
> examples, if this does not.

I hope this explains why we'd like to purge URLs regularly with a
regex.
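To illustrate why a URL schema plus regex purging fits this case, here
is a small Python sketch. The path layout below is invented for the
example (it is not our actual schema), but with any layout like it, one
pattern selects every tile of a given layer and model run:

```python
import re

# Hypothetical tile URL schema: /wms/<layer>/<modelrun>/<zoom>/<x>/<y>.png
# A single regex then matches all tiles of one layer and model run.
stale_run = re.compile(r"^/wms/temperature/2006-11-13T00/")

cached_urls = [
    "/wms/temperature/2006-11-13T00/12/1024/2048.png",
    "/wms/temperature/2006-11-13T12/12/1024/2048.png",
    "/wms/precipitation/2006-11-13T00/5/3/7.png",
]

# Only the first URL belongs to the stale model run.
to_purge = [u for u in cached_urls if stale_run.match(u)]
print(to_purge)
```

A cache that accepts such a pattern can invalidate billions of
potential URLs in one operation, instead of enumerating each tile.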

[1] Of course, only a tiny fraction of this will actually be generated
in the first place. We don't expect users to zoom in this close on
every part of the world, but it illustrates the potential number of
URLs that need to be purged if it can't be done with a regex.

-- 
Trond Michelsen



More information about the varnish-misc mailing list