Different cache policy for googlebot

David Turner david at collaborativebusiness.co.uk
Thu Dec 2 11:08:53 CET 2010


I have been digging around the documentation and wiki to see if this has been done before; it seems not, so it might just be a bad idea...

I'm working on a site that has a large number of dynamic pages. Googlebot is going to town spidering everything in sight, and we need to get it under control in the short term while we address the underlying performance problems.

The content on the pages needs to be displayed to humans with a short cache time, but for Googlebot we wouldn't mind caching much more aggressively.

So my thought was to manage the cache such that if anyone other than Googlebot requests a page, we process it normally with a reasonable TTL and update the cache. But if Googlebot requests a page, determined by the User-Agent string, we try to serve the page from the cache if it's available (even if it's stale), and otherwise fetch from the backend and update as normal.
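
For what it's worth, here is roughly what I had in mind as VCL. This is only a rough sketch, assuming Varnish 2.1's grace-mode syntax; the User-Agent match and the 6h window are guesses on my part, not something we've tested:

sub vcl_recv {
    if (req.http.User-Agent ~ "Googlebot") {
        # Let Googlebot be served objects up to 6 hours past their
        # TTL, as long as a copy is still in the cache.
        set req.grace = 6h;
    } else {
        # Humans only ever see objects a few seconds past their TTL.
        set req.grace = 30s;
    }
}

sub vcl_fetch {
    # Keep expired objects around so the grace window above has
    # something to serve; this does not change the TTL itself.
    set beresp.grace = 6h;
}

The idea being that grace mode decides per-request how stale an object may be, so humans and Googlebot would share the same cached objects rather than separate copies.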

Aside from this maybe being a bad idea, I'm not sure how efficiently this could be implemented with Varnish. The reason for trying to handle all of this in Varnish is that we can't easily make changes to the underlying CMS.

Is this a good or bad idea? And at what point in the Varnish pipeline is it most efficient to handle this?



-- 
David M Turner <david at collaborativebusiness.co.uk>
Collaborative Business Services Ltd

