Different cache policy for googlebot

David Turner david at collaborativebusiness.co.uk
Thu Dec 2 11:29:20 CET 2010


I agree that it's not a good idea to split the cache. The way I see it, human traffic will keep the cache populated, so pages can be served straight from it when Googlebot comes along. (Cache misses for Googlebot would still fetch from the backend.)

We have slowed Googlebot down, and maybe that's the best solution at the end of the day. My worry is that the site is growing quickly, so slowing Google down leaves it with an older view of the whole site, whereas my proposal is to let it keep crawling but serve it an older version of each page.
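
For what it's worth, here is a minimal sketch of what I have in mind, assuming Varnish 2.1 syntax and a simple User-Agent match on "Googlebot" (untested; the 24h figure and the bare "Googlebot" match are just placeholders). The idea is to keep a single cache, let human traffic refresh it as normal, and use grace so Googlebot can be handed objects well past their TTL. A miss for Googlebot still goes to the backend.

    sub vcl_recv {
        if (req.http.User-Agent ~ "Googlebot") {
            # Googlebot may be handed objects up to a day past their TTL
            set req.grace = 24h;
        } else {
            # normal visitors see at most a little staleness
            set req.grace = 30s;
        }
    }

    sub vcl_fetch {
        # keep expired objects around long enough for the grace above to apply
        set beresp.grace = 24h;
    }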

Unfortunately it's not a Drupal site, otherwise I'd have been looking there first! ;o)


On 2 Dec 2010, at 10:22, Stewart Robinson wrote:

> Hi,
> 
> I think you could match on the Googlebot User-Agent string in vcl_hash and set a different hash key, and with that set long cache times etc., but this would essentially split your cache in half (Google vs. not Google), and I really don't think that is a good idea as it lowers the number of items you can store.
> 
> Isn't a better temporary option to log into Google Webmaster Tools and slow the crawl down? The setting is valid for 90 days, so it should give you some breathing room. I assume this is a Drupal site, as I've seen you at Drupal events. Could you also add a special settings.php setting that enables Boost just for Googlebot, so it can re-crawl old articles without generating load?
> 
> Stewart Robinson.
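
For reference, the vcl_hash variant Stewart describes above would look roughly like this (again a sketch in Varnish 2.1 syntax, untested). It gives Googlebot its own copy of every object, which is exactly the cache split we want to avoid:

    sub vcl_hash {
        set req.hash += req.url;
        if (req.http.host) {
            set req.hash += req.http.host;
        } else {
            set req.hash += server.ip;
        }
        # extra hash component for Googlebot: every page is stored twice,
        # effectively halving the usable cache
        if (req.http.User-Agent ~ "Googlebot") {
            set req.hash += "googlebot";
        }
        return (hash);
    }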

-- 
David M Turner <david at collaborativebusiness.co.uk>
Collaborative Business Services Ltd




