Different cache policy for googlebot

Stewart Robinson stewsnooze at gmail.com
Thu Dec 2 11:22:33 CET 2010


Hi,

I think you could match on the Googlebot User-Agent string in vcl_hash, add
it to the hash key, and with that set long cache times etc. for those
objects. But this would essentially split your cache in two, one set of
objects for Google and one for everyone else, and I really don't think that
is a good idea as it lowers the number of items you can store.
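
For what it's worth, that approach would look roughly like the sketch
below. It is only an illustration and assumes Varnish 2.1-era VCL syntax
(adjust for your version); the X-Is-Googlebot marker header and the 24h
TTL are made up for the example.

sub vcl_recv {
    # Mark the request so later subroutines can tell it came from Googlebot.
    # No return here, so the built-in default vcl_recv logic still runs.
    if (req.http.User-Agent ~ "Googlebot") {
        set req.http.X-Is-Googlebot = "1";
    } else {
        unset req.http.X-Is-Googlebot;
    }
}

sub vcl_hash {
    # Start with the same keys as the default hash: URL plus host.
    set req.hash += req.url;
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }
    # Then add the bot flag. This extra line is exactly what splits the
    # cache: every URL now has two keys and therefore two stored copies.
    if (req.http.X-Is-Googlebot) {
        set req.hash += "googlebot";
    }
    return (hash);
}

sub vcl_fetch {
    # Longer TTL only for the copies fetched on behalf of Googlebot.
    # No return here either, so the default vcl_fetch logic still applies.
    if (req.http.X-Is-Googlebot) {
        set beresp.ttl = 24h;
    }
}

Unsetting the marker header for everything that doesn't match also keeps
ordinary visitors from smuggling in an X-Is-Googlebot header themselves.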

Isn't a better temporary option to log into Google Webmaster Tools and slow
the crawl rate down? The setting is valid for 90 days, so it should give you
some breathing room. I assume this is a Drupal site, as I've seen you at
Drupal events. Could you also add a special settings.php setting that
enables Boost just for Googlebot, so that its crawling of old articles
doesn't add any load?
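
And if you do still want the serve-stale-to-Googlebot behaviour you
describe below without splitting the cache, Varnish's grace mechanism is
the usual building block. Again only a rough sketch in 2.1-era syntax,
with the grace windows picked purely for illustration:

sub vcl_recv {
    if (req.http.User-Agent ~ "Googlebot") {
        # The crawler will accept objects up to a day past their TTL.
        set req.grace = 24h;
    } else {
        # Humans only ever get objects that are at most slightly stale.
        set req.grace = 30s;
    }
}

sub vcl_fetch {
    # Keep expired objects in the cache long enough that there is a stale
    # copy to hand out; at least as long as the largest req.grace above.
    set beresp.grace = 24h;
}

Bear in mind that, as far as I remember, grace only hands out the stale
copy while a fresh one is already being fetched or the backend is marked
sick, so this smooths the crawl load rather than guaranteeing Googlebot
never waits on the backend.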

Stewart Robinson.

On 2 December 2010 10:08, David Turner <david at collaborativebusiness.co.uk> wrote:

> I have been digging around the documentation and wiki to see if this has
> been done before; it seems not, so it might just be a bad idea...
>
> I'm working on a site that has a large number of dynamic pages. Googlebot
> is going to town spidering everything in sight and we need to get it under
> control in the short term while we address the underlying performance.
>
> The content on the pages needs to be displayed to humans with a short cache
> time, but for Googlebot we wouldn't mind caching much more aggressively.
>
> So my thought was to manage the cache such that if anyone other than
> Googlebot requests a page, we process it normally with a reasonable TTL
> and update the cache. But if Googlebot requests a page, determined by the
> User-Agent string, we try to serve the page from the cache if it's
> available (even if it's stale) and otherwise fetch from the backend and
> update as normal.
>
> Aside from this maybe being a bad idea, I'm not sure how efficiently it
> could be implemented with Varnish. The reason for trying to handle all of
> this in Varnish is that we can't easily make changes to the underlying
> CMS.
>
> Is this a good or bad idea? And at what point in the Varnish pipeline is it
> most efficient to handle this?
>
>
>
> --
> David M Turner <david at collaborativebusiness.co.uk>
> Collaborative Business Services Ltd
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> http://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>

