ESI and search engine spiders

Rob S rtshilston at gmail.com
Tue Aug 10 22:05:48 CEST 2010


Hi,

On one site we run behind Varnish, we've got a "most popular" widget 
displayed on every page (much like http://www.bbc.co.uk/news/).  
However, this pollutes search engine results: because popular 
headlines appear on every page, searches for a specific headline tend 
not to link directly to the article itself, but to one of the index 
pages with a high Google PageRank or similar.

What I'd like to know is how other Varnish users have served 
different ESI content depending on whether or not the client is a bot.

My initial idea was to have the backend set an "X-Not-For-Bots: 1" 
header on the response from the URL that generates the most-popular 
fragment, then do something like this (untested):

sub vcl_deliver {
    # Suppress any fragment the backend has marked as not for bots
    if (req.http.User-Agent ~ "bot" &&
        resp.http.X-Not-For-Bots == "1") {
        error 750 "Not for bots";
    } else {
        deliver;
    }
}
...
sub vcl_error {
    if (obj.status == 750) {
        # Replace the suppressed fragment with an empty HTML comment
        set obj.status = 200;
        synthetic {"<!-- not for bots -->"};
        deliver;
    }
}

However, such an approach doesn't work, as the req object isn't 
available in vcl_deliver.  We'd prefer to drive this with a response 
header such as X-Not-For-Bots, rather than hard-coding into Varnish a 
list of ESI fragments to be suppressed from bots.
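
One direction we've considered (equally untested, and in 2.1-style 
VCL; the X-Is-Bot header name and the User-Agent pattern are just 
placeholders) is to normalise bot detection into a request header in 
vcl_recv, keep bot and non-bot variants apart in vcl_hash, and then 
do the check in vcl_fetch, where both req and beresp are visible:

sub vcl_recv {
    # Normalise bot detection into a single request header
    if (req.http.User-Agent ~ "bot") {
        set req.http.X-Is-Bot = "1";
    }
}

sub vcl_hash {
    set req.hash += req.url;
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }
    # Cache bot and non-bot variants separately, so a fragment
    # fetched for a normal user is never served to a bot
    if (req.http.X-Is-Bot) {
        set req.hash += req.http.X-Is-Bot;
    }
    return (hash);
}

sub vcl_fetch {
    # The backend marks suppressible fragments itself, so no
    # fragment URLs are hard-coded in Varnish
    if (req.http.X-Is-Bot && beresp.http.X-Not-For-Bots == "1") {
        error 750 "Not for bots";
    }
}

The vcl_error above would then serve the empty comment as before.  
Two things we're unsure of: whether ESI subrequests re-run vcl_recv 
on our Varnish version (if not, X-Is-Bot would have to be inherited 
from the parent request), and that synthetic errors aren't cached, so 
every bot request for a suppressed fragment would still cost a 
backend fetch.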

Has anyone any good suggestions?



Rob



