Best practice for not caching content requested by crawlers

Damon Snyder damon at huddler-inc.com
Thu Jul 26 02:49:04 CEST 2012


Hi Lasse,
Correct me if I'm wrong, but vcl_miss is not available in Varnish 2.1
(as suggested here: <https://www.varnish-software.com/static/book/Cache_invalidation.html#naming-confusion>).
Is there a Varnish 2.x approach that would improve the response times?

As an aside, our content is very broad; there is a LOT of it. It's unlikely
that serialization would be a concern for the bots unless multiple bots
happened to simultaneously hit content that wasn't currently hot.

That being said, we are exploring some form of caching for the bots. The
rules/TTLs should probably look different from those for our normal traffic.

Thanks for the follow-up. I really appreciate it.

Damon

On Wed, Jul 25, 2012 at 4:17 AM, Lasse Karstensen <
lasse.karstensen at gmail.com> wrote:

> Damon Snyder:
> > Hi Lasse,
> > Thanks! I forgot to mention this in the original email, but we are using
> > varnish 2.1.5. Here is what I ended up doing:
> > sub vcl_fetch {
> >     ...
> >     if (req.http.User-Agent ~ "(?i)(msn|google|bing|yandex|youdao|exa|mj12|omgili|flr-|ahrefs|blekko)bot" ||
> >         req.http.User-Agent ~ "(?i)(magpie|mediapartners|sogou|baiduspider|nutch|yahoo.*slurp|genieo)") {
> >         set beresp.http.X-Bot-Bypass = "YES";
> >         set beresp.ttl = 0s;
> >         return (pass);
> >     }
> >     ...
> > }
>
> Hi Damon.
>
> Just a quick note; doing this check in vcl_fetch will lead to serialisation
> of backend requests. This will hurt your HTTP response times, and since
> these bots take response time into account, probably also hurt your search
> engine visibility.
>
> I'd advise you to do this test in vcl_miss, and also not override
> beresp.ttl, so that Varnish stores the hit_for_pass object for a while.
>
> If you need to set the debug header you can store it temporarily in
> req.http.x-bot-bypass and check/set resp.http.x-bot-bypass in vcl_deliver.
>
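> As a rough, untested sketch of the above for 2.1-style VCL (the shortened
> regex is only a placeholder for your full bot list, and the header name is
> the one from your snippet):
>
>     sub vcl_miss {
>         # Bot request that missed the cache: flag it and pass to the
>         # backend. beresp.ttl is left untouched.
>         if (req.http.User-Agent ~ "(?i)(msn|google|bing)bot") {
>             set req.http.X-Bot-Bypass = "YES";
>             return (pass);
>         }
>     }
>
>     sub vcl_deliver {
>         # Copy the temporary request header onto the response for debugging.
>         if (req.http.X-Bot-Bypass) {
>             set resp.http.X-Bot-Bypass = req.http.X-Bot-Bypass;
>         }
>     }
>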
> --
> Lasse Karstensen
> Varnish Software AS
> http://www.varnish-software.com/
>