Rewriting served URLs dependent on user agent
Hugo Cisneiros (Eitch)
hugo.cisneiros at gmail.com
Fri Mar 1 04:26:51 CET 2013
On Thu, Feb 28, 2013 at 8:33 PM, Ian Evans <dheianevans at gmail.com> wrote:
> I've been looking at this site's discussion of how they're handling
> the traffic loss caused by Google's redesign of their image search.
>
> http://pixabay.com/en/blog/posts/hotlinking-protection-and-watermarking-for-google-32/
[...]
> Is there a way that Varnish could cache two versions of the page?
>
> One, human visitors would get the cached page with the ?i
> Two, robot user agents would get a cached version where Varnish would
> strip all the ?i from urls.
>
> Is that possible? Thanks for any pointers.
>
Yes. In vcl_recv you can detect whether the client is a bot and rewrite the
URL accordingly, for example:
sub vcl_recv {
    # Match the crawler's User-Agent; "(?i)" makes the match
    # case-insensitive, since Googlebot capitalizes its name
    if (req.http.User-Agent ~ "(?i)googlebot") {
        # regsub() only returns a value, so assign the result back to req.url
        set req.url = regsub(req.url, "\?i", "");
    }
    ...
    return (lookup);
}
This tells Varnish to strip the "?i" only when the User-Agent request header
contains "googlebot" (matched case-insensitively, since the real crawler
identifies itself as "Googlebot").
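If you need to cover more than one crawler, the same test can be broadened.
A minimal sketch, assuming a hypothetical list of user agents (adjust the
alternation to whichever bots you actually see in your logs):

sub vcl_recv {
    # Hypothetical set of crawlers, matched case-insensitively
    if (req.http.User-Agent ~ "(?i)(googlebot|bingbot|slurp|baiduspider)") {
        # Same rewrite as above: bots get the "?i"-less URL
        set req.url = regsub(req.url, "\?i", "");
    }
    ...
    return (lookup);
}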
Since vcl_recv runs before any cache lookup, Varnish will end up storing two
different objects (on a miss): one for the URL "image.jpg?i" (served to
humans) and another for "image.jpg" (served to bots).
[]'s
Hugo
www.devin.com.br