Let GoogleBot Crawl full content, reverse DNS lookup
Richard Chiswell
richard.chiswell at mangahigh.com
Mon Mar 7 16:08:03 CET 2011
On 07/03/2011 14:58, Lane, Richard wrote:
>
> I am looking into supporting Google’s “First Click Free for Web
> Search”. I need to allow the GoogleBots to index the full content of
> my sites but still maintain the Registration wall for everyone else.
> Google suggests that you detect their Googlebots by reverse DNS lookup
> of the requester's IP.
>
> Google Desc:
> http://www.google.com/support/webmasters/bin/answer.py?answer=80553
>
> Has anyone done DNS lookups via VCL to verify access to content or to
> cache content?
I believe this /could/ be done using a C function (Varnish allows inline
C in VCL), but it's not something I've had experience of before.
What you could do is detect the Google user-agent in Varnish, and then
pass the client IP and the original request URL to a backend script, such as:
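/* Varnish 2.0.6 pseudocode for vcl_recv - may need updating */
if (req.http.User-Agent ~ "Googlebot") {
    /* Keep the original URL in a header for the backend script */
    set req.http.X-Varnish-OriginalURL = req.url;
    /* Rewrite the request so it goes to the check script instead */
    set req.url = "/googlecheck?ip=" client.ip "&originalurl=" req.url;
    lookup;
}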
The googlecheck script then does the actual rDNS lookup and, if the IP
matches, returns the contents of the originally requested URL.
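
For what it's worth, the check inside such a script might look roughly like
the Python sketch below. It is only an illustration, not something I've run
in production. The function name is_verified_googlebot and the suffix list
are my own assumptions (as far as I know, Google documents googlebot.com and
google.com as the valid crawler domains). It does a reverse lookup, checks
the hostname, then confirms with a forward lookup that the name maps back to
the same IP. The resolver functions are passed in as parameters so the logic
can be exercised without touching real DNS, and the forward lookup used here
only handles IPv4.

```python
import socket

# Assumption: hostname suffixes treated as belonging to Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a client IP (IPv4 only)."""
    try:
        # Reverse lookup: IP -> hostname (first element of the tuple).
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record or resolver failure

    # The leading dot in each suffix rejects names like "fakegooglebot.com".
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # Forward lookup: hostname -> list of IPv4 addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # Only trust the PTR record if the name resolves back to the same IP.
    return ip in addresses
```

The forward confirmation matters because whoever controls the reverse zone
for an IP can make its PTR record claim any hostname they like; only the
owner of googlebot.com can make that hostname resolve back to the IP.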
Richard Chiswell
http://www.mangahigh.com
(Speaking personally yadda yadda)