Let GoogleBot Crawl full content, reverse DNS lookup

Richard Chiswell richard.chiswell at mangahigh.com
Mon Mar 7 16:08:03 CET 2011


On 07/03/2011 14:58, Lane, Richard wrote:
>
> I am looking into supporting Google’s “First Click Free for Web 
> Search”. I need to allow the GoogleBots to index the full content of 
> my sites but still maintain the Registration wall for everyone else. 
> Google suggests that you detect their GoogleBots by reverse DNS lookup 
> of the requester's IP.
>
> Google Desc: 
> http://www.google.com/support/webmasters/bin/answer.py?answer=80553
>
> Has anyone done DNS lookups via VCL to verify access to content or to 
> cache content?

I believe this /could/ be done using a C function, but it's not 
something I've had experience of before.

What you could do is detect the Google user-agent in Varnish, and then 
pass that and the IP to a backend script with the original request, such as:
/* Varnish 2.0.6 pseudo code - may need updating */
/* goes in vcl_recv; ~ matches anywhere in the User-Agent string */
if (req.http.User-Agent ~ "Googlebot") {
     /* keep the original URL in a header for the backend script */
     set req.http.X-Varnish-OriginalUrl = req.url;
     set req.url = "/googlecheck?ip=" client.ip "&originalurl=" req.url;
     lookup;
}
The googlecheck script then does the actual rDNS lookup and, if it 
matches, returns the contents of the requested URL.
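
For the rDNS check itself, here is a rough Python sketch of what the 
googlecheck script might do. It is only an illustration: the function 
name is_googlebot, the suffix list and the injectable resolver arguments 
are my own assumptions, not anything Varnish or Google provides. The idea 
is a reverse lookup of the IP, a check that the hostname sits under a 
Google domain, and then a forward lookup to confirm the hostname maps 
back to the same IP.

```python
import socket

# Assumed hostname suffixes for Google crawlers; check Google's current guidance.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    reverse and forward are injectable so the check can be tested offline.
    Note that gethostbyname_ex only handles IPv4 addresses.
    """
    try:
        # Reverse (PTR) lookup: IP -> hostname
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False

    # The leading dot in each suffix rejects names like "fakegooglebot.com"
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # Forward lookup: hostname -> list of IPs
        addresses = forward(host)[2]
    except OSError:
        return False

    # Forward confirmation stops someone faking a PTR record for their own IP
    return ip in addresses
```

The script would call is_googlebot() with the ip parameter Varnish passed 
in, and only serve the full content for the original URL when it returns 
True. The forward lookup matters because whoever controls an IP block 
also controls its PTR records, so the reverse lookup alone proves nothing.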

Richard Chiswell
http://www.mangahigh.com
(Speaking personally yadda yadda)


More information about the varnish-misc mailing list