<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 07/03/2011 14:58, Lane, Richard wrote:
<blockquote cite="mid:C99A4EA0.3C67B%25rlane@ahbelo.com" type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<title>Let GoogleBot Crawl full content, reverse DNS lookup</title>
<font face="Arial"><span style="font-size: 11pt;"><br>
I am looking into supporting Google’s “First Click Free for
Web Search”. I need to allow the Googlebots to index the full
content of my sites but still maintain the registration wall
for everyone else. Google suggests that you detect their
Googlebots by a reverse DNS lookup of the requester’s IP. <br>
<br>
Google Desc: <a moz-do-not-send="true"
href="http://www.google.com/support/webmasters/bin/answer.py?answer=80553">http://www.google.com/support/webmasters/bin/answer.py?answer=80553</a><br>
<br>
Has anyone done DNS lookups via VCL to verify access to
content or to cache content?<br>
</span></font></blockquote>
<br>
I believe this /could/ be done with an inline C function in VCL,
but it's not something I've tried before.<br>
<br>
What you could do is detect the Googlebot user-agent in Varnish,
and then pass the client IP and the original URL through to a
backend script, such as:<br>
/* Varnish 2.0.6 pseudo-code - may need updating */<br>
if (req.http.user-agent ~ "Googlebot") {<br>
set req.http.X-Varnish-OriginalUrl = req.url;<br>
set req.url = "/googlecheck?ip=" client.ip "&originalurl=" req.url;<br>
lookup;<br>
}<br>
and the googlecheck script then performs the reverse DNS lookup
and, if it matches, returns the contents of the requested URL.<br>
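As a rough sketch of what that googlecheck backend might do (Python; the
function name and IP below are just illustrative - the reverse-then-forward
lookup itself is what Google's page describes):<br>

```python
import socket

def is_real_googlebot(ip):
    """Verify a claimed Googlebot IP with a double DNS lookup:
    reverse-resolve the IP, check the hostname is under
    googlebot.com or google.com, then forward-resolve that
    hostname and confirm it maps back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)       # reverse DNS
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)  # forward DNS
    except OSError:
        return False
    return ip in addrs                               # must round-trip

if __name__ == "__main__":
    # A spoofed user-agent from an arbitrary IP fails the check.
    print(is_real_googlebot("127.0.0.1"))
```

The script would then either fetch and return the full article (for a
verified bot) or fall back to the registration wall.<br>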
<br>
Richard Chiswell<br>
<a class="moz-txt-link-freetext" href="http://www.mangahigh.com">http://www.mangahigh.com</a><br>
(Speaking personally yadda yadda)<br>
</body>
</html>