<div dir="ltr">Hi All,<div><br></div><div style>This has been successfully deployed in production, and the code (as-is) is handling "many thousands" of connections per second from fake and legitimate bots advertising themselves as Googlebot/Bingbot/etc with no apparent issues/problems. The configuration we've deployed is essentially the same as provided here (and in the code base).</div>
<div style><br></div><div style>Anyway, if anyone else ends up finding libvmod-dns helpful, please consider it "emailware" -- ie, drop me an email and let me know (off-the-record, of course) how you're making use of it. I'm curious more than anything!</div>
<div style><br></div></div><div class="gmail_extra"><br clear="all"><div><br>-Ken</div>
<br><br><div class="gmail_quote">On Mon, Apr 1, 2013 at 6:21 PM, Kenneth Shaw <span dir="ltr"><<a href="mailto:kenshaw@gmail.com" target="_blank">kenshaw@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi,<div><br></div><div>I spent a bit of time today developing a DNS module for Varnish. </div><div><br></div><div>It is available here:</div><div><br></div><div><a href="https://github.com/kenshaw/libvmod-dns/" target="_blank">https://github.com/kenshaw/libvmod-dns/</a></div>
The reason for this development is to cut off bots that abuse the User-Agent string (i.e., claiming to be Googlebot/bingbot/etc.) by doing a reverse and then a forward DNS lookup against the client.ip/X-Forwarded-For header, and matching the resulting domain against a regex.
The logic is meant to work something like this:

    import dns;

    sub vcl_recv {
        # do a DNS check on "good" crawlers
        if (req.http.user-agent ~ "(?i)(googlebot|bingbot|slurp|teoma)") {
            # do a reverse lookup on the client.ip (X-Forwarded-For) and
            # check that it's in the allowed domains
            set req.http.X-Crawler-DNS-Reverse = dns.rresolve(req.http.X-Forwarded-For);

            # check that the RDNS points to an allowed domain -- 403 error if it doesn't
            if (req.http.X-Crawler-DNS-Reverse !~ "(?i)\.(googlebot\.com|search\.msn\.com|crawl\.yahoo\.net|ask\.com)$") {
                error 403 "Forbidden";
            }

            # do a forward lookup on the reverse DNS result
            set req.http.X-Crawler-DNS-Forward = dns.resolve(req.http.X-Crawler-DNS-Reverse);

            # if it doesn't match the client.ip/X-Forwarded-For, the user-agent is fake
            if (req.http.X-Crawler-DNS-Forward != req.http.X-Forwarded-For) {
                error 403 "Forbidden";
            }
        }
    }

While this is not being used in production (yet), I plan to deploy it later this week against a production system receiving ~10,000+ requests/sec. I will report back afterwards.
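Note that the snippet above assumes an upstream proxy always sets X-Forwarded-For. If Varnish faces clients directly, a minimal (untested) sketch in Varnish 3 syntax would be to fall back to client.ip before the checks run:

    sub vcl_recv {
        # sketch: nothing upstream sets X-Forwarded-For here, so copy in
        # the client's own address; the reverse lookup then has an IP to use
        if (!req.http.X-Forwarded-For) {
            set req.http.X-Forwarded-For = client.ip;
        }
    }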
I realize the code currently has issues (memory handling, documentation, etc.), which will be fixed in the near future.

I also realize there are better ways to head malicious bots off at the pass via DNS, etc. (which we are doing as well). The biggest problem for my purposes is that it is difficult or impossible to identify all such traffic that way. Additionally, it is nice to be able to monitor the actual traffic coming through rather than dropping it entirely at the edge.
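As a sketch of that last point (the X-Crawler-Fake header name here is made up), one variation would be to tag mismatched requests rather than block them, so they stay visible in varnishlog and the backend's access logs:

    sub vcl_recv {
        # sketch: mark the request instead of erroring out, so fake
        # crawler traffic can still be watched downstream
        if (req.http.X-Crawler-DNS-Forward != req.http.X-Forwarded-For) {
            set req.http.X-Crawler-Fake = "true";
        }
    }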
Any input/comments on what I've written so far would be greatly appreciated! Thanks!

-Ken