r5577 - trunk/varnish-cache/doc/sphinx/tutorial

perbu at varnish-cache.org
Sun Nov 21 21:49:15 CET 2010


Author: perbu
Date: 2010-11-21 21:49:15 +0100 (Sun, 21 Nov 2010)
New Revision: 5577

Added:
   trunk/varnish-cache/doc/sphinx/tutorial/cookies.rst
   trunk/varnish-cache/doc/sphinx/tutorial/esi.rst
   trunk/varnish-cache/doc/sphinx/tutorial/purging.rst
   trunk/varnish-cache/doc/sphinx/tutorial/vary.rst
Log:
Split up the hitrate chapter into four and added an introduction to ESI. ESI needs a bit of work wrt params and operational factors.

Added: trunk/varnish-cache/doc/sphinx/tutorial/cookies.rst
===================================================================
--- trunk/varnish-cache/doc/sphinx/tutorial/cookies.rst	                        (rev 0)
+++ trunk/varnish-cache/doc/sphinx/tutorial/cookies.rst	2010-11-21 20:49:15 UTC (rev 5577)
@@ -0,0 +1,64 @@
+.. _tutorial-cookies:
+
+Cookies
+-------
+
+Varnish will not cache an object coming from the backend with a
+Set-Cookie header present. Also, if the client sends a Cookie header,
+Varnish will bypass the cache and go directly to the backend.
+
+This can be overly conservative. A lot of sites use Google Analytics
+(GA) to analyse their traffic. GA sets a cookie to track you. This
+cookie is used by the client-side JavaScript and is therefore of no
+interest to the server.
+
+For a lot of web applications it makes sense to completely disregard
+the cookies unless you are accessing a special part of the web
+site. This VCL snippet in vcl_recv will disregard cookies unless you
+are accessing /admin/::
+
+  if (!(req.url ~ "^/admin/")) {
+    unset req.http.Cookie;
+  }
+
+Quite simple. If, however, you need to do something more complicated,
+like removing one out of several cookies, things get
+difficult. Unfortunately Varnish doesn't have good tools for
+manipulating cookies. We have to use regular expressions to do the
+work. If you are familiar with regular expressions you'll understand
+what's going on. If you aren't, I suggest you either pick up a book on
+the subject, read through the *pcrepattern* man page or read through
+one of the many online guides.
+
+Let me show you what Varnish Software uses. We use some cookies for
+Google Analytics tracking and similar tools. The cookies are all set
+and used by JavaScript. Varnish and Drupal don't need to see those
+cookies, and since Varnish will cease caching of pages when the client
+sends cookies, we discard these unnecessary cookies in VCL.
+
+In the following VCL we discard the has_js cookie and all cookies
+that start with an underscore::
+
+  // Remove has_js and Google Analytics __* cookies.
+  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
+  // Remove a ";" prefix, if present.
+  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
+
+Let me show you an example where we remove everything except the
+cookies named COOKIE1 and COOKIE2, and you can marvel at it::
+
+  sub vcl_recv {
+    if (req.http.Cookie) {
+      # Prefix with ";" and drop the spaces after every ";"
+      set req.http.Cookie = ";" req.http.Cookie;
+      set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
+      # Mark the cookies we want to keep with a space after the ";"
+      set req.http.Cookie = regsuball(req.http.Cookie, ";(COOKIE1|COOKIE2)=", "; \1=");
+      # Delete every unmarked cookie, then trim leftover separators
+      set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
+      set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
+
+      if (req.http.Cookie == "") {
+        remove req.http.Cookie;
+      }
+    }
+  }
+
+The example is taken from the Varnish Wiki, where you can find other
+scary examples of what can be done in VCL.
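+
+To see how the COOKIE1/COOKIE2 example works, it can help to trace a
+sample header through each step. The cookie names foo and bar below
+are made up for the illustration::
+
+  # Client sends:           foo=1; COOKIE1=abc; bar=2
+  # prepend ";"             ;foo=1; COOKIE1=abc; bar=2
+  # "; +"  ->  ";"          ;foo=1;COOKIE1=abc;bar=2
+  # mark cookies to keep    ;foo=1; COOKIE1=abc;bar=2
+  # drop unmarked cookies   ; COOKIE1=abc
+  # trim separators         COOKIE1=abc
+
+The space inserted in front of the cookies we want to keep is what
+protects them from the regsuball that deletes cookies, since that
+pattern only matches a ";" that is *not* followed by a space.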

Added: trunk/varnish-cache/doc/sphinx/tutorial/esi.rst
===================================================================
--- trunk/varnish-cache/doc/sphinx/tutorial/esi.rst	                        (rev 0)
+++ trunk/varnish-cache/doc/sphinx/tutorial/esi.rst	2010-11-21 20:49:15 UTC (rev 5577)
@@ -0,0 +1,86 @@
+.. _tutorial-esi:
+
+Edge Side Includes
+------------------
+
+*Edge Side Includes* is a language to include *fragments* of web pages
+in other web pages. Think of it as an HTML include statement that works
+over HTTP.
+
+On most web sites a lot of content is shared between
+pages. Regenerating this content for every page view is wasteful and
+ESI tries to address that by letting you decide the cache policy for
+each fragment individually.
+
+In Varnish we've only implemented a small subset of ESI. As of 2.1 we
+have three ESI statements:
+
+ * esi:include 
+ * esi:remove
+ * <!--esi ...-->
+
+Content substitution based on variables and cookies is not implemented
+but is on the roadmap. 
+
+Example: esi include
+~~~~~~~~~~~~~~~~~~~~
+
+Let's see an example of how this could be used. This simple CGI script
+outputs the date::
+
+     #!/bin/sh
+     
+     echo 'Content-type: text/html'
+     echo ''
+     date "+%Y-%m-%d %H:%M"
+
+Now, let's have an HTML file that has an ESI include statement::
+
+     <HTML>
+     <BODY>
+     The time is: <esi:include src="/cgi-bin/date.cgi"/>
+     at this very moment.
+     </BODY>
+     </HTML>
+
+For ESI to work you need to activate ESI processing in VCL, like this::
+
+    sub vcl_fetch {
+        if (req.url == "/test.html") {
+            esi;                     /* Do ESI processing               */
+            set beresp.ttl = 24h;    /* Sets the TTL on the HTML above  */
+        } elseif (req.url == "/cgi-bin/date.cgi") {
+            set beresp.ttl = 1m;     /* Sets a one minute TTL on        */
+                                     /*  the included object            */
+        }
+    }
+
+Example: esi remove
+~~~~~~~~~~~~~~~~~~~
+
+The *remove* keyword allows you to remove output. You can use this to make
+a fallback of sorts, when ESI is not available, like this::
+
+  <esi:include src="http://www.example.com/ad.html"/> 
+  <esi:remove> 
+    <a href="http://www.example.com">www.example.com</a>
+  </esi:remove>
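+
+As a rough illustration, assume (hypothetically) that ad.html contains
+a single image tag. With ESI processing enabled the client would then
+receive something like::
+
+  <img src="http://www.example.com/ad.png"/>
+
+Without ESI processing the markup is passed through untouched. A
+browser should ignore the unknown esi tags and render only the
+fallback link::
+
+  <esi:include src="http://www.example.com/ad.html"/>
+  <esi:remove>
+    <a href="http://www.example.com">www.example.com</a>
+  </esi:remove>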
+
+Example: <!--esi ... -->
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+
+This is a special construct to allow HTML marked up with ESI to render
+without processing. ESI processors will remove the start ("<!--esi")
+and end ("-->") when the page is processed, while still processing the
+contents. If the page is not processed, the whole construct remains
+and is treated as an ordinary HTML/XML comment. For example::
+
+  <!--esi
+  <p>The time is: <esi:include src="/cgi-bin/date.cgi"/></p>
+  -->
+
+This ensures that the ESI markup will not interfere with the rendering
+of the final HTML if not processed.
+
+

Added: trunk/varnish-cache/doc/sphinx/tutorial/purging.rst
===================================================================
--- trunk/varnish-cache/doc/sphinx/tutorial/purging.rst	                        (rev 0)
+++ trunk/varnish-cache/doc/sphinx/tutorial/purging.rst	2010-11-21 20:49:15 UTC (rev 5577)
@@ -0,0 +1,119 @@
+.. _tutorial-purging:
+
+Purging and banning
+-------------------
+
+One of the most effective ways of increasing your hit ratio is to
+increase the time-to-live (TTL) of your objects. But, as you're
+aware, in this twitterific day and age serving outdated content is
+bad for business.
+
+The solution is to notify Varnish when there is fresh content
+available. This can be done through two mechanisms: HTTP purging and
+bans. First, let me explain HTTP purges.
+
+
+HTTP Purges
+~~~~~~~~~~~
+
+An HTTP purge is similar to an HTTP GET request, except that the
+*method* is PURGE. Actually you can call the method whatever you'd
+like, but most people refer to this as purging. Squid supports the
+same mechanism. In order to support purging in Varnish you need the
+following VCL in place::
+
+  acl purge {
+      "localhost";
+      "192.168.55.0"/24;
+  }
+
+  sub vcl_recv {
+      # Allow PURGE from localhost and 192.168.55.0/24 only.
+      if (req.request == "PURGE") {
+          if (!client.ip ~ purge) {
+              error 405 "Not allowed.";
+          }
+          return (lookup);
+      }
+  }
+
+  sub vcl_hit {
+      if (req.request == "PURGE") {
+          # Note that setting the TTL to 0 is magical:
+          # the object is zapped from the cache.
+          set obj.ttl = 0s;
+          error 200 "Purged.";
+      }
+  }
+
+  sub vcl_miss {
+      if (req.request == "PURGE") {
+          error 404 "Not in cache.";
+      }
+  }
+
+As you can see we have used two new VCL subroutines, vcl_hit and
+vcl_miss. When we call lookup Varnish will try to look up the object in
+its cache. It will either hit an object or miss it, and so the
+corresponding subroutine is called. In vcl_hit the object that is
+stored in cache is available and we can set the TTL.
+
+So for vg.no to invalidate their front page they would call out to
+Varnish like this::
+
+  PURGE / HTTP/1.0
+  Host: vg.no
+
+And Varnish would then discard the front page. If there are several
+variants of the same URL in the cache, however, only the matching
+variant will be purged. To purge a gzip variant of the same page the
+request would have to look like this::
+
+  PURGE / HTTP/1.0
+  Host: vg.no
+  Accept-Encoding: gzip
+
+Bans
+~~~~
+
+There is another way to invalidate content: bans. You can think of
+bans as a sort of filter. You *ban* certain content from being
+served from your cache. You can ban content based on any metadata we
+have.
+
+Support for bans is built into Varnish and available in the CLI.
+For VG to ban every png object belonging to vg.no they could
+issue::
+
+  purge req.http.host == "vg.no" && req.url ~ "\.png$"
+
+Quite powerful, really.
+
+Bans are checked when we hit an object in the cache, but before we
+deliver it. An object is only checked against bans that are newer than
+the object itself. If you have a lot of objects with long TTLs in your
+cache you should be aware of a potential performance impact of having
+many bans.
+
+You can also add bans to Varnish via HTTP. Doing so requires a bit of VCL::
+
+  sub vcl_recv {
+      if (req.request == "BAN") {
+          # Same ACL check as above:
+          if (!client.ip ~ purge) {
+              error 405 "Not allowed.";
+          }
+          purge("req.http.host == " req.http.host
+                " && req.url == " req.url);
+
+          # Throw a synthetic page so the
+          # request won't go to the backend.
+          error 200 "Ban added";
+      }
+  }
+
+This VCL snippet enables Varnish to handle an HTTP BAN method, adding a
+ban on the URL, including the host part.
+
+

Added: trunk/varnish-cache/doc/sphinx/tutorial/vary.rst
===================================================================
--- trunk/varnish-cache/doc/sphinx/tutorial/vary.rst	                        (rev 0)
+++ trunk/varnish-cache/doc/sphinx/tutorial/vary.rst	2010-11-21 20:49:15 UTC (rev 5577)
@@ -0,0 +1,58 @@
+.. _tutorial-vary:
+
+Vary
+----
+
+The Vary header is sent by the web server to indicate what makes an
+HTTP object vary. This makes a lot of sense with headers like
+Accept-Encoding. When a server issues a "Vary: Accept-Encoding" it
+tells Varnish that it needs to cache a separate version for every
+different Accept-Encoding that is coming from the clients. So, if a
+client only accepts gzip encoding Varnish won't serve the version of
+the page encoded with the deflate encoding.
+
+The problem is that the Accept-Encoding field can list the same
+encodings in many different ways. If one browser sends::
+
+  Accept-Encoding: gzip,deflate
+
+And another one sends::
+
+  Accept-Encoding: deflate,gzip
+
+Varnish will keep two variants of the page requested due to the
+different Accept-Encoding headers. Normalizing the Accept-Encoding
+header will ensure that you have as few variants as possible. The
+following VCL code will normalize the Accept-Encoding headers::
+
+    if (req.http.Accept-Encoding) {
+        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
+            # No point in compressing these
+            remove req.http.Accept-Encoding;
+        } elsif (req.http.Accept-Encoding ~ "gzip") {
+            set req.http.Accept-Encoding = "gzip";
+        } elsif (req.http.Accept-Encoding ~ "deflate") {
+            set req.http.Accept-Encoding = "deflate";
+        } else {
+            # unknown algorithm
+            remove req.http.Accept-Encoding;
+        }
+    }
+
+The code sets the Accept-Encoding header from the client to either
+gzip or deflate, with a preference for gzip.
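+
+As a sketch of the effect, here is what the normalization code would
+turn a few sample requests into (the URLs are made up for the
+example)::
+
+  # URL          Accept-Encoding sent    Accept-Encoding after
+  # /index.html  gzip,deflate            gzip
+  # /index.html  deflate,gzip            gzip
+  # /index.html  deflate                 deflate
+  # /index.html  compress                (header removed)
+  # /logo.png    gzip,deflate            (header removed)
+
+The first two requests now map to the same variant.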
+
+Pitfall - Vary: User-Agent
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some applications or application servers send *Vary: User-Agent* along
+with their content. This instructs Varnish to cache a separate copy
+for every variation of User-Agent there is. There are plenty. Even a
+single patch level of the same browser will generate at least 10
+different User-Agent headers based just on what operating system it
+is running on.
+
+So if you *really* need to Vary based on User-Agent be sure to
+normalize the header or your hit rate will suffer badly. Use the above
+code as a template.
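+
+A minimal sketch of such a normalization could look like this. The
+two classes "mobile" and "desktop" and the patterns used to detect
+them are assumptions for the example, not something Varnish
+prescribes; pick classes that match what your backend actually
+varies on::
+
+  sub vcl_recv {
+      # Hypothetical classification: collapse all User-Agent
+      # strings into two buckets so at most two variants are cached.
+      if (req.http.User-Agent ~ "(iPhone|Android)") {
+          set req.http.User-Agent = "mobile";
+      } else {
+          set req.http.User-Agent = "desktop";
+      }
+  }
+
+Note that this overwrites the header the backend sees, so the backend
+must be able to make its decision based on the normalized value.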
+



