Now that Varnish is up and running, and you can access your web application through Varnish. Unless your application is specifically written to work behind a web accelerator you’ll probably need to do some changes to either the configuration or the application in order to get a high hit rate in Varnish.
Varnish will not cache your data unless it’s absolutely sure it is safe to do so. So, for you to understand how Varnish decides if and how to cache a page I’ll guide you through a couple of tools that you will find useful.
Note that you need a tool to see what HTTP headers fly between you and the web server. If you have Varnish the easiest is to use varnishlog and varnishtop but sometimes a client-side tool makes sense. Here are the ones I use.
You can use varnishtop to identify what URLs are hitting the backend the most. varnishtop -i txurl is a essential command. You can see some other examples of varnishtop usage in Statistics.
When you have identified the an URL which is frequently sent to the backend you can use varnishlog to have a look at the whole request. varnishlog -c -o /foo/bar will give the whole (-o) requests coming from the client (-c) matching /foo/bar.
lwp-request is part of The World-Wide Web library for Perl. It’s couple of really basic programs that can execute a HTTP request and give you the result. I mostly use two programs, GET and HEAD.
vg.no was the first site to use Varnish and the people running Varnish there are quite clue-full. So its interesting to look at their HTTP Headers. Lets send a GET request for their home page.:
$ GET -H 'Host: www.vg.no' -Used http://vg.no/
GET http://vg.no/
Host: www.vg.no
User-Agent: lwp-request/5.834 libwww-perl/5.834
200 OK
Cache-Control: must-revalidate
Refresh: 600
Title: VG Nett - Forsiden - VG Nett
X-Age: 463
X-Cache: HIT
X-Rick-Would-Never: Let you down
X-VG-Jobb: http://www.finn.no/finn/job/fulltime/result?keyword=vg+multimedia Merk:HeaderNinja
X-VG-Korken: http://www.youtube.com/watch?v=Fcj8CnD5188
X-VG-WebCache: joanie
X-VG-WebServer: leon
OK. Let me explain what it does. GET usually send off HTTP 0.9 requests, which lack the Host header. So I add a Host header with the -H option. -U print request headers, -s prints response status -e prints repsonse headers and -d discards the actual content. We dont really care about the content, only the headers.
As you can see VG ads quite a bit of information in their headers. Some of the headers, like the X-Rick-Would-Never are specific to vg.no and their somewhat odd sense of humour. Others, like the X-VG-Webcache are for debugging purposes.
So, to check whether a site sets cookies for a specific URL just do:
GET -Used http://example.com/ |grep ^Set-Cookie
There is also a plugin for Firefox. Live HTTP Headers can show you what headers are beeing sent and recieved. Live HTTP Headers can be found at https://addons.mozilla.org/en-US/firefox/addon/3829/ or by googling “Live HTTP Headers”.
Along with each HTTP request and reponse comes a bunch of headers carrying metadata. Varnish will look at these headers to determine if it is appropriate to cache the contents and how long Varnish can keep the content.
Please note that when considering these headers Varnish actually considers itself part of the actual webserver. The ratinonale being that both are under your control.
The term surrogate origin cache is not really well defined by the IETF so RFC 2616 so the various ways Varnish works might differ from your expectations.
Lets take a look at the importent headers you should be aware of:
The Cache-Control instructs caches how to handle the content. Varnish cares about the max-age parameter and uses it to calculate the TTL for an object.
“Cache-Control: nocache” is ignored but if you need this you can easyli add support for it.
So make sure use issue a Cache-Control header with a max-age header. You can have a look at what Varnish Softwares drupal server issues::
$ GET -Used http://www.varnish-software.com/|grep ^Cache-Control
Cache-Control: public, max-age=600
Varnish adds a Age header to indicate how long the object has been kept inside Varnish. You can grep out Age from varnishlog like this:
varnishlog -i TxHeader -I ^Age
HTTP 1.0 server might send “Pragma: nocache”. Varnish ignores this header. You could easly add support for this header in VCL.
In vcl_fetch:
if (beresp.http.Pragma ~ "nocache") {
pass;
}
If Varnish sees a Authorization header it will pass the request. If this is not what you want you can unset the header.
Sometimes your backend will misbehave. It might, depending on your setup, be easier to override the ttl in Varnish then to fix your somewhat cumbersome backend.
You need VCL to identify the objects you want and then you set the beresp.ttl to whatever you want.:
sub vcl_fetch {
if (req.url ~ "^/legacy_broken_cms/") {
set beresp.ttl = 5d;
}
}
Some sites are accessed via lots of hostnames. http://www.varnish-software.com , http://varnish-software.com and http://varnishsoftware.com/ all point at the same site. Since Varnish doesn’t know they are different Varnish will cache different versions of every page for every hostname. You can mitigate this in your web server configuration by setting up redirects or by using the following VCL::
if (req.http.host ~ "^(www.)?varnish-?software.com") {
set req.http.host = "varnish-software.com";
}
The following chapters should give your ways of further increasing your hitrate, especially the chapter on Cookies.