Monitor per site in Varnish
jhalfmoon at milksnot.com
Wed Oct 10 18:09:44 CEST 2012
On 09/28/2012 12:02 AM, Johan Olsson wrote:
> I've been looking on how to monitor varnish. I've found that there
> exists a snmp for varnish which gives some info that is good to have.
> I've found it and looked at it
> (http://sourceforge.net/projects/varnishsnmp/), but it doesn't give
> all that I need (I think). What I'm missing is to be able to monitor
> how much traffic one site is using. So if I have two sites like
> www.example1.com and www.example2.com, I would like to be able to get
> how many connections each one gets and how much Mbps each one is using.
> Is this possible to do?
Maybe I can help. I've got about 35 sites running on a 4-node Varnish
cluster here and monitor throughput, request rate and HTTP status codes
per site using Cacti and Nagios via SNMP. The way it works is like this:
Each server runs the exact same Varnish config. In this config
there's a VCL chunk that defines a bunch of macros for gathering site info:
STATS_NODE - Define a new node. This generates a structure at compile
time, where the statistics will be stored.
STATS_INIT - Initialize a node. Unfortunately this gets called each time
a site is accessed. But the code is only a few lines and very lightweight.
STATS_SET_BACKEND - Defines the current backend to use. This is called
each time a site is accessed.
STATS_UPDATE - Update the site's statistics. This is called each time a
site is accessed.
STATS_DUMP - This gets called periodically to dump the entire statistics
linked-list to syslog.
The flow is roughly as follows:
- Each site references the macros at certain points to generate the
per-site statistics;
- The main configuration calls STATS_DUMP periodically, which sends
statistics info to syslog;
- Syslog then sends it to a dedicated FIFO;
- A script (called varnish-snmp-stats-prep-backends.sh) listens on
the FIFO;
- The stats are then parsed and written to a per-site text file;
- SNMPD is configured to access the per-site stats files.
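In outline, the FIFO listener could look something like the sketch below.
To be clear: this is a hypothetical reconstruction, not the real
varnish-snmp-stats-prep-backends.sh — the stats line format
("site=<name> requests=<n> bytes=<n>") and the paths are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of varnish-snmp-stats-prep-backends.sh: read
# stats lines that syslog writes into a FIFO and materialize one
# small text file per site for snmpd to serve. The line format
# ("site=<name> requests=<n> bytes=<n>") and all paths are
# assumptions, not the original script's actual conventions.

FIFO=${FIFO:-/var/run/varnish-stats.fifo}
OUTDIR=${OUTDIR:-./varnish-stats}

parse_stats() {
    # For each stats line, extract the site name and overwrite that
    # site's snapshot file with the latest counters.
    while read -r line; do
        site=$(printf '%s\n' "$line" | sed -n 's/.*site=\([^ ]*\).*/\1/p')
        [ -n "$site" ] || continue
        printf '%s\n' "$line" > "$OUTDIR/$site.stats"
    done
}

# In production this would block on the FIFO forever:
# mkdir -p "$OUTDIR" && parse_stats < "$FIFO"
```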
Each server also generates varnishd-specific data every 5 minutes using
a script (varnish-snmp-stats-prep-srv.sh) that calls varnishstat. The
parsed output of varnishstat is dumped to a text file and made available
to SNMPD.
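The varnishstat step is simple enough to sketch; again this is an
illustration rather than the real varnish-snmp-stats-prep-srv.sh, and the
output path is made up. "varnishstat -1" prints one counter per line as
"name value rate description":

```shell
#!/bin/sh
# Hypothetical sketch of varnish-snmp-stats-prep-srv.sh. Reduce the
# one-shot varnishstat output ("name value rate description") to
# "name value" pairs; the output path below is an example only.

format_varnishstat() {
    awk '{ print $1, $2 }'
}

# Intended cron usage (every 5 minutes):
# varnishstat -1 | format_varnishstat > /var/lib/varnish-stats/varnishd.stats
```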
One varnish server is appointed the main statistics server. On that
server a cronjob calls "varnish-snmp-summarize-backends.py" every 5
minutes, which gathers and summarizes the statistics of all 4 servers,
using SNMP. This data is then dumped to per-site text files again, this
time containing the aggregate per-site counts. Cacti can then query this
server for the combined per-site and varnishd statistics.
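The aggregation step boils down to "fetch each node's counter, sum, write".
A rough shell equivalent of that one step follows (the real script is
Python; the OID, community string and host names are placeholders):

```shell
#!/bin/sh
# Rough shell equivalent of one step of varnish-snmp-summarize-backends.py:
# fetch a per-site counter from each of the 4 nodes over SNMP and sum
# the values. OID, community string and host names are placeholders.

sum_counters() {
    # Sum all whitespace-separated integers read from stdin.
    awk '{ for (i = 1; i <= NF; i++) s += $i } END { print s + 0 }'
}

# Intended cron usage, every 5 minutes on the main statistics server:
# for host in varnish1 varnish2 varnish3 varnish4; do
#     snmpget -v2c -c public -Oqv "$host" <per-site-counter-OID>
# done | sum_counters > /var/lib/varnish-stats/www.example1.com.total
```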
Another approach to generating the per-site statistics could be to pipe
the varnishlog output to a script that parses this data. I fear, though, that
this method might cause quite a heavy load on the machine doing the
parsing, so this may have to be offloaded to another machine. But this
is not the path I chose.
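For completeness, that varnishlog route might look roughly like the
untested sketch below — counting requests per Host header. The tag layout
assumed here follows varnish 2.x client-side output ("<fd> RxHeader c
Host: <name>") and may differ on your version:

```shell
#!/bin/sh
# Untested sketch of the varnishlog-based alternative: count requests
# per Host header by parsing "varnishlog -c" output. Field positions
# assume varnish 2.x tag layout ("<fd> RxHeader c Host: <name>").

count_hosts() {
    awk '$2 == "RxHeader" && $4 == "Host:" { n[$5]++ }
         END { for (h in n) print h, n[h] }'
}

# varnishlog -c | count_hosts
```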
Note: We're still running varnish 2 in production. Version 3 is in test.
But the conversion is trivial. I've prepared a tarball of this setup for
sharing, but I have to get permission to release this (anonymized)
configuration to the public. I'll get back to you on this subject
tomorrow or the day after to hopefully supply you with the entire setup
(varnish, cron, support scripts, syslog). Just let me know the best way
to share this on this list.
And somewhat unrelated to your question, but interesting nonetheless:
Another bit of VCL code dumps each request to syslog in a modified NCSA
format, for debugging, traceability and such. But because the machines
sometimes generate over 2MB of log data per second per server, and I
like to keep the logs for a few weeks, the logs need to be rotated
fairly often to prevent gigantic files, and they need to be compressed
to minimize storage requirements. There are two separate scripts to
handle this:
- varnish-log-rotate.sh - Checks the size of the log and rotates it if
it exceeds 2GB.
- varnish-log-compress.sh - Waits for rotated logs, then compresses and
archives them at idle priority to minimize CPU impact.
This allows you to store 2.5TB of logs on 250GB of storage and minimize
the log compression load on the servers.
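The core of that rotate/compress pair can be sketched as follows. The 2GB
threshold matches the description above, but the log path, the rotated-file
naming and the exact priority mechanism are my assumptions, not the real
scripts:

```shell
#!/bin/sh
# Hedged sketch of the rotate/compress pair described above; the 2GB
# threshold matches the text, but paths and naming are assumptions.

LOG=${LOG:-/var/log/varnish/requests.log}
MAX_BYTES=${MAX_BYTES:-2147483648}   # 2GB

rotate_if_needed() {
    # varnish-log-rotate.sh: rotate once the log exceeds the limit.
    size=$(wc -c < "$LOG" | tr -d ' ')
    if [ "$size" -gt "$MAX_BYTES" ]; then
        mv "$LOG" "$LOG.$(date +%Y%m%d%H%M%S)"
        : > "$LOG"    # recreate an empty log for syslog to append to
    fi
}

compress_rotated() {
    # varnish-log-compress.sh: compress rotated logs at low CPU
    # priority (on Linux, ionice -c3 could be added for idle I/O).
    for f in "$LOG".[0-9]*; do
        [ -e "$f" ] || continue
        case "$f" in *.gz) continue ;; esac
        nice -n 19 gzip "$f"
    done
}
```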