notes1

Poul-Henning Kamp phk at phk.freebsd.dk
Fri Feb 24 13:53:08 CET 2006


Notes on Varnish
----------------

Collected 2006-02-24 to 2006-02-..

Poul-Henning Kamp

-----------------------------------------------------------------------
Policy Configuration

Policy is configured in a simple unidirectional (no loops, no goto)
programming language which is compiled into 'C' and from there into
binary modules which are dlopen'ed by the main Varnish process.

The dl object contains one exported symbol, a pointer to a structure
which contains a reference count, a number of function pointers,
and a couple of string variables with identifying information.

All access into the config is protected by the reference counts.
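A minimal sketch of what the exported structure and its reference counting could look like in C. The names `struct vcl_conf`, `conf_ref()` and `conf_rele()`, and the exact set of function pointers, are hypothetical; the main process would fetch the pointer with `dlsym()` after `dlopen()`, under whatever symbol name the compiler chooses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct req;	/* hypothetical request type, opaque here */

/* Hypothetical layout of the one symbol a compiled policy module exports. */
struct vcl_conf {
	atomic_uint	refcnt;		/* users currently inside this config */
	const char	*name;		/* identifying information */
	const char	*source;	/* e.g. the file it was compiled from */
	void		(*request_policy)(struct req *);
	void		(*fetch_policy)(struct req *);
	void		(*prefetch_policy)(struct req *);
};

/* Take a reference before calling into the config. */
static struct vcl_conf *
conf_ref(struct vcl_conf *vc)
{
	atomic_fetch_add(&vc->refcnt, 1);
	return (vc);
}

/*
 * Drop a reference.  Returns nonzero when the last reference went
 * away, i.e. when an unloaded config could safely be dlclose()'d.
 */
static int
conf_rele(struct vcl_conf *vc)
{
	unsigned old = atomic_fetch_sub(&vc->refcnt, 1);

	assert(old > 0);
	return (old == 1);
}
```

Using an atomic counter lets worker threads enter and leave a config without taking a lock; only the unload path has to act on the "last reference" result.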

Multiple policy configurations can be loaded at the same time
but only one is the "active configuration".  Loading, switching and
unloading of policy configurations happen via the management
process.

A global config sequence number is incremented on each switch, and
policy-modified object attributes (ttl, cache/nocache) are all
qualified by the config sequence under which they were calculated;
they are invalid if a different policy is now in effect.
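A sketch of the sequence-number qualification, assuming a single policy-computed attribute (ttl). The names `conf_seq`, `conf_switch()`, `obj_set_ttl()` and `obj_ttl_valid()` are hypothetical.

```c
#include <stdbool.h>

/* Bumped every time the active configuration is switched. */
static unsigned conf_seq = 1;

/* Hypothetical cached object with one policy-computed attribute. */
struct object {
	double		ttl;		/* seconds, as computed by policy */
	unsigned	ttl_seq;	/* conf_seq when ttl was computed */
};

static void
conf_switch(void)
{
	conf_seq++;
}

/* Record a policy-computed ttl, qualified by the current config. */
static void
obj_set_ttl(struct object *o, double ttl)
{
	o->ttl = ttl;
	o->ttl_seq = conf_seq;
}

/* A computed ttl is only trusted under the config that computed it. */
static bool
obj_ttl_valid(const struct object *o)
{
	return (o->ttl_seq == conf_seq);
}
```

When `obj_ttl_valid()` fails, the caller would re-run the active policy instead of using the stale value, so a switch invalidates every object lazily without walking the cache.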

-----------------------------------------------------------------------
Configuration Language

XXX: include lines.

BNF:
	program:	function
			| program function

	function:	"sub" function_name compound_statement

	compound_statement:	"{" statements "}"

	statements:	/* empty */
			| statement
			| statements statement
			

	statement:	if_statement
			| call_statement
			| "finish"
			| assignment_statement
			| action_statement

	if_statement:	"if" condition compound_statement elif_parts else_part

	elif_parts:	/* empty */
			| elif_part
			| elif_parts elif_part

	elif_part:	"elseif" condition compound_statement
			| "elsif" condition compound_statement
			| "else if" condition compound_statement

	else_part:	/* empty */
			| "else" compound_statement

	call_statement:	"call" function_name

	assignment_statement:	field "=" value

	field:		object
			| field "." variable

	action_statement:	action arguments

	arguments:	/* empty */
			| arguments argument
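The grammar accepts three spellings for elif_part, and "else if" has to be told apart from a plain else_part. A sketch of how a hand-written lexer might recognise them; `match_elif()` is a hypothetical helper, not part of any existing compiler.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * If *pp points at "elseif", "elsif" or "else if", advance *pp past
 * the keyword and return 1.  Otherwise leave *pp alone and return 0,
 * so the caller can try else_part instead.
 */
static int
match_elif(const char **pp)
{
	const char *p = *pp;

	if (strncmp(p, "elseif", 6) == 0) {
		p += 6;
	} else if (strncmp(p, "elsif", 5) == 0) {
		p += 5;
	} else if (strncmp(p, "else", 4) == 0) {
		p += 4;
		if (!isspace((unsigned char)*p))
			return (0);
		while (isspace((unsigned char)*p))
			p++;
		if (strncmp(p, "if", 2) != 0)
			return (0);	/* plain "else" */
		p += 2;
	} else {
		return (0);
	}
	/* The keyword must end here, not be a prefix of an identifier. */
	if (isalnum((unsigned char)*p) || *p == '_')
		return (0);
	*pp = p;
	return (1);
}
```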

-----------------------------------------------------------------------
Sample request policy program

	sub request_policy {

		if (client.ip in 10.0.0.0/8) {
			no-cache
			finish
		}

		if (req.url.host ~ "cnn.no$") {
			rewrite	s/cnn.no$/vg.no/
		}

		if (req.url.path ~ "cgi-bin") {
			no-cache
		}

		if (req.useragent ~ "spider") {
			no-new-cache
		}

		if (backend.response_time > 2.5s) {
			set req.ttlfactor = 5.0
		} elseif (backend.response_time > 1.5s) {
			set req.ttlfactor = 2.0
		} elseif (backend.response_time > 0.8s) {
			set req.ttlfactor = 1.5
		}

		/*
		 * the program contains no references to
		 * maxage, s-maxage and expires, so the
		 * default handling (RFC2616) applies
		 */
	}
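Since policy is compiled into C, the response-time chain above might come out roughly like the hypothetical `ttlfactor_for()` below. The thresholds are tested from the largest down: tested smallest-first, any response time above 0.8s would take the first branch and the 2.0 and 5.0 factors could never apply. The neutral factor of 1.0 when no rule fires is an assumption.

```c
/* Hypothetical compiled form of the response-time rules. */
static double
ttlfactor_for(double response_time)
{
	if (response_time > 2.5)
		return (5.0);
	else if (response_time > 1.5)
		return (2.0);
	else if (response_time > 0.8)
		return (1.5);
	return (1.0);	/* assumed default: no rule fired */
}
```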

-----------------------------------------------------------------------
Sample fetch policy program

	sub backends {
		set backend.vg.ip = {...}
		set backend.ads.ip = {...}
		set backend.chat.ip = {...}
		set backend.chat.timeout = 10s
		set backend.chat.bandwidth = 2000 MB/s
		set backend.other.ip = {...}
	}

	sub vg_backend {
		set backend.ip = {10.0.0.1-5}
		set backend.timeout = 4s
		set backend.bandwidth = 2000Mb/s
	}

	sub fetch_policy {

		if (req.url.host ~ "vg.no$") {
			set req.backend = vg
			call vg_backend
		} else {
			/* XXX: specify 404 page url ? */
			error 404
		}

		if (backend.response_time > 2.0s) {
			if (req.url.path ~ "/landbrugspriser/") {
				error 504
			}
		}
		fetch
		if (backend.down) {
			if (obj.exist) {
				set obj.ttl += 10m
				finish
			}
			switch_config ohhshit
		}
		if (obj.result == 404) {
			error 300 "http://www.vg.no"
		}
		if (obj.result != 200) {
			finish
		}
		if (obj.size > 256k) {
			no-cache
		} else if (obj.size > 32k && obj.ttl < 2m) {
			set obj.ttl = 5m
		}
		if (backend.response_time > 2.0s) {
			set obj.ttl *= 2.0
		}
	}

	sub prefetch_policy {

		if (obj.usage < 10 && obj.ttl < 5m) {
			fetch
		}
	}
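A sketch of how the object-size rules at the end of fetch_policy, and the prefetch_policy condition, might look once compiled. `struct obj`, `fetch_policy_tail()` and `prefetch_wanted()` are hypothetical; every ttl in those rules is taken to be `obj.ttl`, "k" is assumed to mean 1024 bytes, and the meaning of `obj.usage` (a hit count) is a guess.

```c
#include <stdbool.h>
#include <stddef.h>

#define KB	1024	/* assumption: "k" means 1024 bytes */
#define MINUTE	60.0	/* seconds */

/* Hypothetical object state visible to the compiled fetch policy. */
struct obj {
	size_t		size;		/* obj.size, bytes */
	double		ttl;		/* obj.ttl, seconds */
	bool		cacheable;	/* cleared by no-cache */
	unsigned	usage;		/* obj.usage, assumed hit count */
};

/* The size and slow-backend rules at the end of fetch_policy. */
static void
fetch_policy_tail(struct obj *o, double backend_response_time)
{
	if (o->size > 256 * KB)
		o->cacheable = false;			/* no-cache */
	else if (o->size > 32 * KB && o->ttl < 2 * MINUTE)
		o->ttl = 5 * MINUTE;			/* obj.ttl = 5m */
	if (backend_response_time > 2.0)
		o->ttl *= 2.0;				/* obj.ttl *= 2.0 */
}

/* prefetch_policy, transcribed literally. */
static bool
prefetch_wanted(const struct obj *o)
{
	return (o->usage < 10 && o->ttl < 5 * MINUTE);
}
```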

-----------------------------------------------------------------------
Purging

When a purge request comes in, the regexp is tagged with the next
generation number and added to the tail of the list of purge regexps.

Before a sender transmits an object, it is checked against any
purge regexps which have a higher generation number than the object.
If one matches, the request is sent to a fetcher and the object is
purged.

If there were purge regexps with a higher generation number, but
none of them matched, the object is tagged with the current generation
number and moved to the tail of the list.

Otherwise, the object does not change generation number and is
not moved on the generation list.

New objects are tagged with the current generation number and put
at the tail of the list.

Objects are removed from the generation list when deleted.

When a purge object has a lower generation number than the first
object on the generation list, the purge object has been completed
and is removed.  A log entry is written with the number of compares
and the number of hits.
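A single-threaded sketch of the generation scheme using POSIX `regcomp()`/`regexec()` and the `TAILQ` macros from `<sys/queue.h>` (present on the BSDs and glibc). All function and struct names are hypothetical. One deliberate difference: because an object carrying generation g is either newer than, or already checked against, every purge up to g, this sketch retires a purge once its generation is lower than *or equal to* the first object's; the strict "lower than" rule would keep each purge around one generation longer.

```c
#include <regex.h>
#include <stddef.h>
#include <sys/queue.h>

struct object {
	const char		*url;
	unsigned		gen;
	TAILQ_ENTRY(object)	list;
};

struct purge {
	regex_t			re;
	unsigned		gen;
	unsigned		compares, hits;	/* for the completion log */
	TAILQ_ENTRY(purge)	list;
};

static unsigned cur_gen;
static TAILQ_HEAD(, object) objects = TAILQ_HEAD_INITIALIZER(objects);
static TAILQ_HEAD(, purge) purges = TAILQ_HEAD_INITIALIZER(purges);

/* A purge request gets the next generation and goes on the tail. */
static int
purge_add(struct purge *p, const char *regexp)
{
	if (regcomp(&p->re, regexp, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);
	p->gen = ++cur_gen;
	p->compares = p->hits = 0;
	TAILQ_INSERT_TAIL(&purges, p, list);
	return (0);
}

/* New objects carry the current generation and go on the tail. */
static void
object_insert(struct object *o, const char *url)
{
	o->url = url;
	o->gen = cur_gen;
	TAILQ_INSERT_TAIL(&objects, o, list);
}

/*
 * Called before an object is sent.  Returns 1 if a newer purge
 * matched: the object leaves the list and the caller must refetch.
 */
static int
object_check(struct object *o)
{
	struct purge *p;
	int newer = 0;

	TAILQ_FOREACH(p, &purges, list) {
		if (p->gen <= o->gen)
			continue;
		newer = 1;
		p->compares++;
		if (regexec(&p->re, o->url, 0, NULL, 0) == 0) {
			p->hits++;
			TAILQ_REMOVE(&objects, o, list);
			return (1);
		}
	}
	if (newer) {
		/* Survived every newer purge: retag, move to the tail. */
		o->gen = cur_gen;
		TAILQ_REMOVE(&objects, o, list);
		TAILQ_INSERT_TAIL(&objects, o, list);
	}
	return (0);
}

/* Retire purges that no object on the list can still be subject to. */
static unsigned
purge_retire(void)
{
	struct object *first = TAILQ_FIRST(&objects);
	struct purge *p;
	unsigned n = 0;

	while ((p = TAILQ_FIRST(&purges)) != NULL) {
		if (first != NULL && p->gen > first->gen)
			break;
		/* A real implementation would log compares and hits here. */
		TAILQ_REMOVE(&purges, p, list);
		regfree(&p->re);
		n++;
	}
	return (n);
}
```

Because both lists only ever grow at the tail with non-decreasing generations, the head of the object list always holds the oldest generation, which is what makes the retirement test a single comparison.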
	
-----------------------------------------------------------------------
Random notes

	swap-backed storage

	slowstart by config-flipping
		start-config has peer servers as backend
		once hitrate goes above limit, management process
		flips config to 'real' config.

	stat-object
		always URL, not regexp

	management + varnish process in one binary, comms via pipe

	When changing from a config with long expiry to one with short
	expiry, how does the ttl drop?  (The config sequence number
	invalidates all calculated/modified attributes.)

	Mgt process holds a copy of the acceptor socket -> restart
	without losing client requests.

	BW limit per client IP: create shortlived object (<4sec)
	to hold status.  Enforce limits by delaying responses.
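One way the per-IP status object could drive the delay, sketched under assumptions: the object records when its window started and how many bytes were sent, expires after 4 seconds, and each response is held back until the window average would be back under the limit. `struct bw_status` and `bw_delay()` are hypothetical, and the lookup of the status object by client IP is left out.

```c
/* Lifetime of the short-lived status object, in seconds. */
#define BW_WINDOW	4.0

/* Hypothetical per-client-IP status object. */
struct bw_status {
	double	start;	/* window start time, seconds */
	double	bytes;	/* bytes sent to this IP in the window */
};

/*
 * Account for len more bytes at time now and return how long the
 * response should be delayed so the window average stays at or
 * below limit (bytes per second).
 */
static double
bw_delay(struct bw_status *st, double now, double len, double limit)
{
	double earliest;

	if (now - st->start >= BW_WINDOW) {
		/* Old status object expired: start a fresh one. */
		st->start = now;
		st->bytes = 0.0;
	}
	st->bytes += len;
	/* At limit B/s, st->bytes cannot all be sent before this time. */
	earliest = st->start + st->bytes / limit;
	return (earliest > now ? earliest - now : 0.0);
}
```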


-----------------------------------------------------------------------
Source structure


	libvarnish
		library with interface facilities, for instance
		functions to open and read the shmem log

	varnish
		varnish sources in three classes

-----------------------------------------------------------------------
protocol cluster/mgt/varnish

object_query url -> TTL, size, checksum
{purge,invalidate} regexp
object_status url -> object metadata

load_config filename configname
switch_config configname
list_configs
unload_config configname

freeze 	# stop the clock, freezes the object store
thaw

suspend	# stop acceptor accepting new requests
resume

stop	# forced stop (exits) varnish process
start
restart = "stop;start"

ping $utc_time -> pong $utc_time

# cluster only
config_contents filename $inline -> compilation messages

stats [-mr] -> $data

zero stats

help
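A sketch of a table-driven dispatcher for the command protocol above: each line is split into words and the first word selects a handler, with the argument count checked from the table. Only `ping` is wired up; `cli_dispatch()`, `struct cli_cmd` and the error strings are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CLI_MAXARGV	8

typedef int cli_func_t(int argc, char **argv, char *out, size_t outlen);

/* ping $utc_time -> pong $utc_time */
static int
cli_ping(int argc, char **argv, char *out, size_t outlen)
{
	assert(argc == 2);
	snprintf(out, outlen, "pong %s", argv[1]);
	return (0);
}

struct cli_cmd {
	const char	*name;
	int		minarg, maxarg;	/* arguments after the command */
	cli_func_t	*func;
};

/* The other verbs would be added to this table the same way. */
static const struct cli_cmd cli_cmds[] = {
	{ "ping", 1, 1, cli_ping },
};

/* Split one command line into words and dispatch on the first. */
static int
cli_dispatch(const char *line, char *out, size_t outlen)
{
	char buf[256], *argv[CLI_MAXARGV], *save, *tok;
	int argc = 0;
	size_t i;

	if (strlen(line) >= sizeof buf) {
		snprintf(out, outlen, "line too long");
		return (-1);
	}
	strcpy(buf, line);
	for (tok = strtok_r(buf, " \t\r\n", &save); tok != NULL;
	    tok = strtok_r(NULL, " \t\r\n", &save)) {
		if (argc == CLI_MAXARGV) {
			snprintf(out, outlen, "too many arguments");
			return (-1);
		}
		argv[argc++] = tok;
	}
	if (argc == 0) {
		snprintf(out, outlen, "empty command");
		return (-1);
	}
	for (i = 0; i < sizeof cli_cmds / sizeof cli_cmds[0]; i++) {
		if (strcmp(argv[0], cli_cmds[i].name) != 0)
			continue;
		if (argc - 1 < cli_cmds[i].minarg ||
		    argc - 1 > cli_cmds[i].maxarg) {
			snprintf(out, outlen, "wrong number of arguments");
			return (-1);
		}
		return (cli_cmds[i].func(argc, argv, out, outlen));
	}
	snprintf(out, outlen, "unknown command: %s", argv[0]);
	return (-1);
}
```

Keeping the verbs in one table means the local CLI, the HTML front end and the cluster controller could all share the same parser and the same `help` listing.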

-----------------------------------------------------------------------
CLI (local)
	import protocol from above

	telnet localhost someport
	authentication:
		password $secret
	secret stored in {/usr/local}/etc/varnish.secret (400 root:wheel)
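A sketch of the password check, with two additions that are not in the notes: the secret file is refused if any group or other permission bit is set, and the comparison does not stop at the first differing byte (a constant-time compare), so response time does not leak how much of a guess was right. `secret_file_ok()` and `cli_password_ok()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Refuse a secret file that is not a regular file or is group/other accessible. */
static int
secret_file_ok(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return (0);
	return (S_ISREG(st.st_mode) && (st.st_mode & 077) == 0);
}

/* Compare the "password" argument with the secret without early exit. */
static int
cli_password_ok(const char *given, const char *secret)
{
	size_t glen = strlen(given), slen = strlen(secret), i;
	unsigned char diff = (glen != slen);

	for (i = 0; i < slen; i++)
		diff |= (unsigned char)(secret[i] ^ (i < glen ? given[i] : 0));
	return (diff == 0);
}
```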


-----------------------------------------------------------------------
HTML (local)

	php/cgi-bin thttpd ?
	(alternatively direct from C-code.)
	Everything the CLI can do +
	stats
		popen("rrdtool");
	log view

-----------------------------------------------------------------------
CLI (cluster)
	import protocol from above, prefix machine/all
	compound stats
	accept / deny machine (?)
	curses if you set termtype

-----------------------------------------------------------------------
HTML (cluster)
	ditto
	ditto

	http://clustercontrol/purge?regexp=fslkdjfslkfdj
		POST with list of regexp
		authentication ? (IP access list)

-----------------------------------------------------------------------
Mail (cluster)

	pgp signed emails with CLI commands

-----------------------------------------------------------------------
connection varnish -> cluster controller

	Encryption
		SSL
	Authentication (?)
		IP number checks.

	varnish -c clusterid -C mycluster_ctrl.vg.no
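The "IP number checks" here, the IP access list for the purge URL, and `client.ip in 10.0.0.0/8` in the request policy all reduce to a prefix match. A minimal IPv4-only sketch using `inet_pton()`; `ip_in_net()` is a hypothetical helper.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Return 1 if dotted-quad ip lies inside net/prefix, 0 if not,
 * -1 on malformed input.  IPv4 only.
 */
static int
ip_in_net(const char *ip, const char *net, unsigned prefix)
{
	struct in_addr a, n;
	uint32_t mask;

	if (prefix > 32 || inet_pton(AF_INET, ip, &a) != 1 ||
	    inet_pton(AF_INET, net, &n) != 1)
		return (-1);
	/* Shifting a 32-bit value by 32 is undefined, so /0 is special. */
	mask = prefix == 0 ? 0 : UINT32_C(0xffffffff) << (32 - prefix);
	return ((ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask));
}
```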

-----------------------------------------------------------------------
Files
	/usr/local/sbin/varnish
		contains mgt + varnish process.
		if -C argument given, open SSL connection to cluster controller.
		Arguments:
			-p portnumber
			-c clusterid@cluster_controller
			-f config_file
			-m memory_limit
			-s kind[,storage-options]
			-l logfile,logsize
			-b backend ip...
			-d debug
			-u uid
			-a CLI_port

		KILL SIGTERM	-> suspend, stop

	/usr/local/sbin/varnish_cluster
		Cluster controller.
		Uses syslog.

		Arguments:
			-f config file
			-d debug
			-u uid (?)

	/usr/local/sbin/varnish_logger
		Logfile processor
		-i shmemfile
		-e regexp
		-o "/var/log/varnish.%Y%m%d.traffic" 
		-e regexp2
		-n "/var/log/varnish.%Y%m%d.exception"  (NCSA format)
		-e regexp3
		-s syslog_level,syslogfacility
		-r host:port	send via TCP, prefix hostname

		SIGHUP: reopen all files.

	/usr/local/bin/varnish_cli
		Command line tool.

	/usr/local/share/varnish/etc/varnish.conf
		default request + fetch + backend scripts

	/usr/local/share/varnish/etc/rfc2616.conf
		RFC2616 compliant handling function

	/usr/local/etc/varnish.conf (optional)
		request + fetch + backend scripts

	/usr/local/share/varnish/etc/varnish.startup
		default startup sequence

	/usr/local/etc/varnish.startup (optional)
		startup sequence

	/usr/local/etc/varnish_cluster.conf
		XXX

	{/usr/local}/etc/varnish.secret
		CLI password file.
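A sketch of how /usr/local/sbin/varnish might parse its arguments with POSIX `getopt()`. `struct params` and `parse_args()` are hypothetical; `-C` is accepted because the notes use it (`varnish -c clusterid -C mycluster_ctrl.vg.no`) even though it is missing from the Arguments list, and only the last `-b` is kept.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical holder for the varnish command line options. */
struct params {
	const char	*cli_port;	/* -a CLI_port */
	const char	*backend;	/* -b backend ip (last one wins here) */
	const char	*controller;	/* -C cluster controller host */
	const char	*cluster;	/* -c clusterid */
	const char	*config;	/* -f config_file */
	const char	*log;		/* -l logfile,logsize */
	const char	*memlimit;	/* -m memory_limit */
	const char	*port;		/* -p portnumber */
	const char	*storage;	/* -s kind[,storage-options] */
	const char	*uid;		/* -u uid */
	int		debug;		/* -d */
};

/* Returns 0 on success, -1 on a bad option or stray argument. */
static int
parse_args(int argc, char *const argv[], struct params *p)
{
	int ch;

	while ((ch = getopt(argc, argv, "a:b:C:c:df:l:m:p:s:u:")) != -1) {
		switch (ch) {
		case 'a': p->cli_port = optarg; break;
		case 'b': p->backend = optarg; break;
		case 'C': p->controller = optarg; break;
		case 'c': p->cluster = optarg; break;
		case 'd': p->debug = 1; break;
		case 'f': p->config = optarg; break;
		case 'l': p->log = optarg; break;
		case 'm': p->memlimit = optarg; break;
		case 'p': p->port = optarg; break;
		case 's': p->storage = optarg; break;
		case 'u': p->uid = optarg; break;
		default: return (-1);
		}
	}
	return (optind == argc ? 0 : -1);
}
```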

-----------------------------------------------------------------------
varnish.startup

	load_config /foo/bar startup_conf
	switch_config startup_conf
	!mypreloadscript
	load_config /foo/real real_conf
	switch_config real_conf
	resume

*eof*


