Varnish performance tips on Linux

Tue Apr 29 16:12:25 CEST 2008

I've been dealing with a few Varnish problems on Linux boxes over the
last few months.

I do not have write access to the wiki, so I'll write my notes down
here. Problems first, then how I fixed some of them.

There might be some ugly stuff here, and even some things that are
just plain wrong. It works for me, and I'd love feedback.

Problems I've run into

* Crashes when cache runs full (1.1.2)
* Network performance issues
* CPU in IO-wait -> High Load -> Slower cache
* Thread Pileup under heavy load -> Load > 60

Crashes when cache runs full
----------------------------
My fix: Upgrade to trunk. There have been a lot of fixes in trunk to
deal with this. Trunk is usually pretty stable, with some
exceptions (I'm running varnish-1.1.2-trunk2543).

Network performance issues
--------------------------
My fix: Tune kernel parameters.
Linux has an increasing number of TCP autotuning parameters,
especially after 2.6.17, but you still want to look into the following:

net.ipv4.ip_local_port_range = 1024 65535
#Defines the local port range that is used by TCP and UDP to choose
#the local port. The first number is the first, the second the last
#local port number. The default value depends on the amount of memory
#available on the system: > 128MB 32768 - 61000, < 128MB 1024 - 4999
#or even less. This number defines the number of active connections that
#this system can issue simultaneously to systems not supporting TCP
#extensions (timestamps). With tcp_tw_recycle enabled, range 1024 -
#4999 is enough to issue up to 2000 connections per second to systems
#supporting timestamps.

net.core.rmem_max=16777216
#This setting changes the maximum network receive buffer

net.core.wmem_max=16777216
#The same thing for the send buffer

net.ipv4.tcp_rmem=4096 87380 16777216
#This sets the kernel's minimum, default, and maximum TCP receive
#buffer sizes. You might be surprised, seeing the maximum of 16M,
#that many Unix-like operating systems still have a maximum of 256K!

net.ipv4.tcp_wmem=4096 65536 16777216
#A similar setting for the TCP send buffer. Note that the
#default value is a little lower. Don't worry about this,
#the send buffer size is less important than the
#receive buffer.

net.ipv4.tcp_fin_timeout = 3
#Time to hold a socket in state FIN-WAIT-2
#if it was closed by our side. The peer can be broken and never close its
#side, or even die unexpectedly. The default value is 60 seconds.
#The usual value in 2.2 was 180 seconds. You may restore it, but
#remember that even if your machine is a lightly loaded web server, you
#risk overflowing memory with lots of dead sockets. FIN-WAIT-2 sockets are
#less dangerous than FIN-WAIT-1, because they eat at most 1.5 kilobytes
#of memory, but they tend to live longer.

net.ipv4.tcp_tw_recycle = 1
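#Enable fast recycling of TIME-WAIT sockets. The default value is 0.
#Note: this relies on TCP timestamps and is known to break connections
#from clients behind NAT, so test carefully before enabling it. (The
#related tcp_tw_reuse setting is the one that allows reuse of TIME-WAIT
#sockets for new connections when it is safe from a protocol viewpoint.)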

net.core.netdev_max_backlog = 30000
#Maximum number of packets, queued on the input side, when the
#interface receives packets faster than
#kernel can process them. Applies to non-NAPI devices only. The default
#value is 1000.

net.ipv4.tcp_no_metrics_save=1
#This removes an odd behavior in the 2.6 kernels,
#whereby the kernel stores the slow start threshold for a
#client between TCP sessions. This can cause undesired results, as a
#single period of congestion can affect many subsequent connections. I
#recommend that you disable it.

net.core.somaxconn = 262144
#Limit of socket listen() backlog, known in userspace as SOMAXCONN.
#Defaults to 128. See also tcp_max_syn_backlog for additional
#tuning for TCP sockets.

net.ipv4.tcp_syncookies = 0
# Send out syncookies when the syn backlog queue of a socket overflows.
# This is to protect against the common "syn flood attack".
# Disabled (0) by default, and should stay disabled.

net.ipv4.tcp_max_orphans = 262144
#Maximal number of TCP sockets not attached to any user file handle,
#held by the system. If this number is exceeded, orphaned connections are
#reset immediately and a warning is printed.

net.ipv4.tcp_max_syn_backlog = 262144
#Maximal number of remembered connection requests, which still did
#not receive an acknowledgment from connecting client.
#The default value is 1024 for systems with more than 128 MB of memory,
#and 128 for low memory machines.

net.ipv4.tcp_synack_retries = 2
#Number of times SYNACKs for a passive TCP connection attempt
#will be retransmitted. Should not be higher than 255.
#The default value is 5, which corresponds to ~ 180 seconds.

net.ipv4.tcp_syn_retries = 2
#Number of times initial SYNs for an active TCP connection attempt
#will be retransmitted. Should not be higher than 255.
#The default value is 5, which corresponds to ~ 180 seconds.
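
To check which of the settings above are actually active on a running box,
something like the following could be used. This is only a sketch: it assumes
the settings were saved in /etc/sysctl.conf and that each key maps to a file
under /proc/sys (dots replaced by slashes). The function names are made up
for this example.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise whitespace so "4096  87380" and "4096\t87380" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key, root="/proc/sys"):
    """Map net.ipv4.tcp_rmem to /proc/sys/net/ipv4/tcp_rmem."""
    return Path(root, *key.split("."))


def differences(wanted, root="/proc/sys"):
    """Return {key: (running, wanted)} for keys that differ or cannot be read."""
    diffs = {}
    for key, value in wanted.items():
        try:
            running = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            running = None  # key missing on this kernel, or not readable
        if running != value:
            diffs[key] = (running, value)
    return diffs


def report(conf_path="/etc/sysctl.conf"):
    """Print every setting whose running value differs from the config file."""
    wanted = parse_sysctl(Path(conf_path).read_text())
    for key, (running, value) in sorted(differences(wanted).items()):
        print(f"{key}: running={running!r} wanted={value!r}")
```

Calling report() after "sysctl -p" should print nothing if every value was
accepted by the kernel.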

CPU in IO-wait
--------------
For some reason, newer kernels (>2.6.9 at least) are much more aggressive
in writing the varnish mmap'ed datafile down to disk.
On older kernels an "iostat -x 1" would show you an almost idle disk.
If "iostat -x 1" gives you up to 100% usage in the last column, you
have VM I/O trouble. I've tried to adjust a lot of the settings under vm
and disk, but I can't seem to get it to behave like old Linux kernels.
(Newer FreeBSD kernels seem to have a similar problem.)
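
If you want to watch for this automatically, the check on the last iostat
column can be scripted. A rough sketch, assuming the extended report has a
header line starting with "Device" and a "%util" column (true for the sysstat
versions I know, but check yours). The function name and the 90% threshold
are arbitrary choices for illustration.

```python
def busy_devices(iostat_text, threshold=90.0):
    """Return {device: peak %util} for devices that reached the threshold."""
    busy = {}
    util_index = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_index = None  # a blank line ends the current device table
            continue
        if fields[0].startswith("Device") and "%util" in fields:
            util_index = fields.index("%util")  # remember where %util sits
            continue
        if util_index is None or len(fields) <= util_index:
            continue  # not inside a device table
        try:
            # Some locales print a decimal comma instead of a point.
            util = float(fields[util_index].replace(",", "."))
        except ValueError:
            continue
        if util >= threshold:
            busy[fields[0]] = max(busy.get(fields[0], 0.0), util)
    return busy
```

Feed it the captured output of a few samples, for example from
"iostat -x 1 5". Any device it returns is close to saturated.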

My fix: Use the malloc backend instead of the mmap'ed datafile.
To do this you have to add a pretty large swap: just trash the
filesystem you used to store your datafile on, give it to swap,
and add "-s malloc,30G" for a "datafile" of 30G on swap.

High Load, slow backends, Thread Pile-up
---------------------------------------
In my case I have 3 varnishes talking to the same backend for a specific
service. Each varnish serves 4000 hits/s, so about 12000 hits/s in total.
Most requests hit the same page.

The backend is pretty quick, answering in about 10 milliseconds, but that
is unfortunately too slow in some cases.

When a client asks for an object that is new or has gone stale, varnish
queues up the requests while it retrieves the object from the backend.
During the 10 milliseconds it takes to retrieve the object from the
backend, 40 new requests are queued up waiting for the same object, each
using up a thread. So what happens if the backend at some point takes up
to maybe 1 full second to answer? Thread pile-up.
It's a bit tricky to solve this, but we have some options.
The first one is to enable the object grace period. This allows
varnish to serve a stale but cacheable object to clients while
it retrieves a new object from the backend.

set obj.grace = 30s;

at the top of both vcl_recv and vcl_fetch.

This helps a lot for stale objects, but it's still a problem for new
objects not yet in the cache. There is unfortunately no good way of
dealing with this except "warming up" the cache by sending a request for
the object before you "publish" it.
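
A warm-up run could look roughly like this. It is only a sketch: it assumes
each varnish can be reached directly by address and that the site is selected
with the Host header. The function names, addresses and paths are
placeholders, not anything from my setup.

```python
import urllib.request


def fetch_status(url, host, timeout=5.0):
    """GET url with an explicit Host header and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"Host": host})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # drain the body so the whole object is fetched
        return response.status


def warm(caches, host, paths, fetch=fetch_status):
    """Request every path through every cache.

    Returns {(cache, path): status code, or the error text on failure}.
    """
    results = {}
    for cache in caches:
        for path in paths:
            url = f"http://{cache}{path}"
            try:
                results[(cache, path)] = fetch(url, host)
            except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
                results[(cache, path)] = str(exc)
    return results
```

For example, warm(["192.0.2.10:80", "192.0.2.11:80"], "www.example.com",
["/new-article"]) would pull the new page through two caches before the link
to it goes live.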

--
Audun
