Tuning varnish for high load

Henning Stener h.stener at betradar.com
Tue Mar 4 10:53:13 CET 2008


Are you sending one request per connection and then closing it, or are you
serving a number of requests to each of the 10K different connections? In
the latter case, how many requests/sec are you seeing?

I have run a small test against the 1.1.2 release on an 8-CPU (dual
quad-core) system with 4GB RAM running Debian etch, and for objects
larger than a few KB the gigabit link is the bottleneck. However, most
of our requests are for very small files, and there I am hitting some
other limitation that I haven't fully figured out yet.
First of all, when running one request per connection, I hit the flood
protection in the switch, so I haven't gotten around to testing this
properly yet; for what it's worth, Varnish chugs along at roughly
7K requests/sec in that situation.
When bumping up the number of requests per connection, I am seeing 17K
reqs/sec.
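
The two cases can be reproduced with a tool like httperf, roughly as
follows (the hostname, URI and counts below are only illustrative, not
my actual setup):

  # one request per connection, then close (this is the case that
  # tripped the flood protection in the switch)
  httperf --server cache01.example.com --port 80 --uri /small.gif \
          --num-conns 100000 --num-calls 1

  # many requests per keep-alive connection
  httperf --server cache01.example.com --port 80 --uri /small.gif \
          --num-conns 1000 --num-calls 100
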
After turning some knobs on the system, mainly

  net.core.somaxconn
  net.core.netdev_max_backlog
  net.ipv4.tcp_max_syn_backlog

it jumps to ~25K reqs/sec.
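
On this Debian box the knobs are set with sysctl; a minimal sketch,
with made-up values that are not the numbers I actually settled on:

  # apply at runtime; put the same settings (without "sysctl -w") in
  # /etc/sysctl.conf to make them persistent
  sysctl -w net.core.somaxconn=4096
  sysctl -w net.core.netdev_max_backlog=4096
  sysctl -w net.ipv4.tcp_max_syn_backlog=8192
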
At this point it refuses to go any higher even if I raise the sysctl
settings further, run the benchmark from more client machines or even add
another network card.
Load is 1.1, CPU usage in top is 30% (no single core is at 100%
either) and there is no I/O wait, so unless there is something obvious I
have missed, this is as fast as the system goes.

Moving away from the synthetic tests to some real-world observations,
however, things become a bit different. In production, with a 98% hit rate,
Varnish sometimes becomes flaky around 6-7K requests/sec, seemingly
because of resource leaks. The virtual memory usage suddenly jumps to 80G
or more, Varnish stops serving new requests, and the child restarts.
This might have been plugged in trunk, but all later revisions I have
tried have had other problems and have been really unstable for me.


-Henning


On Fri, 2008-02-29 at 12:23 -0800, Michael S. Fischer wrote:
> On Thu, Feb 28, 2008 at 9:52 PM, Mark Smallcombe <mark at funnyordie.com> wrote:
> 
> >  What tuning recommendations do you have for varnish to help it handle high load?
> 
> Funny you should ask, I've been spending a lot of time with Varnish in
> the lab.  Here are a few observations I've made:
> 
> (N.B.  We're using 4-CPU Xeon hardware running RHEL 4.5, which runs
> the 2.6.9 Linux kernel.  All machines have at least 4GB RAM and run
> the 64-bit Varnish build, but our results are equally applicable to
> 32-bit builds.)
> 
> - When the cache hit ratio is very high (i.e. 100%), we discovered
> that Varnish's default configuration of thread_pool_max is too high.
> When there are too many worker threads, Varnish spends an inordinate
> amount of time in system call space.  We're not sure whether this is
> due to some flaw in Varnish, our ancient Linux kernel (we were unable
> to test with a modern 2.6.22 or later kernel that apparently has a
> better scheduler), or is just a fundamental problem when a threaded
> daemon like Varnish tries to service thousands of concurrent
> connections.  After much tweaking we determined that, on our hardware,
> the optimal ratio of threads per CPU is about 16, or around 48-50
> threads on a 4-CPU box.  To eliminate dropping work requests, it is
> also advisable to raise overflow_max to a significantly higher ratio
> than the default (e.g. 10000%).  This will cause Varnish to consume
> somewhat more RAM, but will provide outstanding performance.  With
> these tweaks, we were able to get Varnish to serve 10,000 concurrent
> connections, flooding a Gigabit Ethernet channel with 5 KB cached
> objects.
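> 
> For illustration, these knobs go on the varnishd command line via -p;
> a rough sketch, where the listen/backend addresses and the numbers are
> placeholders to be tuned for your own hardware and Varnish version:
> 
>   # illustrative only -- check how thread_pool_max interacts with
>   # thread_pools in your version before copying numbers
>   varnishd -a :80 -b backend.example.com:8080 \
>            -p thread_pool_max=50 \
>            -p overflow_max=10000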
> 
> - Conversely, when the cache hit ratio is 0, the default of 100
> threads is too low. (To create this scenario, we used 2 Varnish boxes:
>  the front-end proxy was configured to "pass" all requests to an
> optimized backend Varnish instance that served all requests from
> cache.)  On the same 4-CPU hardware, we found that the optimal
> thread_pool_max value in this situation is about 750.  Again, we were
> able to serve 10,000 concurrent connections after optimizing the
> settings.
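> 
> For illustration, a front-end set up that way might be started along
> these lines (the VCL path is a placeholder; the file just needs a
> vcl_recv that does "pass;" in the old VCL syntax):
> 
>   varnishd -a :80 -f /etc/varnish/pass-all.vcl -p thread_pool_max=750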
> 
> I find this interesting, because one would think that Varnish would
> make the system spend much more time in the scheduler in the second
> scenario, since it is doing significantly less work per request (no
> lookups, just handing off connections to the appropriate backend).  I
> suspect that
> there may be some thread-scalability issues with the cache lookup
> process.   If someone with a suitably powerful lab setup (i.e. Gigabit
> Ethernet, big hardware) can test with a more modern Linux kernel, I'd
> be very interested in the results.  Feel free to contact me if you
> need assistance with setup/analysis.
> 
> Finally: Varnish performance is absolutely atrocious on an 8-CPU RHEL
> 4.5 system -- so bad that I have to turn down thread_pool_max to 4 or
> restrict it to run only on 4 CPUs via taskset(1).  I've heard that
> MySQL has similar problems, so I suspect that this is a Linux kernel
> issue.
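> 
> For the record, the taskset(1) workaround looks roughly like this (the
> CPU list and the varnishd arguments are just an example):
> 
>   # pin varnishd to the first four CPUs
>   taskset -c 0-3 varnishd -a :80 -b backend.example.com:8080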
> 
> Best regards,
> 
> --Michael



