varnishd runtime parameters
Kristian Lyngstol
kristian at redpill-linpro.com
Wed Mar 25 20:03:20 CET 2009
On Wed, Mar 25, 2009 at 10:37:52AM -0700, Tung Nguyen wrote:
> I'm wondering how you are testing. I'm using ab (ApacheBench) to see how
> things behave with -c 10 -n 1000 on the varnished pages.
I've got two ways of testing. One is with siege, typically run with siege
-c200 -b. Siege, however, is a little unreliable, so I reserve it for tests
that I actively monitor.
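For the record, the invocation looks roughly like this; the URL is just a
placeholder for one of the varnished pages:

siege -c 200 -b http://varnish.example.com/somepage

-c is the number of concurrent simulated users and -b drops the delay
between requests, so it hits the page as fast as it can.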
For stress testing trunk we've got a small script suite to run httperf
for a few hours against Varnish under both FreeBSD and Linux. This test is
split in two: All cold hits (fill the cache, start fetching objects that
are swapped out) and all hot hits (constantly hit the same tiny pages).
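The hot-hit side is basically httperf pointed at one small page for a long
time; something along these lines, with the hostname, URI and rates made up
for illustration:

httperf --server varnish.example.com --port 80 --uri /tiny.html \
        --rate 500 --num-conns 10000 --num-calls 10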
During the development of the hot hit tests it became apparent that the
default cli_timeout of 5 seconds was far too short. At 10 seconds it
would _mostly_ work, but even at 15 seconds the clients would occasionally
not respond in time, and thus get killed. The rig has been running with a
cli_timeout of 23 for the past few months without issue.
It should be noted that this is an extreme case that is very unlikely to
happen in a production environment due to the nature of a network.
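For reference, the parameter is set like any other with -p at startup; the
VCL path and listen address here are just example values:

varnishd -f /etc/varnish/default.vcl -a :80 -p cli_timeout=23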
All in all, however, I much prefer siege for short-lived and specific
tests.
It's also important to remember that stress testing Varnish like this is
unlikely to reveal the problems you are most likely to face in an actual
Varnish deployment. The most common problem is not raw performance, but the
interaction between Varnish and the backends with regard to what is
cacheable and what isn't, and whether the server has enough memory and
disk. (Using disk is extremely slow, but might make sense in some
situations.)
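To make that last point concrete: the storage backend is chosen with -s
when varnishd is started. Paths and sizes below are example values only,
not recommendations:

# keep the cache in RAM
varnishd -f /etc/varnish/default.vcl -a :80 -s malloc
# or file-backed storage, for working sets that won't fit in RAM
varnishd -f /etc/varnish/default.vcl -a :80 -s file,/var/lib/varnish/storage.bin,10G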
> Here are some more specific questions about run time parameters. The general
> question I have is what to look for during testing: should I be looking at
> varnishstat, and what are the most important things to look for in that
> output?
Hit rate, hit rate and hit rate. Then look at overflowed requests, and make
sure dropped requests are always 0. Making sure backend_failures are under
control is also important.
But this has to be adjusted to what sort of traffic you're serving and what
sort of request rate you have.
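A quick way to keep an eye on exactly those counters; the counter names are
the Varnish 2.x ones as far as I remember, so verify them against your own
varnishstat output:

varnishstat -1 | egrep 'cache_hit|cache_miss|n_wrk_overflow|n_wrk_drop|backend_fail'

Hit rate is cache_hit / (cache_hit + cache_miss), and n_wrk_drop should
stay at zero.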
> Our varnish stack will look like this:
>
> LB -> Varnish x 2 -> Nginx x 6 -> Mongrel x 60
>
> Some questions about how best to configure the run time parameters.
>
> -p obj_workspace=4096
> Can't find obj_workspace in the man page, but found it in the twitter email
> post
> http://projects.linpro.no/pipermail/varnish-dev/2009-February/000968.html
Should be in the man page too, but it's documented if you telnet to the
management interface:
param.show obj_workspace
200 573
obj_workspace 8192 [bytes]
Default is 8192
Bytes of HTTP protocol workspace allocated for
objects. This space must be big enough for the
entire HTTP protocol header and any edits done to
it in the VCL code while it is cached.
Minimum is 1024 bytes.
> Is obj_workspace how much space is preallocated for the obj that
> gets returned from the backend? So, if my nginx backend returns a web page
> that is over 4MB, then -p obj_workspace is not enough; would that crash
> varnish, or log the error somewhere?
No, it's just for the headers from your backend + whatever you do to it in
VCL. The actual object can be as large as needed.
> -p sess_workspace=262144
> Same deal here with the man page and twitter post.
> What is the sess_workspace?
Same as obj_workspace but for headers from a client if memory serves me
right.
> http_workspace
> How does sess_workspace and obj_workspace relate to http_workspace?
> If we use obj_workspace=4096 and sess_workspace=262144, does the default
> http_workspace=8192 make sense?
http_workspace doesn't exist in Varnish 2.x.
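All of the workspace parameters can be inspected and changed the same way,
over the management interface; 6082 is only the port commonly passed to -T,
so adjust to your own setup:

telnet localhost 6082
param.show obj_workspace
param.show sess_workspace
param.set sess_workspace 262144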
> -p lru_interval=60
> Shows up in the twitter post again, but no man page notes yet. What's the default
> for this?
... This man page really needs to be updated. The documentation is
available in the management interface.
param.show lru_interval
200 783
lru_interval 2 [seconds]
Default is 2
Grace period before object moves on LRU list.
Objects are only moved to the front of the LRU
list if they have not been moved there already
inside this timeout period. This reduces the
amount of lock operations necessary for LRU list
access.
> -p sess_timeout=10 \
> Default for this is 5. If the requests from the backend take longer than 5
> seconds, what happens? Sometimes we have really slow response from the
> backend..
This isn't for backends; it's for client requests. So if a client doesn't
do anything for 5 (or 10) seconds, it's disconnected.
> -p shm_workspace=32768 \
> Is this the same as setting the command line flag -l shmlogsize? The
> default is 80MB. So I don't know why twitter did both, setting it to less.
Eh, never had a problem with this myself, but I'm not twitter.
> -p thread_pools=4 \
> -p thread_pool_min=100 \
> thread_pool_max
> The defaults are 1, 1 and 1000 respectively. I'm wondering how best to
> determine these, or whether to just leave the defaults.
You're definitely not using Varnish 2. Upgrade.
Increasing the number of thread pools might be necessary if you have
extreme traffic. The thread count should be set to how many concurrent
connections you normally expect. Just keep in mind that thread_pool_min *
thread_pools is the number of threads started right away. Starting a
thousand threads isn't a bad idea.
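Just to make the arithmetic explicit with the numbers from your mail (the
VCL path and listen address are placeholders):

varnishd -f /etc/varnish/default.vcl -a :80 \
    -p thread_pools=4 -p thread_pool_min=100 -p thread_pool_max=1000
# 4 pools * 100 minimum threads per pool = 400 threads started up front;
# thread_pool_max then caps how far the worker pools can grow under load.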
(Considered commercial support? This is getting pretty extensive.)
--
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497