Feedback on session_linger (high load/n_wrk_overflow issues)
damon at huddler-inc.com
Thu Jan 31 00:45:51 CET 2013
When you run 'varnishadm -T localhost:port param.show session_linger', the
description at the bottom says "we don't know if this is a good idea... and
feedback is welcome."
We found that setting session_linger pulled us out of a bind. I wanted to
add my feedback to the list in the hope that someone else might benefit
from what we experienced.
We recently increased the number of ESI includes on pages that get ~60-70
req/s on our platform. Some of those modules were rendered with s-maxage
set to zero so that they would be refreshed on every page load (this let us
insert a non-cached partial into the page), which further increased the
request load on varnish.
What we found is that after a few hours the load on a varnish box went from
< 1 to > 10 and n_wrk_overflow started incrementing. After investigating
further we noticed that context switching had gone from ~10k/s to > 100k/s.
We are running Linux, specifically CentOS.
Adjusting the number of threads or thread pools had no impact on the
thrashing.
After reading Kristian's high-end varnish tuning, we decided to try out
session_linger. We started by doubling the default from 50ms to 100ms to test
the theory ('varnishadm -T localhost:port param.set session_linger 100').
Once we did that, we saw the context switching (as measured with dstat or
sar -w) gradually settle and the load stabilize.
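The cswch/s figure that sar -w reports is derived from the cumulative "ctxt" counter in Linux's /proc/stat, so the same rate can be sampled directly where dstat or sar are not installed. A minimal sketch, assuming a Linux host; the helper names are hypothetical and not part of any Varnish tooling:

```python
import time


def parse_ctxt(proc_stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        # The "ctxt" line holds the total number of context switches since boot.
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found in /proc/stat text")


def read_ctxt(path: str = "/proc/stat") -> int:
    """Read the current cumulative context-switch count (Linux only)."""
    with open(path) as f:
        return parse_ctxt(f.read())


def context_switch_rate(interval: float = 1.0) -> float:
    """Sample the counter twice, `interval` seconds apart, and return switches/s."""
    before = read_ctxt()
    time.sleep(interval)
    after = read_ctxt()
    return (after - before) / interval
```

Calling context_switch_rate() in a loop before and after a param.set change gives numbers comparable to the ~10k/s vs > 100k/s figures above.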
It's a great feature to be able to change this parameter through the admin
interface. We have 50GB allocated with malloc storage and see some nuking
(LRU evictions) on our boxes, so restarting varnish, which empties that
in-memory cache, is not without impact on the platform.
Intuitively, increasing session_linger makes sense: if several ESI modules
are rendered within a page and the gap between them is > 50ms, the worker
thread stops lingering on the session and it gets handed off and reallocated
elsewhere.
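That intuition can be made concrete by counting how many gaps between consecutive requests exceed the linger window; under this simple model each such gap costs a handoff (and the extra context switches that come with it). A minimal sketch; the function name and the gap values are hypothetical illustrations, not measurements from our platform:

```python
def handoffs(gaps_ms, linger_ms):
    """Count inter-request gaps longer than the linger window.

    Under the simple model above, each such gap means the worker thread
    gave up on the session, so the next request needs a handoff.
    """
    return sum(1 for gap in gaps_ms if gap > linger_ms)


# Hypothetical gaps (ms) between the ESI requests of one page view.
gaps = [12, 35, 60, 75, 90, 105, 140]

for linger in (50, 100):
    print(f"session_linger={linger}: {handoffs(gaps, linger)} of {len(gaps)} gaps need a handoff")
```

The comparison is strict (gap > linger_ms) to match the "> 50ms" wording above; the point is only that raising the window removes the handoffs for gaps that fall between the old and new values.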
What is not clear to me is how we should tune session_linger. We started by
setting it to the third quartile of render times for the ESI module, taken
from a sampling of backend requests. This turned out to be 110ms.
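The third quartile (75th percentile) of a sample of render times can be computed with Python's standard statistics module. A minimal sketch; the function name and the sample values are hypothetical, and rounding up to whole milliseconds assumes the parameter takes an integer, as 'param.set session_linger 100' suggests:

```python
import math
import statistics


def suggest_linger_ms(render_times_ms):
    """Suggest a session_linger value as the third quartile of render times.

    statistics.quantiles(..., n=4) returns the cut points [Q1, Q2, Q3];
    Q3 is rounded up to a whole number of milliseconds.
    """
    q3 = statistics.quantiles(render_times_ms, n=4)[2]
    return math.ceil(q3)


# Hypothetical backend render times (ms) for one ESI module.
render_times_ms = [40, 55, 70, 85, 95, 110, 120, 150]
print(f"session_linger candidate: {suggest_linger_ms(render_times_ms)} ms")
```

statistics.quantiles uses the "exclusive" interpolation method by default, so other tools may report a slightly different Q3 for the same sample. The result can then be applied live with 'varnishadm -T localhost:port param.set session_linger <value>'.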
More information about the varnish-misc