Does the poll() on shared pipe() high context switching issue affect Varnish? :: was Feedback on session_linger (high load/n_wrk_overflow issues)
Damon Snyder
damon at huddler-inc.com
Fri Feb 8 03:09:23 CET 2013
I think the issue we have been seeing *may* be related to the polling on a
shared pipe() issue that was first mentioned here:
http://lkml.indiana.edu/hypermail/linux/kernel/1012.1/03515.html.
I think I narrowed the cause of the high csw/s down to multiple ESI
includes (2 most of the time) with s-maxage=0 on a 1M+ pageviews per month
site (~60 conn/s according to the first line of varnishstat). Most of the
page views have these ESI includes that aren't being cached (though not all
of the 60/s). If we set s-maxage to something > 0 (say 20s) in the headers
for the ESI modules coming from the backend to Varnish, the context switching
issue goes away: the csw/s falls back linearly to the steady state of about
10k/s.
I created a threaded version of the example code from the lkml email
(https://gist.github.com/drsnyder/4735889). When I compile and run this code
on a 2.6.32-279 or earlier kernel, the csw/s goes to ~200k/s, which defies
any attempt I have made to reason about why that would be the case. If I run
the same code on a 3.0+ Linux kernel, I see about 12k csw/s. Still
high, but much more reasonable.
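For anyone who doesn't want to click through to the gist, a minimal sketch of
the pattern described in the lkml thread (this is not the gist itself, and the
thread count and write rate below are just illustrative guesses) looks roughly
like this: many threads poll() the read end of one shared pipe, every write
wakes all of them, and only one read() actually gets the byte.

/*
 * Sketch only: N threads poll() the same pipe read end. Each write to
 * the pipe wakes every poller, but only one non-blocking read() wins;
 * the rest burned a wakeup and a context switch for nothing.
 */
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 400    /* roughly our production worker-thread count */

static int pipefd[2];

static void *
waiter(void *arg)
{
    struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN };
    char c;

    (void)arg;
    for (;;) {
        if (poll(&pfd, 1, -1) <= 0)
            continue;
        /* Non-blocking read: losers get EAGAIN and go back to poll(). */
        (void)read(pipefd[0], &c, 1);
    }
    return (NULL);
}

int
main(void)
{
    pthread_t tid;
    int i;

    if (pipe(pipefd) != 0) {
        perror("pipe");
        return (1);
    }
    fcntl(pipefd[0], F_SETFL, O_NONBLOCK);
    for (i = 0; i < NTHREADS; i++)
        pthread_create(&tid, NULL, waiter, NULL);
    for (;;) {
        (void)write(pipefd[1], "x", 1);  /* one byte wakes all pollers */
        usleep(10000);                   /* ~100 writes/s */
    }
    /* NOTREACHED */
}

Compile with something like 'gcc -O2 -pthread poll-pipe.c -o poll-pipe' and
watch csw/s with dstat or 'sar -w 1' while it runs.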
We see the same csw/s behavior when we set s-maxage=0 for the ESI
modules on the production site with about the same number of threads (400).
So does this polling on a shared pipe() issue affect Varnish 2.1.5? It
would appear so, but I'm looking for confirmation. I saw a commit
referencing context switching here
(https://www.varnish-cache.org/trac/changeset/05f816345a329ef52644a0239892f8be6314a3fa),
but I also can't imagine that this alone would trigger 200k csw/s in our
environment. If it is an issue, is there anything we can do to mitigate it
in the Varnish configuration? Are there versions where this has been fixed?
We have tested sending production traffic with s-maxage=0 to an Ubuntu
system running 3.2.0-27 and saw a spike in context switching, but it peaked
at about 20k/s with no observed impact on stability or delivery. So
switching from CentOS to Ubuntu is a possible option.
Thanks,
Damon
---------- Forwarded message ----------
From: Damon Snyder <damon at huddler-inc.com>
Date: Thu, Jan 31, 2013 at 11:10 PM
Subject: Re: Feedback on session_linger (high load/n_wrk_overflow issues)
To: "varnish-misc at varnish-cache.org" <varnish-misc at varnish-cache.org>
Well, I spoke too soon. Bumping session_linger brought us stability for
about a day, and then the high context switching returned. See the attached
plot for an illustration of the problem. The context switching (pulled from
dstat and graphed alongside the Varnish stats) starts to ramp up. As soon as
it crosses the ~50k/s mark we start seeing stability and latency issues
(10k/s is more "normal"). So now I'm at a loss as to how to proceed.
Below is the current varnishstat -1 output while the system is well behaved.
I wasn't able to capture it when the context switching peaked;
n_wrk_overflow spiked to about 1500 at the time, and the 5-minute load
average was ~15.
This is running on dual hex-core Intel Xeon E5-2620s (24 hardware threads)
with 64GB of memory. The hit rate is about 0.65 and we are nuking
(n_lru_nuked incrementing) once the system has been running for a few
hours. We have a long tail of objects: we have a lot of content, but only
some of it is hot, so it's difficult to predict our precise cache size
requirements at any given time. We are using varnish-2.1.5 on CentOS 5.6
with kernel 2.6.18-238.
I haven't found anything in syslog that would be of interest.
Thanks,
Damon
# currently running command line
/usr/local/varnish-2.1.5/sbin/varnishd -P /var/run/varnish.pid \
    -a 10.16.50.150:6081,127.0.0.1:6081 -f /etc/varnish/default.vcl \
    -T 127.0.0.1:6082 -t 120 -w 100,2000,120 -u varnish -g varnish \
    -s malloc,48G -p thread_pools 8 -p thread_pool_add_delay 2 \
    -p listen_depth 1024 -p session_linger 110 -p lru_interval 60 \
    -p sess_workspace 524288
# varnishstat -1
client_conn 5021009 442.11 Client connections accepted
client_drop 0 0.00 Connection dropped, no sess/wrk
client_req 5020366 442.05 Client requests received
cache_hit 4597754 404.84 Cache hits
cache_hitpass 24775 2.18 Cache hits for pass
cache_miss 2664516 234.61 Cache misses
backend_conn 3101882 273.13 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 0 0.00 Backend conn. failures
backend_reuse 99 0.01 Backend conn. reuses
backend_toolate 11777 1.04 Backend conn. was closed
backend_recycle 11877 1.05 Backend conn. recycles
backend_unused 0 0.00 Backend conn. unused
fetch_head 33 0.00 Fetch head
fetch_length 2134117 187.91 Fetch with Length
fetch_chunked 811172 71.42 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 156333 13.77 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 1 0.00 Fetch failed
n_sess_mem 3867 . N struct sess_mem
n_sess 3841 . N struct sess
n_object 757929 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 758158 . N struct objectcore
n_objecthead 760076 . N struct objecthead
n_smf 0 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 0 . N large free smf
n_vbe_conn 69 . N struct vbe_conn
n_wrk 800 . N worker threads
n_wrk_create 800 0.07 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 171 0.02 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 21 . N backends
n_expired 1879027 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 813472 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 5195402 457.46 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 5020819 442.09 Total Sessions
s_req 5020366 442.05 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 437300 38.50 Total pass
s_fetch 3101668 273.11 Total fetch
s_hdrbytes 2575156242 226746.17 Total header bytes
s_bodybytes 168406835436 14828461.34 Total body bytes
sess_closed 5020817 442.09 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 0 0.00 Session Linger
sess_herd 0 0.00 Session herd
shm_records 430740934 37927.35 SHM records
shm_writes 29264523 2576.78 SHM writes
shm_flushes 51411 4.53 SHM flushes due to overflow
shm_cont 78056 6.87 SHM MTX contention
shm_cycles 186 0.02 SHM cycles through buffer
sm_nreq 0 0.00 allocator requests
sm_nobj 0 . outstanding allocations
sm_balloc 0 . bytes allocated
sm_bfree 0 . bytes free
sma_nreq 5570153 490.46 SMA allocator requests
sma_nobj 1810351 . SMA outstanding allocations
sma_nbytes 35306367656 . SMA outstanding bytes
sma_balloc 154583709612 . SMA bytes allocated
sma_bfree 119277341956 . SMA bytes free
sms_nreq 32127 2.83 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 29394423 . SMS bytes allocated
sms_bfree 29394423 . SMS bytes freed
backend_req 3101977 273.13 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 31972 . N total active purges
n_purge_add 31972 2.82 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 3317282 292.09 N objects tested
n_purge_re_test 530623465 46722.15 N regexps tested against
n_purge_dups 28047 2.47 N duplicate purges removed
hcb_nolock 7285152 641.47 HCB Lookups without lock
hcb_lock 2543066 223.92 HCB Lookups with lock
hcb_insert 2542964 223.91 HCB Inserts
esi_parse 741690 65.31 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 11357 1.00 Client uptime
backend_retry 99 0.01 Backend conn. retry
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 14 0.00 Fetch no body (304)
On Wed, Jan 30, 2013 at 3:45 PM, Damon Snyder <damon at huddler-inc.com> wrote:
> When you run 'varnishadm -T localhost:port param.show session_linger', it
> indicates at the bottom that "we don't know if this is a good idea... and
> feedback is welcome."
>
> We found that setting session_linger pulled us out of a bind. I wanted to
> add my feedback to the list in the hope that someone else might benefit
> from what we experienced.
>
> We recently increased the number of ESI includes on pages that get ~60-70
> req/s on our platform. Some of those modules were being rendered with
> s-maxage set to zero so that they would be refreshed on every page load
> (this is so we could insert a non-cached partial into the page), which
> further increased the request load on varnish.
>
> What we found is that after a few hours the load on a varnish box went
> from < 1 to > 10 and n_wrk_overflow started incrementing. After
> investigating further we noticed that the context switching went from
> ~10k/s to > 100k/s. We are running Linux, specifically CentOS.
>
> No adjustment of threads or thread pools had any impact on the thrashing.
> After reading Kristian's post
> (<http://kly.no/posts/2009_10_19__High_end_Varnish_tuning__.html>) about
> high-end Varnish tuning, we decided to try out session_linger. We started by
> doubling the default from 50 to 100 to test the theory ('varnishadm -T
> localhost:port param.set session_linger 100'). Once we did that we saw a
> gradual settling of the context switching (observed with dstat and sar -w)
> and a stabilizing of the load.
>
> It's such a great feature to be able to change this parameter via the
> admin interface. We have 50GB malloc'ed and some nuking on our boxes, so
> restarting varnish doesn't come without some impact on the platform.
>
> Intuitively, increasing session_linger makes sense. If you have several ESI
> modules rendered within a page and the gap between them is > 50ms, then the
> session gets handed back and has to be picked up by another worker thread.
>
> What is not clear to me is how we should tune session_linger. We started
> by setting it to the 3rd quartile of render times for the ESI modules,
> taken from a sampling of backend requests. This turned out to be 110ms.
>
> Damon
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: varnish-context-switching.png
Type: image/png
Size: 55752 bytes
Desc: not available
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20130207/f002c79a/attachment-0001.png>