Poll() on shared pipe() high context switching issue affect varnish? :: was Feedback on session_linger (high load/n_wrk_overflow issues)
damon at huddler-inc.com
Thu Feb 14 19:37:14 CET 2013
My apologies for spamming the mailing list. I hope I save someone some time
by providing this monolog.
I'm not sure exactly what is causing the high load and context switching.
Adding s-maxage=0 to the Cache-Control header on multiple esi tags in 100s
or 1000s of high traffic pages that are cached appears to introduce it.
The apparent fix for this issue is to (pass) in vcl_fetch when
Cache-Control contains s-maxage=0. Alternatively, you can match the
specific URLs of the esi includes that you need to bypass. We have been
doing the latter for a few days in production without issue. I'm testing
the former now.
Adding this eliminated the persistently high 100k context switches per
second and reduced the csw/s down to a steady state of less than 4k csw/s.
The load is also a lot more stable. n_wrk_overflow is also now stable and
very low. Previously, when the csw/s would exceed ~100k we would start
seeing n_wrk_overflow increment and timeouts.
As part of this process, I created a simple varnish testbed to experiment
with different varnish and VCL configurations to try and drive up the csw
in a controlled environment and test the changes that might bring it back
down. It's not fun (or profitable) trying to experiment with the varnish
configuration in production on a high traffic site.
The testbed is here https://github.com/drsnyder/varnish-testbed. I used
sinatra + unicorn for the backend and siege to test the traffic. For the
test, I generate a medium 10k+ page and insert three esi tags that generate
text. I cache the page but not the esi includes. I also introduce a random
sleep to try and simulate a "real" backend request. Other backends or
experiments should be easy to generate. YMMV.
On Thu, Feb 7, 2013 at 6:09 PM, Damon Snyder <damon at huddler-inc.com> wrote:
> I think the issue we have been seeing *may* be related to the polling on a
> shared pipe() issue that was first mentioned here:
> I think I narrowed the cause of the high csw/s down to multiple esi
> includes (2 most of the time) with s-maxage=0 on a 1M+ pageview per month
> site (~60conn/s according to the first line of varnishstat). Most of the
> page views have these esi includes that aren't being cached (not all of the
> 60/s). If we set s-maxage to something > 0 (say 20s) in the headers for the
> esi module coming from the backend to varnish the context switching issue
> goes away-- the csw/s degrades linearly back to the steady state at about
> I created a threaded version of the example code from the lkml email (
> https://gist.github.com/drsnyder/4735889). When I compile and run this
> code on a 2.6.32-279 or below kernel the csw/s goes to ~200k/s which defies
> any attempt I have made to reason about why that would be the case. If I
> run the same code on a 3.0+ Linux kernel, I will see about 12k csw/s. Still
> high, but much more reasonable.
> We see the same csw/s behavior when we set the s-maxage=0 for the esi
> modules on the production site with about the same number of threads (400).
> So does this polling on a shared pipe() issue affect varnish 2.1.5? It
> would appear so, but I'm looking for confirmation. I saw a commit
> referencing context switching here (
> but I also can't imagine that this would trigger 200k/s in our environment.
> If it is an issue, is there anything we can do to mitigate it in the
> varnish configuration? Are there versions where this has been fixed?
> We have tested sending production traffic with s-maxage=0 to an Ubuntu
> system running 3.2.0-27 and saw a spike in context switching but it peaked
> at about 20k/s with no observed impact on stability or delivery. So,
> switching from Centos to Ubuntu is a possible option.
> ---------- Forwarded message ----------
> From: Damon Snyder <damon at huddler-inc.com>
> Date: Thu, Jan 31, 2013 at 11:10 PM
> Subject: Re: Feedback on session_linger (high load/n_wrk_overflow issues)
> To: "varnish-misc at varnish-cache.org" <varnish-misc at varnish-cache.org>
> Well, I spoke too soon. Bumping session_linger brought us stability for
> about a day and then the high context switching returned. See the attached
> plot to illustrate the problem. The context switching (pulled from dstat
> and added to varnish) starts to ramp up. As soon it crosses the ~50k mark
> we start start seeing stability and latency issues (10k/s is more
> "normal"). So now I'm at a loss as to how to proceed.
> Below is the current varnishstat -1 output when the system is well
> behaved. I wasn't able to capture it when the context switching peaked.
> n_wrk_overflow spiked to about 1500 at the time and the load average over
> the past 5 min was ~15.
> This is running on a dual hex core Intel Xeon E5-2620 (24 contexts or
> cores) with 64GB of memory. The hitrate is about 0.65 and we are nuking
> (n_lru_nuked incrementing) once the system has been running for a few
> hours. We have long tail objects in that we have a lot of content, but only
> some of it is hot so it's difficult to predict our precise cache size
> requirements at any given time. We are using varnish-2.1.5 on Centos 5.6
> with kernel 2.6.18-238.
> I haven't found anything in syslog that would be of interest.
> # currently running command line
> /usr/local/varnish-2.1.5/sbin/varnishd -P /var/run/varnish.pid -a
> 10.16.50.150:6081,127.0.0.1:6081 -f /etc/varnish/default.vcl -T
> 127.0.0.1:6082 -t 120 -w 100,2000,120 -u varnish -g varnish -s malloc,48G
> -p thread_pools 8 -p thread_pool_add_delay 2 -p listen_depth 1024 -p
> session_linger 110 -p lru_interval 60 -p sess_workspace 524288
> # varnishstat -1
> client_conn 5021009 442.11 Client connections accepted
> client_drop 0 0.00 Connection dropped, no sess/wrk
> client_req 5020366 442.05 Client requests received
> cache_hit 4597754 404.84 Cache hits
> cache_hitpass 24775 2.18 Cache hits for pass
> cache_miss 2664516 234.61 Cache misses
> backend_conn 3101882 273.13 Backend conn. success
> backend_unhealthy 0 0.00 Backend conn. not attempted
> backend_busy 0 0.00 Backend conn. too many
> backend_fail 0 0.00 Backend conn. failures
> backend_reuse 99 0.01 Backend conn. reuses
> backend_toolate 11777 1.04 Backend conn. was closed
> backend_recycle 11877 1.05 Backend conn. recycles
> backend_unused 0 0.00 Backend conn. unused
> fetch_head 33 0.00 Fetch head
> fetch_length 2134117 187.91 Fetch with Length
> fetch_chunked 811172 71.42 Fetch chunked
> fetch_eof 0 0.00 Fetch EOF
> fetch_bad 0 0.00 Fetch had bad headers
> fetch_close 156333 13.77 Fetch wanted close
> fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
> fetch_zero 0 0.00 Fetch zero len
> fetch_failed 1 0.00 Fetch failed
> n_sess_mem 3867 . N struct sess_mem
> n_sess 3841 . N struct sess
> n_object 757929 . N struct object
> n_vampireobject 0 . N unresurrected objects
> n_objectcore 758158 . N struct objectcore
> n_objecthead 760076 . N struct objecthead
> n_smf 0 . N struct smf
> n_smf_frag 0 . N small free smf
> n_smf_large 0 . N large free smf
> n_vbe_conn 69 . N struct vbe_conn
> n_wrk 800 . N worker threads
> n_wrk_create 800 0.07 N worker threads created
> n_wrk_failed 0 0.00 N worker threads not created
> n_wrk_max 0 0.00 N worker threads limited
> n_wrk_queue 0 0.00 N queued work requests
> n_wrk_overflow 171 0.02 N overflowed work requests
> n_wrk_drop 0 0.00 N dropped work requests
> n_backend 21 . N backends
> n_expired 1879027 . N expired objects
> n_lru_nuked 0 . N LRU nuked objects
> n_lru_saved 0 . N LRU saved objects
> n_lru_moved 813472 . N LRU moved objects
> n_deathrow 0 . N objects on deathrow
> losthdr 0 0.00 HTTP header overflows
> n_objsendfile 0 0.00 Objects sent with sendfile
> n_objwrite 5195402 457.46 Objects sent with write
> n_objoverflow 0 0.00 Objects overflowing workspace
> s_sess 5020819 442.09 Total Sessions
> s_req 5020366 442.05 Total Requests
> s_pipe 0 0.00 Total pipe
> s_pass 437300 38.50 Total pass
> s_fetch 3101668 273.11 Total fetch
> s_hdrbytes 2575156242 226746.17 Total header bytes
> s_bodybytes 168406835436 14828461.34 Total body bytes
> sess_closed 5020817 442.09 Session Closed
> sess_pipeline 0 0.00 Session Pipeline
> sess_readahead 0 0.00 Session Read Ahead
> sess_linger 0 0.00 Session Linger
> sess_herd 0 0.00 Session herd
> shm_records 430740934 37927.35 SHM records
> shm_writes 29264523 2576.78 SHM writes
> shm_flushes 51411 4.53 SHM flushes due to overflow
> shm_cont 78056 6.87 SHM MTX contention
> shm_cycles 186 0.02 SHM cycles through buffer
> sm_nreq 0 0.00 allocator requests
> sm_nobj 0 . outstanding allocations
> sm_balloc 0 . bytes allocated
> sm_bfree 0 . bytes free
> sma_nreq 5570153 490.46 SMA allocator requests
> sma_nobj 1810351 . SMA outstanding allocations
> sma_nbytes 35306367656 . SMA outstanding bytes
> sma_balloc 154583709612 . SMA bytes allocated
> sma_bfree 119277341956 . SMA bytes free
> sms_nreq 32127 2.83 SMS allocator requests
> sms_nobj 0 . SMS outstanding allocations
> sms_nbytes 0 . SMS outstanding bytes
> sms_balloc 29394423 . SMS bytes allocated
> sms_bfree 29394423 . SMS bytes freed
> backend_req 3101977 273.13 Backend requests made
> n_vcl 1 0.00 N vcl total
> n_vcl_avail 1 0.00 N vcl available
> n_vcl_discard 0 0.00 N vcl discarded
> n_purge 31972 . N total active purges
> n_purge_add 31972 2.82 N new purges added
> n_purge_retire 0 0.00 N old purges deleted
> n_purge_obj_test 3317282 292.09 N objects tested
> n_purge_re_test 530623465 46722.15 N regexps tested against
> n_purge_dups 28047 2.47 N duplicate purges removed
> hcb_nolock 7285152 641.47 HCB Lookups without lock
> hcb_lock 2543066 223.92 HCB Lookups with lock
> hcb_insert 2542964 223.91 HCB Inserts
> esi_parse 741690 65.31 Objects ESI parsed (unlock)
> esi_errors 0 0.00 ESI parse errors (unlock)
> accept_fail 0 0.00 Accept failures
> client_drop_late 0 0.00 Connection dropped late
> uptime 11357 1.00 Client uptime
> backend_retry 99 0.01 Backend conn. retry
> dir_dns_lookups 0 0.00 DNS director lookups
> dir_dns_failed 0 0.00 DNS director failed lookups
> dir_dns_hit 0 0.00 DNS director cached lookups hit
> dir_dns_cache_full 0 0.00 DNS director full dnscache
> fetch_1xx 0 0.00 Fetch no body (1xx)
> fetch_204 0 0.00 Fetch no body (204)
> fetch_304 14 0.00 Fetch no body (304)
> On Wed, Jan 30, 2013 at 3:45 PM, Damon Snyder <damon at huddler-inc.com>wrote:
>> When you 'varnishadm -T localhost:port param.show session_linger' it
>> indicates at the bottom that "we don't know if this is a good idea... and
>> feeback is welcome."
>> We found that setting session_linger pulled us out of a bind. I wanted to
>> add my feedback to the list in the hope that someone else might benefit
>> from what we experienced.
>> We recently increased the number of esi includes on pages that get ~60-70
>> req/s on our platform. Some of those modules were being rendered with
>> s-maxage set to zero so that they would be refreshed on every page load
>> (this is so we could insert a non-cached partial into the page) which
>> further increased the request load on varnish.
>> What we found is that after a few hours the load on a varnish box went
>> from < 1 to > 10 or more and n_wkr_overflow started incrementing. After
>> investigating further we noticed that the context switching went from
>> ~10k/s to > 100k/s. We are running Linux specifically Centos.
>> No adjusting of threads or thread pools had any impact on the thrashing.
>> After reading Kristian's post<http://kly.no/posts/2009_10_19__High_end_Varnish_tuning__.html> about
>> high-end varnish tuning we decided to try out session_linger. We started by
>> doubling the default from 50 to 100 to test the theory ('varnishadm -T
>> localhost:port param.set session_linger 100'). Once we did that we saw a
>> gradual settling of the context switching (using dstat or sar -w) and
>> a stabilizing of the load.
>> It's such a great feature to be able to change this parameter via the
>> admin interface. We have 50GB malloc'ed and some nuking on our boxes so
>> restarting varnish doesn't come without some impact to the platform.
>> Intuitively increasing session_linger makes sense. If you have several
>> esi modules rendered within a page and the gap between them is > 50ms then
>> they'll be reallocated elsewhere.
>> What is not clear to me is how we should tune session_linger. We started
>> by setting it to the 3rd quantile of render times for the esi module taken
>> from a sampling of backend requests. This turned out to be 110ms.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the varnish-misc