lru nuke causes varnish to stop responding to client requests
Damon Snyder
damon at huddler-inc.com
Fri Sep 30 18:55:45 CEST 2011
Hi Matt,
Oh, sorry, I didn't notice the worker thread count at 3000. I would suggest
switching to -s malloc,38G (total mem * 0.8). If you have a lot of small
objects or objects of the same size, you could be encountering some
excessive nuking.
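For example, keeping everything else from your current command line the same,
that would look roughly like this (options copied from your earlier mail, so
adjust as needed):

/usr/local/sbin/varnishd -s malloc,38G -T 127.0.0.1:2000 -a 0.0.0.0:80 \
    -t 604800 -f /usr/local/etc/varnish/default.vcl \
    -p http_headers 384 -p connect_timeout 4.0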
Also, what are your backends doing when this happens? Is the nuking a
coincidence or is there an issue further down the stack?
Damon
On Thu, Sep 29, 2011 at 5:42 PM, Matt Schurenko <MSchurenko at airg.com> wrote:
> Sorry. I forgot to mention that I already adjusted thread behaviour via
> varnishadm while varnish was running. I had it set to use min 50, max 3000
> and a thread timeout of 120s. I think the reason why n_wrk_overflow and
> n_wrk_drop are so high is due to this problem. Before the lru nuke happens
> the number of worker threads is ~ 100. As soon as it starts to nuke the
> number of threads jumps to the max. I am monitoring some stats with MRTG. I
> seem to remember that on the other varnish server it would begin to lru nuke
> long before the cache got full. For this one there is no lru nuke activity
> until it reaches a certain point and then boom: 3000 threads are used up and
> no more new clients can connect.
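>
> For reference, I set those at runtime through the management interface with
> something roughly like the following (from memory, so the exact invocations
> may differ slightly):
>
> varnishadm -T 127.0.0.1:2000 param.set thread_pool_min 50
> varnishadm -T 127.0.0.1:2000 param.set thread_pool_max 3000
> varnishadm -T 127.0.0.1:2000 param.set thread_pool_timeout 120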
>
>
>
> Matt Schurenko
> Systems Administrator
>
> airG® Share Your World
> Suite 710, 1133 Melville Street
> Vancouver, BC V6E 4E5
>
> P: +1.604.408.2228
> F: +1.866.874.8136
> E: MSchurenko at airg.com
> W: www.airg.com
>
> From: Damon Snyder [mailto:damon at huddler-inc.com]
> Sent: September-29-11 5:30 PM
> To: Matt Schurenko
> Cc: varnish-misc at varnish-cache.org
> Subject: Re: lru nuke causes varnish to stop responding to client requests
>
> Hi Matt,
>
> It looks like you really need to bump up the number of worker threads. From
> your stats:
>
> n_wrk_queue 2861 0.02 N queued work requests
> n_wrk_overflow 83534 0.52 N overflowed work requests
> n_wrk_drop 10980 0.07 N dropped work requests
>
> You have a lot of requests that are on the queue waiting for a worker and
> a lot of requests that varnish has given up trying to fulfill with a
> worker. You can bump the number of workers up using the -w command line
> option to varnishd. I would suggest something like -w 400,1000,120 to start
> with (the default is -w 2,500,300). This says use 400 at a minimum, 1000 as
> the maximum, and set the timeout to 120s. According to the stats
> explanation doc <https://www.varnish-cache.org/trac/wiki/StatsExplained> your
> n_wrk_queue and n_wrk_drop should be 0. If you see these numbers going up
> again, use -w 500,2000,120 or something like that.
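>
> To illustrate, with the command line from your earlier mail that would be
> something along these lines (everything else unchanged):
>
> /usr/local/sbin/varnishd -s file,/tmp/varnish-cache,48g -T 127.0.0.1:2000 \
>     -a 0.0.0.0:80 -t 604800 -f /usr/local/etc/varnish/default.vcl \
>     -p http_headers 384 -p connect_timeout 4.0 -w 400,1000,120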
>
> Hope this helps,
> Damon
>
> On Thu, Sep 29, 2011 at 4:34 PM, Matt Schurenko <MSchurenko at airg.com>
> wrote:
>
> I’ve been having this problem for a couple weeks now on one of our varnish
> servers. I have posted a couple times already. What happens is that the
> server in question runs fine until the cache gets full. When it starts to
> lru nuke, the number of worker threads jumps up to thread_pool_max and
> varnish stops responding to any client requests. I have tried this with
> CentOS 5.4, 5.7 and now Slackware (all 64-bit) and the behaviour is the
> same.
>
> I am using varnish version 2.1.5 on a Dell C6105 with 48GB of RAM.
>
> Here is my varnishd command line:
>
> /usr/local/sbin/varnishd -s file,/tmp/varnish-cache,48g -T 127.0.0.1:2000 -a
> 0.0.0.0:80 -t 604800 -f /usr/local/etc/varnish/default.vcl -p http_headers
> 384 -p connect_timeout 4.0
>
> Here is the output from varnishstat -1:
>
> client_conn 38582763 240.38 Client connections accepted
> client_drop 10950 0.07 Connection dropped, no sess/wrk
> client_req 38298994 238.61 Client requests received
> cache_hit 32513762 202.57 Cache hits
> cache_hitpass 0 0.00 Cache hits for pass
> cache_miss 5784476 36.04 Cache misses
> backend_conn 5725540 35.67 Backend conn. success
> backend_unhealthy 0 0.00 Backend conn. not attempted
> backend_busy 0 0.00 Backend conn. too many
> backend_fail 1383 0.01 Backend conn. failures
> backend_reuse 60837 0.38 Backend conn. reuses
> backend_toolate 33 0.00 Backend conn. was closed
> backend_recycle 60870 0.38 Backend conn. recycles
> backend_unused 0 0.00 Backend conn. unused
> fetch_head 6 0.00 Fetch head
> fetch_length 93631 0.58 Fetch with Length
> fetch_chunked 5689433 35.45 Fetch chunked
> fetch_eof 0 0.00 Fetch EOF
> fetch_bad 0 0.00 Fetch had bad headers
> fetch_close 107 0.00 Fetch wanted close
> fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
> fetch_zero 0 0.00 Fetch zero len
> fetch_failed 1 0.00 Fetch failed
> n_sess_mem 7138 . N struct sess_mem
> n_sess 6970 . N struct sess
> n_object 5047123 . N struct object
> n_vampireobject 0 . N unresurrected objects
> n_objectcore 5048435 . N struct objectcore
> n_objecthead 4955641 . N struct objecthead
> n_smf 10139770 . N struct smf
> n_smf_frag 295671 . N small free smf
> n_smf_large 0 . N large free smf
> n_vbe_conn 2997 . N struct vbe_conn
> n_wrk 3000 . N worker threads
> n_wrk_create 5739 0.04 N worker threads created
> n_wrk_failed 0 0.00 N worker threads not created
> n_wrk_max 4063 0.03 N worker threads limited
> n_wrk_queue 2861 0.02 N queued work requests
> n_wrk_overflow 83534 0.52 N overflowed work requests
> n_wrk_drop 10980 0.07 N dropped work requests
> n_backend 2 . N backends
> n_expired 2179 . N expired objects
> n_lru_nuked 862615 . N LRU nuked objects
> n_lru_saved 0 . N LRU saved objects
> n_lru_moved 27156180 . N LRU moved objects
> n_deathrow 0 . N objects on deathrow
> losthdr 0 0.00 HTTP header overflows
> n_objsendfile 0 0.00 Objects sent with sendfile
> n_objwrite 37294888 232.35 Objects sent with write
> n_objoverflow 0 0.00 Objects overflowing workspace
> s_sess 38566049 240.27 Total Sessions
> s_req 38298994 238.61 Total Requests
> s_pipe 0 0.00 Total pipe
> s_pass 266 0.00 Total pass
> s_fetch 5783176 36.03 Total fetch
> s_hdrbytes 12570989864 78319.53 Total header bytes
> s_bodybytes 151327304604 942796.38 Total body bytes
> sess_closed 34673984 216.03 Session Closed
> sess_pipeline 187 0.00 Session Pipeline
> sess_readahead 321 0.00 Session Read Ahead
> sess_linger 3929378 24.48 Session Linger
> sess_herd 3929559 24.48 Session herd
> shm_records 2025645664 12620.14 SHM records
> shm_writes 169640580 1056.89 SHM writes
> shm_flushes 41 0.00 SHM flushes due to overflow
> shm_cont 580515 3.62 SHM MTX contention
> shm_cycles 933 0.01 SHM cycles through buffer
> sm_nreq 12431620 77.45 allocator requests
> sm_nobj 9844099 . outstanding allocations
> sm_balloc 43855261696 . bytes allocated
> sm_bfree 7684345856 . bytes free
> sma_nreq 0 0.00 SMA allocator requests
> sma_nobj 0 . SMA outstanding allocations
> sma_nbytes 0 . SMA outstanding bytes
> sma_balloc 0 . SMA bytes allocated
> sma_bfree 0 . SMA bytes free
> sms_nreq 1566 0.01 SMS allocator requests
> sms_nobj 0 . SMS outstanding allocations
> sms_nbytes 0 . SMS outstanding bytes
> sms_balloc 656154 . SMS bytes allocated
> sms_bfree 656154 . SMS bytes freed
> backend_req 5786381 36.05 Backend requests made
> n_vcl 1 0.00 N vcl total
> n_vcl_avail 1 0.00 N vcl available
> n_vcl_discard 0 0.00 N vcl discarded
> n_purge 218 . N total active purges
> n_purge_add 218 0.00 N new purges added
> n_purge_retire 0 0.00 N old purges deleted
> n_purge_obj_test 588742 3.67 N objects tested
> n_purge_re_test 120444323 750.39 N regexps tested against
> n_purge_dups 0 0.00 N duplicate purges removed
> hcb_nolock 38301670 238.63 HCB Lookups without lock
> hcb_lock 5786309 36.05 HCB Lookups with lock
> hcb_insert 5786305 36.05 HCB Inserts
> esi_parse 0 0.00 Objects ESI parsed (unlock)
> esi_errors 0 0.00 ESI parse errors (unlock)
> accept_fail 0 0.00 Accept failures
> client_drop_late 30 0.00 Connection dropped late
> uptime 160509 1.00 Client uptime
> backend_retry 25 0.00 Backend conn. retry
> dir_dns_lookups 0 0.00 DNS director lookups
> dir_dns_failed 0 0.00 DNS director failed lookups
> dir_dns_hit 0 0.00 DNS director cached lookups hit
> dir_dns_cache_full 0 0.00 DNS director full dnscache
> fetch_1xx 0 0.00 Fetch no body (1xx)
> fetch_204 0 0.00 Fetch no body (204)
> fetch_304 0 0.00 Fetch no body (304)
>
> Even though I have removed the server from our load balancer there are
> still a lot of requests going to the backend. Maybe these are all queued up
> requests that varnish is trying to fulfill? Here is some output from
> varnishlog -c when I try to connect with curl:
>
> root at mvp14:~# varnishlog -c
> 26 SessionOpen c 192.168.8.41 41942 0.0.0.0:80
> 26 ReqStart c 192.168.8.41 41942 2108342803
> 26 RxRequest c GET
> 26 RxURL c /
> 26 RxProtocol c HTTP/1.1
> 26 RxHeader c User-Agent: curl/7.21.4 (x86_64-unknown-linux-gnu)
> libcurl/7.21.4 OpenSSL/0.9.8n zlib/1.2.5 libidn/1.19
> 26 RxHeader c Host: mvp14.airg.com
> 26 RxHeader c Accept: */*
> 26 VCL_call c recv
> 26 VCL_return c lookup
> 26 VCL_call c hash
> 26 VCL_return c hash
>
> The connection just hangs here until it times out.
>
> Any help would be appreciated. We are trying to replace our squid caching
> layer with varnish. However, if I can’t resolve this issue we will have to go
> back to squid.
>
> Thanks,
>
> Matt Schurenko
> Systems Administrator
>
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>