Varnish 2.0.6 nuking all my objects?
David Birdsong
david.birdsong at gmail.com
Thu Feb 25 09:26:12 CET 2010
I have seen this happen.
I have a similar hardware setup, though I swapped the multi-SSD RAID
for 3 separate cache file arguments.
We had roughly 240GB of storage space total. After about 2-3 weeks,
sm_bfree dropped to ~20GB and lru_nuked started incrementing; sm_bfree
then climbed back to ~60GB, but the nuking never stopped.
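One way to confirm the nuking never stops is to poll varnishstat and log the
deltas. A rough sketch (the polling loop is my own, not anything varnish
ships; it only assumes varnishstat is on PATH and the 2.0-era
"name value rate description" output format shown below):

```python
import subprocess
import time

def read_counters(text, fields):
    """Parse `varnishstat -1` output: each line is <name> <value> <rate> <desc>."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in fields:
            out[parts[0]] = int(parts[1])
    return out

def watch(interval=60):
    """Print how many objects were LRU-nuked per interval, plus free bytes."""
    prev = None
    while True:
        text = subprocess.check_output(["varnishstat", "-1"]).decode()
        cur = read_counters(text, {"n_lru_nuked", "sm_bfree"})
        if prev is not None:
            print("nuked this interval:",
                  cur["n_lru_nuked"] - prev["n_lru_nuked"],
                  " sm_bfree:", cur["sm_bfree"])
        prev = cur
        time.sleep(interval)
```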
On Wed, Feb 24, 2010 at 8:15 PM, Barry Abrahamson <barry at automattic.com> wrote:
> Howdy,
>
> We are finally getting around to upgrading to the latest version of Varnish and have run into quite a weird problem. Everything works fine for a while (1+ day), then all of a sudden Varnish starts nuking all of the objects from the cache:
>
> About 4 hours ago there were 1 million objects in the cache; now there are only about 172k. This looks a bit weird to me:
>
> sms_nbytes 18446744073709548694 . SMS outstanding bytes
>
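[Interjecting here: that sms_nbytes figure, and the n_vbe_conn figure
further down, sit just below 2^64, the signature of an unsigned 64-bit
gauge that has been decremented past zero. A quick sanity check (my own
snippet, not part of varnish):]

```python
# Reinterpret varnishstat's huge "wrapped" gauges as signed 64-bit values.
def as_signed64(u):
    """Treat an unsigned 64-bit counter as a two's-complement signed integer."""
    return u - 2**64 if u >= 2**63 else u

print(as_signed64(18446744073709548694))  # sms_nbytes  -> -2922
print(as_signed64(18446744073709551611))  # n_vbe_conn  -> -5
```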
> Here are the options I am passing to varnishd:
>
> /usr/local/sbin/varnishd -a 0.0.0.0:8888 -f /etc/varnish/varnish.vcl -P /var/run/varnishd.pid -T 0.0.0.0:47200 -t 600 -w 1,200,300 -p thread_pools 4 -p thread_pool_add_delay 2 -p lru_interval 60 -h classic,500009 -p obj_workspace 4096 -s file,/varnish/cache,150G
>
> /varnish is 2 x 80GB Intel X-25M SSDs in a software RAID 0 array. OS is Debian Lenny 64-bit. There is plenty of space:
>
> /dev/md0 149G 52G 98G 35% /varnish
>
> Here is the output of varnishstat -1
>
> uptime 134971 . Child uptime
> client_conn 12051037 89.29 Client connections accepted
> client_drop 0 0.00 Connection dropped, no sess
> client_req 12048672 89.27 Client requests received
> cache_hit 10161272 75.28 Cache hits
> cache_hitpass 133244 0.99 Cache hits for pass
> cache_miss 1750857 12.97 Cache misses
> backend_conn 1824594 13.52 Backend conn. success
> backend_unhealthy 0 0.00 Backend conn. not attempted
> backend_busy 0 0.00 Backend conn. too many
> backend_fail 3644 0.03 Backend conn. failures
> backend_reuse 0 0.00 Backend conn. reuses
> backend_toolate 0 0.00 Backend conn. was closed
> backend_recycle 0 0.00 Backend conn. recycles
> backend_unused 0 0.00 Backend conn. unused
> fetch_head 5309 0.04 Fetch head
> fetch_length 1816422 13.46 Fetch with Length
> fetch_chunked 0 0.00 Fetch chunked
> fetch_eof 0 0.00 Fetch EOF
> fetch_bad 0 0.00 Fetch had bad headers
> fetch_close 0 0.00 Fetch wanted close
> fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
> fetch_zero 0 0.00 Fetch zero len
> fetch_failed 16 0.00 Fetch failed
> n_srcaddr 0 . N struct srcaddr
> n_srcaddr_act 0 . N active struct srcaddr
> n_sess_mem 578 . N struct sess_mem
> n_sess 414 . N struct sess
> n_object 172697 . N struct object
> n_objecthead 173170 . N struct objecthead
> n_smf 471310 . N struct smf
> n_smf_frag 62172 . N small free smf
> n_smf_large 67978 . N large free smf
> n_vbe_conn 18446744073709551611 . N struct vbe_conn
> n_bereq 315 . N struct bereq
> n_wrk 76 . N worker threads
> n_wrk_create 3039 0.02 N worker threads created
> n_wrk_failed 0 0.00 N worker threads not created
> n_wrk_max 0 0.00 N worker threads limited
> n_wrk_queue 0 0.00 N queued work requests
> n_wrk_overflow 25136 0.19 N overflowed work requests
> n_wrk_drop 0 0.00 N dropped work requests
> n_backend 4 . N backends
> n_expired 771687 . N expired objects
> n_lru_nuked 744693 . N LRU nuked objects
> n_lru_saved 0 . N LRU saved objects
> n_lru_moved 8675178 . N LRU moved objects
> n_deathrow 0 . N objects on deathrow
> losthdr 25 0.00 HTTP header overflows
> n_objsendfile 0 0.00 Objects sent with sendfile
> n_objwrite 11749415 87.05 Objects sent with write
> n_objoverflow 0 0.00 Objects overflowing workspace
> s_sess 12051007 89.29 Total Sessions
> s_req 12050184 89.28 Total Requests
> s_pipe 2661 0.02 Total pipe
> s_pass 134858 1.00 Total pass
> s_fetch 1821721 13.50 Total fetch
> s_hdrbytes 3932274894 29134.22 Total header bytes
> s_bodybytes 894452020319 6626994.10 Total body bytes
> sess_closed 12050925 89.29 Session Closed
> sess_pipeline 0 0.00 Session Pipeline
> sess_readahead 0 0.00 Session Read Ahead
> sess_linger 0 0.00 Session Linger
> sess_herd 160 0.00 Session herd
> shm_records 610011852 4519.58 SHM records
> shm_writes 50380066 373.27 SHM writes
> shm_flushes 9387 0.07 SHM flushes due to overflow
> shm_cont 47763 0.35 SHM MTX contention
> shm_cycles 226 0.00 SHM cycles through buffer
> sm_nreq 4449213 32.96 allocator requests
> sm_nobj 341160 . outstanding allocations
> sm_balloc 11373072384 . bytes allocated
> sm_bfree 116602589184 . bytes free
> sma_nreq 0 0.00 SMA allocator requests
> sma_nobj 0 . SMA outstanding allocations
> sma_nbytes 0 . SMA outstanding bytes
> sma_balloc 0 . SMA bytes allocated
> sma_bfree 0 . SMA bytes free
> sms_nreq 63997 0.47 SMS allocator requests
> sms_nobj 0 . SMS outstanding allocations
> sms_nbytes 18446744073709548694 . SMS outstanding bytes
> sms_balloc 31161028 . SMS bytes allocated
> sms_bfree 31162489 . SMS bytes freed
> backend_req 1821961 13.50 Backend requests made
> n_vcl 1 0.00 N vcl total
> n_vcl_avail 1 0.00 N vcl available
> n_vcl_discard 0 0.00 N vcl discarded
> n_purge 1 . N total active purges
> n_purge_add 1 0.00 N new purges added
> n_purge_retire 0 0.00 N old purges deleted
> n_purge_obj_test 0 0.00 N objects tested
> n_purge_re_test 0 0.00 N regexps tested against
> n_purge_dups 0 0.00 N duplicate purges removed
> hcb_nolock 0 0.00 HCB Lookups without lock
> hcb_lock 0 0.00 HCB Lookups with lock
> hcb_insert 0 0.00 HCB Inserts
> esi_parse 0 0.00 Objects ESI parsed (unlock)
> esi_errors 0 0.00 ESI parse errors (unlock)
>
> Obviously, this isn't good for my cache hit rate :) It is also using a lot of CPU. Has anyone seen this happen before?
>
> --
> Barry Abrahamson | Systems Wrangler | Automattic
> Blog: http://barry.wordpress.com
>
>
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at projects.linpro.no
> http://projects.linpro.no/mailman/listinfo/varnish-misc
>