Varnish CPU thrashing: Failing to serve requests
Neil Saunders
n.j.saunders at gmail.com
Thu Jul 19 11:45:30 CEST 2012
Varnishstat output:
# varnishstat -1
client_conn 595181 4.57 Client connections accepted
client_drop 0 0.00 Connection dropped, no sess/wrk
client_req 2372854 18.20 Client requests received
cache_hit 1422781 10.91 Cache hits
cache_hitpass 161 0.00 Cache hits for pass
cache_miss 832622 6.39 Cache misses
backend_conn 161225 1.24 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 41 0.00 Backend conn. failures
backend_reuse 547909 4.20 Backend conn. reuses
backend_toolate 57162 0.44 Backend conn. was closed
backend_recycle 605090 4.64 Backend conn. recycles
backend_retry 944 0.01 Backend conn. retry
fetch_head 771 0.01 Fetch head
fetch_length 566657 4.35 Fetch with Length
fetch_chunked 132107 1.01 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 0 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 449 0.00 Fetch failed
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 2802 0.02 Fetch no body (304)
n_sess_mem 351 . N struct sess_mem
n_sess 32 . N struct sess
n_object 73004 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 73008 . N struct objectcore
n_objecthead 18790 . N struct objecthead
n_waitinglist 6847 . N struct waitinglist
n_vbc 20 . N struct vbc
n_wrk 16 . N worker threads
n_wrk_create 2728 0.02 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_lqueue 2 0.00 work request queue length
n_wrk_queued 9127 0.07 N queued work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 23 . N backends
n_expired 514339 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_moved 593159 . N LRU moved objects
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 2257927 17.32 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 595172 4.57 Total Sessions
s_req 2372854 18.20 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 114290 0.88 Total pass
s_fetch 701505 5.38 Total fetch
s_hdrbytes 1088723796 8352.25 Total header bytes
s_bodybytes 23256734944 178416.24 Total body bytes
sess_closed 249051 1.91 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 2123886 16.29 Session Linger
sess_herd 1048297 8.04 Session herd
shm_records 150270766 1152.82 SHM records
shm_writes 7788531 59.75 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 3252 0.02 SHM MTX contention
shm_cycles 60 0.00 SHM cycles through buffer
sms_nreq 245202 1.88 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 102585117 . SMS bytes allocated
sms_bfree 102585117 . SMS bytes freed
backend_req 709136 5.44 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_ban 1 . N total active bans
n_ban_add 1 0.00 N new bans added
n_ban_retire 0 0.00 N old bans deleted
n_ban_obj_test 0 0.00 N objects tested
n_ban_re_test 0 0.00 N regexps tested against
n_ban_dups 0 0.00 N duplicate bans removed
hcb_nolock 2255564 17.30 HCB Lookups without lock
hcb_lock 328164 2.52 HCB Lookups with lock
hcb_insert 328160 2.52 HCB Inserts
esi_errors 0 0.00 ESI parse errors (unlock)
esi_warnings 0 0.00 ESI parse warnings (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 130351 1.00 Client uptime
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
vmods 0 . Loaded VMODs
n_gzip 0 0.00 Gzip operations
n_gunzip 681986 5.23 Gunzip operations
LCK.sms.creat 3 0.00 Created locks
LCK.sms.destroy 0 0.00 Destroyed locks
LCK.sms.locks 736368 5.65 Lock Operations
LCK.sms.colls 0 0.00 Collisions
LCK.smp.creat 0 0.00 Created locks
LCK.smp.destroy 0 0.00 Destroyed locks
LCK.smp.locks 0 0.00 Lock Operations
LCK.smp.colls 0 0.00 Collisions
LCK.sma.creat 6 0.00 Created locks
LCK.sma.destroy 0 0.00 Destroyed locks
LCK.sma.locks 3744137 28.72 Lock Operations
LCK.sma.colls 0 0.00 Collisions
LCK.smf.creat 0 0.00 Created locks
LCK.smf.destroy 0 0.00 Destroyed locks
LCK.smf.locks 0 0.00 Lock Operations
LCK.smf.colls 0 0.00 Collisions
LCK.hsl.creat 0 0.00 Created locks
LCK.hsl.destroy 0 0.00 Destroyed locks
LCK.hsl.locks 0 0.00 Lock Operations
LCK.hsl.colls 0 0.00 Collisions
LCK.hcb.creat 3 0.00 Created locks
LCK.hcb.destroy 0 0.00 Destroyed locks
LCK.hcb.locks 653775 5.02 Lock Operations
LCK.hcb.colls 0 0.00 Collisions
LCK.hcl.creat 0 0.00 Created locks
LCK.hcl.destroy 0 0.00 Destroyed locks
LCK.hcl.locks 0 0.00 Lock Operations
LCK.hcl.colls 0 0.00 Collisions
LCK.vcl.creat 3 0.00 Created locks
LCK.vcl.destroy 0 0.00 Destroyed locks
LCK.vcl.locks 21067 0.16 Lock Operations
LCK.vcl.colls 0 0.00 Collisions
LCK.stat.creat 3 0.00 Created locks
LCK.stat.destroy 0 0.00 Destroyed locks
LCK.stat.locks 625 0.00 Lock Operations
LCK.stat.colls 0 0.00 Collisions
LCK.sessmem.creat 3 0.00 Created locks
LCK.sessmem.destroy 0 0.00 Destroyed locks
LCK.sessmem.locks 619699 4.75 Lock Operations
LCK.sessmem.colls 0 0.00 Collisions
LCK.wstat.creat 3 0.00 Created locks
LCK.wstat.destroy 0 0.00 Destroyed locks
LCK.wstat.locks 415560 3.19 Lock Operations
LCK.wstat.colls 0 0.00 Collisions
LCK.herder.creat 3 0.00 Created locks
LCK.herder.destroy 0 0.00 Destroyed locks
LCK.herder.locks 7973 0.06 Lock Operations
LCK.herder.colls 0 0.00 Collisions
LCK.wq.creat 6 0.00 Created locks
LCK.wq.destroy 0 0.00 Destroyed locks
LCK.wq.locks 3001206 23.02 Lock Operations
LCK.wq.colls 0 0.00 Collisions
LCK.objhdr.creat 345509 2.65 Created locks
LCK.objhdr.destroy 312172 2.39 Destroyed locks
LCK.objhdr.locks 10885207 83.51 Lock Operations
LCK.objhdr.colls 0 0.00 Collisions
LCK.exp.creat 3 0.00 Created locks
LCK.exp.destroy 0 0.00 Destroyed locks
LCK.exp.locks 1277776 9.80 Lock Operations
LCK.exp.colls 0 0.00 Collisions
LCK.lru.creat 6 0.00 Created locks
LCK.lru.destroy 0 0.00 Destroyed locks
LCK.lru.locks 642289 4.93 Lock Operations
LCK.lru.colls 0 0.00 Collisions
LCK.cli.creat 3 0.00 Created locks
LCK.cli.destroy 0 0.00 Destroyed locks
LCK.cli.locks 45862 0.35 Lock Operations
LCK.cli.colls 0 0.00 Collisions
LCK.ban.creat 3 0.00 Created locks
LCK.ban.destroy 0 0.00 Destroyed locks
LCK.ban.locks 1293221 9.92 Lock Operations
LCK.ban.colls 0 0.00 Collisions
LCK.vbp.creat 3 0.00 Created locks
LCK.vbp.destroy 0 0.00 Destroyed locks
LCK.vbp.locks 946642 7.26 Lock Operations
LCK.vbp.colls 0 0.00 Collisions
LCK.vbe.creat 3 0.00 Created locks
LCK.vbe.destroy 0 0.00 Destroyed locks
LCK.vbe.locks 340230 2.61 Lock Operations
LCK.vbe.colls 0 0.00 Collisions
LCK.backend.creat 69 0.00 Created locks
LCK.backend.destroy 0 0.00 Destroyed locks
LCK.backend.locks 3139847 24.09 Lock Operations
LCK.backend.colls 0 0.00 Collisions
SMA.s0.c_req 781537 6.00 Allocator requests
SMA.s0.c_fail 0 0.00 Allocator failures
SMA.s0.c_bytes 42478599218 325878.58 Bytes allocated
SMA.s0.c_freed 40948642433 314141.38 Bytes freed
SMA.s0.g_alloc 95619 . Allocations outstanding
SMA.s0.g_bytes 1529956785 . Bytes outstanding
SMA.s0.g_space 5600360015 . Bytes available
SMA.Transient.c_req 774135 5.94 Allocator requests
SMA.Transient.c_fail 0 0.00 Allocator failures
SMA.Transient.c_bytes 44114234836 338426.52 Bytes allocated
SMA.Transient.c_freed 44084118402 338195.48 Bytes freed
SMA.Transient.g_alloc 51006 . Allocations outstanding
SMA.Transient.g_bytes 30116434 . Bytes outstanding
SMA.Transient.g_space 0 . Bytes available
On Thu, Jul 19, 2012 at 10:34 AM, Neil Saunders <n.j.saunders at gmail.com> wrote:
> Hi -
>
> We run 3 Varnish instance in EC2 behind Load Balancer - This setup has
> performed solidly since it's installation.
>
> This morning I came in to find that 2 of the 3 instances thrashing the
> CPU and failing to serve requests. Our monitoring shows that there was
> a notable (20%) CPU steal in availability zones A & B starting at
> 6:30, but I note that this has also occurred in the past and has not
> caused us any issues previously. We've restarted one of the problem
> instances and dropped the other out the load balancer to perform root
> cause analysis.
>
> The dropped host is not not serving any requests but is still maxing
> out the CPU. There are 50 varnish threads running, and ps thread dump
> reveals a single thread spinning at >90%.
>
> root 1235 1 1235 0 1 May03 ? 00:00:00 /bin/bash
> /etc/rc2.d/S20varnishlog-backend start
> root 1236 1235 1236 0 1 May03 ? 00:59:11 varnishlog
> -u -i Backend_health
> root 1240 1235 1240 0 1 May03 ? 00:00:26 logger -t varnishlog
> root 12688 1 12688 0 1 Jul17 ? 00:00:09
> /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f
> /etc/varnish/varnish.vcl -s malloc,6800M
> <snip>
> nobody 13015 12688 21589 0 48 06:30 ? 00:00:00
> /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f
> /etc/varnish/varnish.vcl -s malloc,6800M
> nobody 13015 12688 21611 0 48 06:30 ? 00:00:00
> /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f
> /etc/varnish/varnish.vcl -s malloc,6800M
> nobody 13015 12688 21612 93 48 06:30 ? 03:39:06
> /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f
> /etc/varnish/varnish.vcl -s malloc,6800M
> nobody 13015 12688 21614 0 48 06:30 ? 00:00:00
> /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f
> /etc/varnish/varnish.vcl -s malloc,6800M
> <snip>
>
> Can anyone recommend next steps in terms of dianosing what's going on
> here? I'm at a loss!
>
> Thanks in advance,
>
> Neil Saunders
More information about the varnish-misc
mailing list