varnish stopped responding
Matt Schurenko
MSchurenko at airg.com
Wed Sep 21 19:15:51 CEST 2011
Hi,
I'm running two varnish servers in production (ver 2.1.5). Both use the same hardware and have the same amount of RAM (48GB). Last night one of the varnish servers stopped responding on port 80. Since we are using HAProxy in front of both varnish servers for load balancing, this did not have much effect on our end users. The symptoms of the problem were that either a client (HAProxy, telnet) could not establish a layer 4 connection to varnish, or, if a client could establish a connection and issued an HTTP GET, varnish returned nothing: no HTTP headers, nothing.
Running "ps -efL | grep varnish | wc -l" revealed that there were ~500 varnish threads. I am using the default configuration with regard to threads (max of 500). To me it seemed that when a client tried to connect to varnish there were no threads available, so the client just hung there until either it or varnish timed out and disconnected. Unfortunately I didn't have the good sense to capture a "varnishstat -1" after this happened; I was focused on getting the server back to a working state, so I ended up restarting varnishd.
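Next time I'll try to grab the counters before restarting. A minimal sketch of what I have in mind; the "counter" helper is just something I sketched, not a standard varnish tool, but the field layout (name, value, rate, description) matches "varnishstat -1" output:

```shell
#!/bin/sh
# Pull a single counter value out of "varnishstat -1"-style output.
# "counter" is a hypothetical helper, not part of varnish.
counter() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# A few sample lines in the same format, for illustration:
sample='n_wrk 29 . N worker threads
n_wrk_queue 0 0.00 N queued work requests
n_wrk_drop 0 0.00 N dropped work requests'

printf '%s\n' "$sample" | counter n_wrk          # prints 29
printf '%s\n' "$sample" | counter n_wrk_queue    # prints 0
```

In production I'd run something like "varnishstat -1 | counter n_wrk" from a loop and append it to a log, so the thread numbers survive a restart.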
Here is my varnishd command line followed by a current "varnishstat -1". (I have set the weight for this server lower than the other varnish instance so that the cache can "warm up"; the other instance typically sees 4x as much traffic.)
/usr/local/sbin/varnishd -s file,/tmp/varnish-cache,60G -T 127.0.0.1:2000 -a 0.0.0.0:80 -t 604800 -f /usr/local/etc/varnish/default.vcl -p http_headers 384 -p connect_timeout 4.0
client_conn 4985179 120.45 Client connections accepted
client_drop 0 0.00 Connection dropped, no sess/wrk
client_req 4907077 118.56 Client requests received
cache_hit 3356368 81.09 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 1550606 37.46 Cache misses
backend_conn 1530014 36.97 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 0 0.00 Backend conn. failures
backend_reuse 20690 0.50 Backend conn. reuses
backend_toolate 0 0.00 Backend conn. was closed
backend_recycle 20691 0.50 Backend conn. recycles
backend_unused 0 0.00 Backend conn. unused
fetch_head 1 0.00 Fetch head
fetch_length 33270 0.80 Fetch with Length
fetch_chunked 1517362 36.66 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 70 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
n_sess_mem 262 . N struct sess_mem
n_sess 68 . N struct sess
n_object 1550439 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 1550458 . N struct objectcore
n_objecthead 1550412 . N struct objecthead
n_smf 3100879 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 1 . N large free smf
n_vbe_conn 1 . N struct vbe_conn
n_wrk 29 . N worker threads
n_wrk_create 870 0.02 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 3128 0.08 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 4696 0.11 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 2 . N backends
n_expired 157 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 3077705 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 4817364 116.39 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 4985176 120.45 Total Sessions
s_req 4907077 118.56 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 102 0.00 Total pass
s_fetch 1550703 37.47 Total fetch
s_hdrbytes 1590643697 38431.56 Total header bytes
s_bodybytes 17647134982 426372.59 Total body bytes
sess_closed 4522198 109.26 Session Closed
sess_pipeline 4 0.00 Session Pipeline
sess_readahead 8 0.00 Session Read Ahead
sess_linger 469810 11.35 Session Linger
sess_herd 476189 11.51 Session herd
shm_records 297887487 7197.26 SHM records
shm_writes 23469767 567.05 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 51830 1.25 SHM MTX contention
shm_cycles 137 0.00 SHM cycles through buffer
sm_nreq 3101298 74.93 allocator requests
sm_nobj 3100878 . outstanding allocations
sm_balloc 13670006784 . bytes allocated
sm_bfree 50754502656 . bytes free
sma_nreq 0 0.00 SMA allocator requests
sma_nobj 0 . SMA outstanding allocations
sma_nbytes 0 . SMA outstanding bytes
sma_balloc 0 . SMA bytes allocated
sma_bfree 0 . SMA bytes free
sms_nreq 5 0.00 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 2090 . SMS bytes allocated
sms_bfree 2090 . SMS bytes freed
backend_req 1550708 37.47 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 1 . N total active purges
n_purge_add 1 0.00 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 0 0.00 N objects tested
n_purge_re_test 0 0.00 N regexps tested against
n_purge_dups 0 0.00 N duplicate purges removed
hcb_nolock 4906976 118.56 HCB Lookups without lock
hcb_lock 1550518 37.46 HCB Lookups with lock
hcb_insert 1550517 37.46 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 41389 1.00 Client uptime
backend_retry 0 0.00 Backend conn. retry
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 0 0.00 Fetch no body (304)
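If the problem really is just the stock thread ceiling, I'm guessing the fix would be to raise the pool limits explicitly. The values below are only a guess on my part for these 48GB boxes, not something I've tested:

```shell
# Same command line as above, with explicit thread pool limits added.
# thread_pools / thread_pool_min / thread_pool_max are 2.1 run-time
# parameters; the values (2 pools, 100-2000 threads each) are guesses.
/usr/local/sbin/varnishd -s file,/tmp/varnish-cache,60G -T 127.0.0.1:2000 \
    -a 0.0.0.0:80 -t 604800 -f /usr/local/etc/varnish/default.vcl \
    -p http_headers 384 -p connect_timeout 4.0 \
    -p thread_pools 2 -p thread_pool_min 100 -p thread_pool_max 2000
```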
Could there be something wrong with my configuration that caused this problem?
Thanks
Matt Schurenko
Systems Administrator
airG(r) Share Your World
Suite 710, 1133 Melville Street
Vancouver, BC V6E 4E5
P: +1.604.408.2228
F: +1.866.874.8136
E: MSchurenko at airg.com
W: www.airg.com
airG is one of BC's Top 55 Employers and
Canada's Top Employers for Young People