Varnish stuck on stresstest/approved by real traffic
Václav Bílek
v.bilek at 1art.cz
Tue Nov 3 11:51:22 CET 2009
Hi
When testing varnish throughput and scalability I have found strange
varnish behavior.
using 2.0.4 with cache_acceptor_epoll.c patch :
http://varnish.projects.linpro.no/ticket/492
When testing number of clients scalability I am able to get varnish
into a state when it stops responding but in varnishlog it looks like
doing some requests from the past ...
detailed :
I start few instances of ab with keepalive on and with -r option (Don't
exit on socket receive errors) with concurency aprox. 2000
"ab -r -k -n 100000 -c 2000 http://127.0.0.1/&"
doesn't mater if from localhost or from other hosts(even distributed
each instance on diferent host); when i run so many instances that
connections on varnish raise aprox above 10K, varnish stucks, and stops
responding.In varnishstat "Client requests received" drops to "0", In
that time varnish starts to spawn new threads, but not dramatically (
for example from 2,4K to 3K ).
Then I stop all the ab instances and try to fetch something ...
/opt/httpd2.2/bin/ab -k -n 1 -c 1 http://127.0.0.1/
but get timeout, when I try after few minutes when varnish certainly
does nothing i get timeout too.
In such state there is in varnishlog every 30s request like this but
nothing else:
10342 SessionOpen c 127.0.0.1 28988 0.0.0.0:80
10342 ReqStart c 127.0.0.1 28988 1170523327
10342 RxRequest c GET
10342 RxURL c /
10342 RxProtocol c HTTP/1.0
10342 RxHeader c Connection: Keep-Alive
10342 RxHeader c Host: 127.0.0.1
10342 RxHeader c User-Agent: ApacheBench/2.3
10342 RxHeader c Accept: */*
10342 VCL_call c recv lookup
10342 VCL_call c hash hash
10342 VCL_call c miss fetch
10342 Backend c 20041 default default
10342 ObjProtocol c HTTP/1.1
10342 ObjStatus c 200
10342 ObjResponse c OK
10342 ObjHeader c Date: Tue, 03 Nov 2009 10:28:33 GMT
10342 ObjHeader c Server: Apache
10342 ObjHeader c Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
10342 ObjHeader c ETag: "fc76-2c-3e9564c23b600"
10342 ObjHeader c Content-Type: text/html
10342 TTL c 1170523327 RFC 5 1257244113 0 0 0 0
10342 VCL_call c fetch
10342 VCL_info c XID 1170523327: obj.prefetch (-30) less than ttl
(5.00406), ignored.
10342 VCL_return c deliver
10342 Length c 44
10342 VCL_call c deliver deliver
10342 TxProtocol c HTTP/1.1
10342 TxStatus c 200
10342 TxResponse c OK
10342 TxHeader c Server: Apache
10342 TxHeader c Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
10342 TxHeader c ETag: "fc76-2c-3e9564c23b600"
10342 TxHeader c Content-Type: text/html
10342 TxHeader c Content-Length: 44
10342 TxHeader c Date: Tue, 03 Nov 2009 10:28:33 GMT
10342 TxHeader c X-Varnish: 1170523327
10342 TxHeader c Age: 0
10342 TxHeader c Via: 1.1 varnish
10342 TxHeader c Connection: keep-alive
10342 ReqEnd c 1170523327 1257244113.149692297
1257244113.153866768 526.840385675 0.004138470 0.000036001
There is nothing strange in syslog or varnishlog, the only thing that
helps recover varnish is restarting it.
Bad is that it has already happened in production traffic.
Is there anything in my settings what should i check?
my setings:
accept_fd_holdoff 50 [ms]
acceptor default (epoll, poll)
auto_restart on [bool]
backend_http11 on [bool]
between_bytes_timeout 60.000000 [s]
cache_vbe_conns off [bool]
cc_command "exec cc -fpic -shared -Wl,-x -o %o %s"
cli_buffer 8192 [bytes]
cli_timeout 15 [seconds]
client_http11 off [bool]
clock_skew 10 [s]
connect_timeout 0.400000 [s]
default_grace 10
default_ttl 60 [seconds]
diag_bitmap 0x0 [bitmap]
err_ttl 0 [seconds]
esi_syntax 0 [bitmap]
fetch_chunksize 128 [kilobytes]
first_byte_timeout 60.000000 [s]
group nogroup (65534)
listen_address 0.0.0.0:80
listen_depth 40960 [connections]
log_hashstring off [bool]
log_local_address off [bool]
lru_interval 2 [seconds]
max_esi_includes 5 [includes]
max_restarts 4 [restarts]
obj_workspace 8192 [bytes]
overflow_max 300 [%]
ping_interval 3 [seconds]
pipe_timeout 60 [seconds]
prefer_ipv6 off [bool]
purge_dups off [bool]
purge_hash on [bool]
rush_exponent 3 [requests per request]
send_timeout 5 [seconds]
sess_timeout 1 [seconds]
sess_workspace 16384 [bytes]
session_linger 0 [ms]
shm_reclen 255 [bytes]
shm_workspace 8192 [bytes]
srcaddr_hash 1049 [buckets]
srcaddr_ttl 0 [seconds]
thread_pool_add_delay 2 [milliseconds]
thread_pool_add_threshold 2 [requests]
thread_pool_fail_delay 200 [milliseconds]
thread_pool_max 10000 [threads]
thread_pool_min 200 [threads]
thread_pool_purge_delay 1000 [milliseconds]
thread_pool_timeout 10 [seconds]
thread_pools 12 [pools]
user nobody (65534)
vcl_trace off [bool]
vcl.conf:
just backend declaration nothing else
varnishstat -1 in stuck state:
uptime 3450 . Child uptime
client_conn 620403 179.83 Client connections accepted
client_req 1212084 351.33 Client requests received
cache_hit 1211297 351.10 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 140 0.04 Cache misses
backend_conn 140 0.04 Backend connections success
backend_unhealthy 0 0.00 Backend connections not
attempted
backend_busy 0 0.00 Backend connections too many
backend_fail 0 0.00 Backend connections failures
backend_reuse 0 0.00 Backend connections reuses
backend_recycle 0 0.00 Backend connections recycles
backend_unused 0 0.00 Backend connections unused
n_srcaddr 0 . N struct srcaddr
n_srcaddr_act 0 . N active struct srcaddr
n_sess_mem 20051 . N struct sess_mem
n_sess 20277 . N struct sess
n_object 2431 . N struct object
n_objecthead 2440 . N struct objecthead
n_smf 2442 . N struct smf
n_smf_frag 4 . N small free smf
n_smf_large 16 . N large free smf
n_vbe_conn 0 . N struct vbe_conn
n_bereq 1 . N struct bereq
n_wrk 2467 . N worker threads
n_wrk_create 2477 0.72 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_queue 9891 2.87 N queued work requests
n_wrk_overflow 18197 5.27 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 1 . N backends
n_expired 139 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 20 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 1212216 351.37 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 610512 176.96 Total Sessions
s_req 1212261 351.38 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 0 0.00 Total pass
s_fetch 140 0.04 Total fetch
s_hdrbytes 335018181 97106.72 Total header bytes
s_bodybytes 53341094 15461.19 Total body bytes
sess_closed 600121 173.95 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 0 0.00 Session Linger
sess_herd 612115 177.42 Session herd
shm_records 40013596 11598.14 SHM records
shm_writes 3028734 877.89 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 228833 66.33 SHM MTX contention
shm_cycles 10 0.00 SHM cycles through buffer
sm_nreq 2710 0.79 allocator requests
sm_nobj 2422 . outstanding allocations
sm_balloc 19836928 . bytes allocated
sm_bfree 9991118848 . bytes free
sma_nreq 0 0.00 SMA allocator requests
sma_nobj 0 . SMA outstanding allocations
sma_nbytes 0 . SMA outstanding bytes
sma_balloc 0 . SMA bytes allocated
sma_bfree 0 . SMA bytes free
sms_nreq 0 0.00 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 0 . SMS bytes allocated
sms_bfree 0 . SMS bytes freed
backend_req 140 0.04 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 1 . N total active purges
n_purge_add 1 0.00 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 0 0.00 N objects tested
n_purge_re_test 0 0.00 N regexps tested against
n_purge_dups 0 0.00 N duplicate purges removed
hcb_nolock 0 0.00 HCB Lookups without lock
hcb_lock 0 0.00 HCB Lookups with lock
hcb_insert 0 0.00 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
More information about the varnish-misc
mailing list