Varnish stuck on stresstest/approved by real traffic

Václav Bílek v.bilek at 1art.cz
Tue Nov 3 11:51:22 CET 2009


Hi

When testing varnish throughput and scalability I have found strange
varnish behavior.

using 2.0.4 with cache_acceptor_epoll.c  patch :
http://varnish.projects.linpro.no/ticket/492

When testing number of clients scalability I  am able to get varnish
into a state when it stops responding but in varnishlog it looks like
doing some requests from the past ...
detailed :
I start few instances of ab with keepalive on and with -r option (Don't
exit on socket receive errors) with concurency aprox. 2000
"ab -r -k -n 100000 -c 2000 http://127.0.0.1/&"
doesn't mater if from localhost or from other hosts(even distributed
each instance on diferent host); when i run so many instances that
connections on varnish raise aprox above 10K, varnish stucks, and stops
responding.In varnishstat "Client requests received" drops to "0", In
that time varnish starts to spawn new threads, but not dramatically (
for example from 2,4K to 3K  ).
Then I stop all the ab instances and try to fetch something ...
/opt/httpd2.2/bin/ab  -k  -n 1 -c 1 http://127.0.0.1/
but get timeout, when I try after few minutes when varnish certainly
does nothing i get timeout too.
In such state there is in varnishlog every 30s request like this but
nothing else:
10342 SessionOpen  c 127.0.0.1 28988 0.0.0.0:80
10342 ReqStart     c 127.0.0.1 28988 1170523327
10342 RxRequest    c GET
10342 RxURL        c /
10342 RxProtocol   c HTTP/1.0
10342 RxHeader     c Connection: Keep-Alive
10342 RxHeader     c Host: 127.0.0.1
10342 RxHeader     c User-Agent: ApacheBench/2.3
10342 RxHeader     c Accept: */*
10342 VCL_call     c recv lookup
10342 VCL_call     c hash hash
10342 VCL_call     c miss fetch
10342 Backend      c 20041 default default
10342 ObjProtocol  c HTTP/1.1
10342 ObjStatus    c 200
10342 ObjResponse  c OK
10342 ObjHeader    c Date: Tue, 03 Nov 2009 10:28:33 GMT
10342 ObjHeader    c Server: Apache
10342 ObjHeader    c Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
10342 ObjHeader    c ETag: "fc76-2c-3e9564c23b600"
10342 ObjHeader    c Content-Type: text/html
10342 TTL          c 1170523327 RFC 5 1257244113 0 0 0 0
10342 VCL_call     c fetch
10342 VCL_info     c XID 1170523327: obj.prefetch (-30) less than ttl
(5.00406), ignored.
10342 VCL_return   c deliver
10342 Length       c 44
10342 VCL_call     c deliver deliver
10342 TxProtocol   c HTTP/1.1
10342 TxStatus     c 200
10342 TxResponse   c OK
10342 TxHeader     c Server: Apache
10342 TxHeader     c Last-Modified: Sat, 20 Nov 2004 20:16:24 GMT
10342 TxHeader     c ETag: "fc76-2c-3e9564c23b600"
10342 TxHeader     c Content-Type: text/html
10342 TxHeader     c Content-Length: 44
10342 TxHeader     c Date: Tue, 03 Nov 2009 10:28:33 GMT
10342 TxHeader     c X-Varnish: 1170523327
10342 TxHeader     c Age: 0
10342 TxHeader     c Via: 1.1 varnish
10342 TxHeader     c Connection: keep-alive
10342 ReqEnd       c 1170523327 1257244113.149692297
1257244113.153866768 526.840385675 0.004138470 0.000036001

There is nothing strange in syslog or varnishlog, the only thing that
helps recover varnish is restarting it.

Bad is that it has already happened  in production traffic.

Is there anything in my settings what should i check?


my setings:
accept_fd_holdoff          50 [ms]
acceptor                   default (epoll, poll)
auto_restart               on [bool]
backend_http11             on [bool]
between_bytes_timeout      60.000000 [s]
cache_vbe_conns            off [bool]
cc_command                 "exec cc -fpic -shared -Wl,-x -o %o %s"
cli_buffer                 8192 [bytes]
cli_timeout                15 [seconds]
client_http11              off [bool]
clock_skew                 10 [s]
connect_timeout            0.400000 [s]
default_grace              10
default_ttl                60 [seconds]
diag_bitmap                0x0 [bitmap]
err_ttl                    0 [seconds]
esi_syntax                 0 [bitmap]
fetch_chunksize            128 [kilobytes]
first_byte_timeout         60.000000 [s]
group                      nogroup (65534)
listen_address             0.0.0.0:80
listen_depth               40960 [connections]
log_hashstring             off [bool]
log_local_address          off [bool]
lru_interval               2 [seconds]
max_esi_includes           5 [includes]
max_restarts               4 [restarts]
obj_workspace              8192 [bytes]
overflow_max               300 [%]
ping_interval              3 [seconds]
pipe_timeout               60 [seconds]
prefer_ipv6                off [bool]
purge_dups                 off [bool]
purge_hash                 on [bool]
rush_exponent              3 [requests per request]
send_timeout               5 [seconds]
sess_timeout               1 [seconds]
sess_workspace             16384 [bytes]
session_linger             0 [ms]
shm_reclen                 255 [bytes]
shm_workspace              8192 [bytes]
srcaddr_hash               1049 [buckets]
srcaddr_ttl                0 [seconds]
thread_pool_add_delay      2 [milliseconds]
thread_pool_add_threshold  2 [requests]
thread_pool_fail_delay     200 [milliseconds]
thread_pool_max            10000 [threads]
thread_pool_min            200 [threads]
thread_pool_purge_delay    1000 [milliseconds]
thread_pool_timeout        10 [seconds]
thread_pools               12 [pools]
user                       nobody (65534)
vcl_trace                  off [bool]

vcl.conf:
just backend declaration nothing else

varnishstat -1 in stuck state:
uptime                   3450          .   Child uptime
client_conn            620403       179.83 Client connections accepted
client_req            1212084       351.33 Client requests received
cache_hit             1211297       351.10 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss                140         0.04 Cache misses
backend_conn              140         0.04 Backend connections success
backend_unhealthy            0         0.00 Backend connections not
attempted
backend_busy                0         0.00 Backend connections too many
backend_fail                0         0.00 Backend connections failures
backend_reuse               0         0.00 Backend connections reuses
backend_recycle             0         0.00 Backend connections recycles
backend_unused              0         0.00 Backend connections unused
n_srcaddr                   0          .   N struct srcaddr
n_srcaddr_act               0          .   N active struct srcaddr
n_sess_mem              20051          .   N struct sess_mem
n_sess                  20277          .   N struct sess
n_object                 2431          .   N struct object
n_objecthead             2440          .   N struct objecthead
n_smf                    2442          .   N struct smf
n_smf_frag                  4          .   N small free smf
n_smf_large                16          .   N large free smf
n_vbe_conn                  0          .   N struct vbe_conn
n_bereq                     1          .   N struct bereq
n_wrk                    2467          .   N worker threads
n_wrk_create             2477         0.72 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max                   0         0.00 N worker threads limited
n_wrk_queue              9891         2.87 N queued work requests
n_wrk_overflow          18197         5.27 N overflowed work requests
n_wrk_drop                  0         0.00 N dropped work requests
n_backend                   1          .   N backends
n_expired                 139          .   N expired objects
n_lru_nuked                 0          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved                20          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                     0         0.00 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite            1212216       351.37 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                 610512       176.96 Total Sessions
s_req                 1212261       351.38 Total Requests
s_pipe                      0         0.00 Total pipe
s_pass                      0         0.00 Total pass
s_fetch                   140         0.04 Total fetch
s_hdrbytes          335018181     97106.72 Total header bytes
s_bodybytes          53341094     15461.19 Total body bytes
sess_closed            600121       173.95 Session Closed
sess_pipeline               0         0.00 Session Pipeline
sess_readahead              0         0.00 Session Read Ahead
sess_linger                 0         0.00 Session Linger
sess_herd              612115       177.42 Session herd
shm_records          40013596     11598.14 SHM records
shm_writes            3028734       877.89 SHM writes
shm_flushes                 0         0.00 SHM flushes due to overflow
shm_cont               228833        66.33 SHM MTX contention
shm_cycles                 10         0.00 SHM cycles through buffer
sm_nreq                  2710         0.79 allocator requests
sm_nobj                  2422          .   outstanding allocations
sm_balloc            19836928          .   bytes allocated
sm_bfree           9991118848          .   bytes free
sma_nreq                    0         0.00 SMA allocator requests
sma_nobj                    0          .   SMA outstanding allocations
sma_nbytes                  0          .   SMA outstanding bytes
sma_balloc                  0          .   SMA bytes allocated
sma_bfree                   0          .   SMA bytes free
sms_nreq                    0         0.00 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes                  0          .   SMS outstanding bytes
sms_balloc                  0          .   SMS bytes allocated
sms_bfree                   0          .   SMS bytes freed
backend_req               140         0.04 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                     1          .   N total active purges
n_purge_add                 1         0.00 N new purges added
n_purge_retire              0         0.00 N old purges deleted
n_purge_obj_test            0         0.00 N objects tested
n_purge_re_test             0         0.00 N regexps tested against
n_purge_dups                0         0.00 N duplicate purges removed
hcb_nolock                  0         0.00 HCB Lookups without lock
hcb_lock                    0         0.00 HCB Lookups with lock
hcb_insert                  0         0.00 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)






More information about the varnish-misc mailing list