Increases in memory usage and dropped connections

Hugo Cisneiros (Eitch) hugo.cisneiros at gmail.com
Tue Aug 23 00:27:53 CEST 2011


Hi,

I have been using Varnish for at least a year, and in the past few days
it began showing some really strange behavior (currently running 2.1.3).
Sometimes I get "Connection reset by peer" when requesting pages. When
this happens, I look at the stats and see something like this, for
example:

client_conn           1565694       187.17 Client connections accepted
client_drop               271         0.03 Connection dropped, no sess/wrk
client_req            2790588       333.60 Client requests received
cache_hit             1813984       216.85 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss             546088        65.28 Cache misses
backend_conn            32042         3.83 Backend conn. success
backend_unhealthy         1351         0.16 Backend conn. not attempted
backend_busy                0         0.00 Backend conn. too many
backend_fail                1         0.00 Backend conn. failures
backend_reuse          889261       106.31 Backend conn. reuses
backend_toolate           157         0.02 Backend conn. was closed
backend_recycle        889425       106.33 Backend conn. recycles
backend_unused              0         0.00 Backend conn. unused
fetch_head                759         0.09 Fetch head
fetch_length           920288       110.02 Fetch with Length
fetch_chunked               0         0.00 Fetch chunked
fetch_eof                   0         0.00 Fetch EOF
fetch_bad                   0         0.00 Fetch had bad headers
fetch_close                 0         0.00 Fetch wanted close
fetch_oldhttp               0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                  0         0.00 Fetch zero len
fetch_failed                0         0.00 Fetch failed
n_sess_mem              65783          .   N struct sess_mem
n_sess                  54012          .   N struct sess
n_object               217320          .   N struct object
n_vampireobject             0          .   N unresurrected objects
n_objectcore           218999          .   N struct objectcore
n_objecthead            57686          .   N struct objecthead
n_smf                       0          .   N struct smf
n_smf_frag                  0          .   N small free smf
n_smf_large                 0          .   N large free smf
n_vbe_conn                 28          .   N struct vbe_conn
n_wrk                    2283          .   N worker threads
n_wrk_create             2326         0.28 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max                   0         0.00 N worker threads limited
n_wrk_queue                 0         0.00 N queued work requests
n_wrk_overflow          25533         3.05 N overflowed work requests
n_wrk_drop                286         0.03 N dropped work requests
n_backend                  14          .   N backends
n_expired                   0          .   N expired objects
n_lru_nuked            157040          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved             74969          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                     0         0.00 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite            2494603       298.22 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                1565320       187.13 Total Sessions
s_req                 2790588       333.60 Total Requests
s_pipe                      0         0.00 Total pipe
s_pass                 376511        45.01 Total pass
s_fetch                921039       110.11 Total fetch
s_hdrbytes          844190338    100919.35 Total header bytes
s_bodybytes       40983737008   4899430.60 Total body bytes
sess_closed            125532        15.01 Session Closed
sess_pipeline             788         0.09 Session Pipeline
sess_readahead            810         0.10 Session Read Ahead
sess_linger           2656690       317.60 Session Linger
sess_herd             2874808       343.67 Session herd
shm_records         146433038     17505.44 SHM records
shm_writes           12159673      1453.64 SHM writes
shm_flushes                 0         0.00 SHM flushes due to overflow
shm_cont                51273         6.13 SHM MTX contention
shm_cycles                 69         0.01 SHM cycles through buffer
sm_nreq                     0         0.00 allocator requests
sm_nobj                     0          .   outstanding allocations
sm_balloc                   0          .   bytes allocated
sm_bfree                    0          .   bytes free
sma_nreq              1440966       172.26 SMA allocator requests
sma_nobj               433212          .   SMA outstanding allocations
sma_nbytes         5368687547          .   SMA outstanding bytes
sma_balloc        10635681699          .   SMA bytes allocated
sma_bfree          5266994152          .   SMA bytes free
sms_nreq                 1206         0.14 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes                  0          .   SMS outstanding bytes
sms_balloc             472992          .   SMS bytes allocated
sms_bfree              472992          .   SMS bytes freed
backend_req            921115       110.12 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                     1          .   N total active purges
n_purge_add                 1         0.00 N new purges added
n_purge_retire              0         0.00 N old purges deleted
n_purge_obj_test            0         0.00 N objects tested
n_purge_re_test             0         0.00 N regexps tested against
n_purge_dups                0         0.00 N duplicate purges removed
hcb_nolock                  0         0.00 HCB Lookups without lock
hcb_lock                    0         0.00 HCB Lookups with lock
hcb_insert                  0         0.00 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)
accept_fail                 0         0.00 Accept failures
client_drop_late           15         0.00 Connection dropped late
uptime                   8365         1.00 Client uptime

There are lots of worrisome stats here, like n_wrk (2283), n_wrk_overflow
(25533), n_wrk_drop (286), client_drop (271) ... The number of active
worker threads (2283) is very high for no apparent reason, since normal
usage is around 300 threads.
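
In case it helps, this is roughly how I watch those counters, assuming
the 2.1.x one-shot varnishstat field layout shown above (the trailing
space in the pattern keeps n_wrk from also matching n_wrk_create):

  # poll the thread/drop counters once per second
  while true; do
    varnishstat -1 | egrep '^(client_drop|n_wrk|n_wrk_overflow|n_wrk_drop) '
    sleep 1
  done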

The server has 7.5 GB of RAM and uses the malloc storage backend with a
5 GB size, running only the Varnish instance on Ubuntu Server 10.10. I
have tried lowering the malloc size to 3 GB, increasing thread_pool_min
to 500 with 2 pools (1000 idle threads total) while raising
thread_pool_max to 5000 (from 3000), and setting session_linger to 50,
100 and 150 ms and both connect_timeout and between_bytes_timeout to 5 s,
but sometimes the worker count still goes crazy and all of the RAM plus
3 GB of swap end up in use.
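
For the record, I apply these changes at runtime through the management
CLI, roughly like this (the -T address below is just an example, not my
real one):

  varnishadm -T localhost:6082 param.set thread_pool_min 500
  varnishadm -T localhost:6082 param.set thread_pool_max 5000
  varnishadm -T localhost:6082 param.set session_linger 50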

Any ideas? Maybe something is hanging up the sessions (network latency
between Varnish and the backend? Or someone doing this on purpose?).
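
If it is latency, I suppose the per-request timings in the log would
show it; in 2.1.x the ReqEnd record carries them, so something like
this should make slow sessions stand out:

  # client-side records only, ReqEnd tag only
  varnishlog -c -i ReqEnd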

Here are my settings:

acceptor_sleep_decay       0.900000 []
acceptor_sleep_incr        0.001000 [s]
acceptor_sleep_max         0.050000 [s]
auto_restart               on [bool]
ban_lurker_sleep           30.000000 [s]
between_bytes_timeout      50.000000 [s]
cache_vbe_conns            off [bool]
cc_command                 "exec cc -fpic -shared -Wl,-x -o %o %s"
cli_buffer                 8192 [bytes]
cli_timeout                25 [seconds]
clock_skew                 10 [s]
connect_timeout            50.000000 [s]
critbit_cooloff            180.000000 [s]
default_grace              10 [seconds]
default_ttl                120 [seconds]
diag_bitmap                0x0 [bitmap]
err_ttl                    0 [seconds]
esi_syntax                 0 [bitmap]
fetch_chunksize            128 [kilobytes]
first_byte_timeout         25.000000 [s]
group                      varnish (112)
http_headers               64 [header lines]
http_range_support         off [bool]
listen_address             :80
listen_depth               8192 [connections]
log_hashstring             off [bool]
log_local_address          off [bool]
lru_interval               60 [seconds]
max_esi_includes           5 [includes]
max_restarts               4 [restarts]
overflow_max               100 [%]
ping_interval              3 [seconds]
pipe_timeout               60 [seconds]
prefer_ipv6                off [bool]
purge_dups                 on [bool]
rush_exponent              3 [requests per request]
saintmode_threshold        10 [objects]
send_timeout               6000 [seconds]
sess_timeout               10 [seconds]
sess_workspace             524288 [bytes]
session_linger             50 [ms]
session_max                100000 [sessions]
shm_reclen                 255 [bytes]
shm_workspace              262144 [bytes]
syslog_cli_traffic         on [bool]
thread_pool_add_delay      2 [milliseconds]
thread_pool_add_threshold  2 [requests]
thread_pool_fail_delay     200 [milliseconds]
thread_pool_max            5000 [threads]
thread_pool_min            500 [threads]
thread_pool_purge_delay    1000 [milliseconds]
thread_pool_stack          unlimited [bytes]
thread_pool_timeout        300 [seconds]
thread_pools               2 [pools]
thread_stats_rate          10 [requests]
user                       varnish (106)
vcl_trace                  off [bool]
waiter                     default (epoll, poll)
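
(The list above is the one-shot output of param.show on the management
CLI, e.g. "varnishadm -T localhost:6082 param.show"; the -T address is
again just an example.)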

-- 
[]'s
Hugo



