[Varnish] #897: sess_mem "leak" on hyper-threaded cpu
Varnish
varnish-bugs at varnish-cache.org
Tue Apr 12 00:21:49 CEST 2011
#897: sess_mem "leak" on hyper-threaded cpu
-------------------------------------------------+--------------------------
Reporter: askalski | Type: defect
Status: new | Priority: normal
Milestone: | Component: build
Version: trunk | Severity: major
Keywords: sess_mem leak n_sess race condition |
-------------------------------------------------+--------------------------
Comment(by askalski):
I did some testing to figure out how this bug relates to abnormally high
memory usage on production varnish servers (the original issue that got
me looking into this). I generated synthetic load against a varnishd
(malloc,64M) running on my laptop (Ubuntu 10.10, kernel 2.6.35), until the
n_sess_mem counter maxed out at 100000. Varnish memory usage reached 2GB
resident.
{{{
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nobody 23388 5.7 52.7 8369944 2032540 ? Sl 15:07 9:24
/usr/sbin/varnishd \
-P /var/run/varnishd.pid -a :6081 -T localhost:6082 \
-f /etc/varnish/default.vcl -S /etc/varnish/secret \
-s malloc,64M
}}}
I then ran a program which attached to the varnishd process via ptrace,
scanned its mapped memory for sess_mem objects (magic number 0x555859c5),
and tallied the number of dirty pages belonging to each, scanning 17
pages per struct: 3288 bytes of sess_mem/http plus 65536 bytes of
workspace. The scanner found all 100000 of the structs. (Note: the 64MB
of SMA-cached data had all expired by the time I was able to scan the
process memory; I used short TTLs.)
{{{
sess_mem_count = 100000
dirty_pages = 504430
memory_used = 2017720kB (1970 MB)
}}}
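The tally logic itself is simple once the mappings have been read out. Here is a minimal sketch of the matching and page arithmetic; the struct sizes and the magic number are the ones quoted above, but the helper names and the word-aligned search over a local buffer are my simplification (the real tool read the remote process via ptrace):
{{{
/* Sketch of the scanner's tally logic only.  Sizes and the magic
 * number are the ones quoted in the ticket; everything else is an
 * illustration, not the actual tool. */
#include <stdint.h>
#include <stddef.h>

#define SESS_MEM_MAGIC  0x555859c5U  /* sess_mem magic from the ticket */
#define SESS_MEM_BYTES  3288U        /* sess_mem + http structs        */
#define WORKSPACE_BYTES 65536U       /* per-session workspace          */
#define PAGE_BYTES      4096U

/* Pages scanned per struct: ceil((3288 + 65536) / 4096) == 17 */
static size_t
pages_per_struct(void)
{
    return (SESS_MEM_BYTES + WORKSPACE_BYTES + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Count word-aligned occurrences of the magic in a mapped region. */
static size_t
count_sess_mem(const uint32_t *words, size_t nwords)
{
    size_t i, found = 0;

    for (i = 0; i < nwords; i++)
        if (words[i] == SESS_MEM_MAGIC)
            found++;
    return found;
}
}}}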
The reason so much session workspace memory was dirtied was not my
generated request/response load (the requests touched relatively little
of the workspace). Rather, it was because the workspaces were allocated,
by malloc, at memory locations that had previously been dirtied by the
SMA stevedore (objects that had either expired or been LRU-evicted).
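That reuse pattern is easy to reproduce in isolation. A hypothetical illustration (glibc-specific allocator behavior, not Varnish code): dirty a workspace-sized block, free it, and allocate the same size again; the allocator typically hands back the very same, already-dirty pages:
{{{
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Dirty a block the size of a session workspace, free it, and allocate
 * again: with glibc the second allocation typically lands on the same,
 * already-dirtied pages.  Returns 1 if the address was reused, 0 if
 * not, -1 on allocation failure.  Purely illustrative. */
static int
workspace_reuse(size_t sz)
{
    char *a, *b;
    uintptr_t old;
    int same;

    if ((a = malloc(sz)) == NULL)
        return -1;
    memset(a, 0xa5, sz);        /* every page of the block is now dirty */
    old = (uintptr_t)a;
    free(a);

    if ((b = malloc(sz)) == NULL)   /* the "workspace" allocation */
        return -1;
    same = ((uintptr_t)b == old);
    free(b);
    return same;
}
}}}
With glibc the freed chunk is either handed straight back to the next same-size request or merged into the top chunk and re-carved at the same address; either way, those pages stay dirty in the process's RSS even though the cache data itself is gone.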
I have a few production machines where varnishd is using 6.5GB over
what SMA was configured to use. Unfortunately, I cannot perform the
same analysis on the memory of those processes, because ptrace() causes
the varnish child to exit on RHEL5/2.6.18: the poll() in CLS_Poll
returns EINTR, which is interpreted as an error. Kernels 2.6.24+ can
restart sys_poll() without userspace intervention.
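For reference, the EINTR failure mode could also be handled in userspace. A minimal sketch (this wrapper is my illustration, not Varnish code) of a poll() loop that restarts on EINTR instead of treating it as fatal:
{{{
#include <errno.h>
#include <poll.h>

/* Restart poll() when a signal (e.g. a ptrace attach on pre-2.6.24
 * kernels such as RHEL5's 2.6.18) interrupts it, instead of treating
 * EINTR as an error.  Note that re-passing the full timeout can
 * extend the total wait beyond the caller's intent. */
static int
poll_restart(struct pollfd *fds, nfds_t nfds, int timeout)
{
    int r;

    do
        r = poll(fds, nfds, timeout);
    while (r == -1 && errno == EINTR);
    return r;
}
}}}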
Here's the post-mortem varnishstat:
{{{
client_conn 3846502 362.50 Client connections accepted
client_drop 96 0.01 Connection dropped, no sess/wrk
client_req 3813266 359.37 Client requests received
cache_hit 0 0.00 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 185362 17.47 Cache misses
backend_conn 1833 0.17 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 0 0.00 Backend conn. failures
backend_reuse 183098 17.26 Backend conn. reuses
backend_toolate 0 0.00 Backend conn. was closed
backend_recycle 183527 17.30 Backend conn. recycles
backend_unused 0 0.00 Backend conn. unused
fetch_head 0 0.00 Fetch head
fetch_length 0 0.00 Fetch with Length
fetch_chunked 185362 17.47 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 0 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
n_sess_mem 100000 . N struct sess_mem
n_sess 100442 . N struct sess
n_object 0 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 3 . N struct objectcore
n_objecthead 3 . N struct objecthead
n_smf 0 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 0 . N large free smf
n_vbe_conn 1 . N struct vbe_conn
n_wrk 10 . N worker threads
n_wrk_create 502 0.05 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 1829 0.17 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 94955 8.95 N overflowed work requests
n_wrk_drop 303 0.03 N dropped work requests
n_backend 1 . N backends
n_expired 1001 . N expired objects
n_lru_nuked 184361 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 1 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 3808607 358.93 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 3846406 362.49 Total Sessions
s_req 3813266 359.37 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 0 0.00 Total pass
s_fetch 185362 17.47 Total fetch
s_hdrbytes 877580576 82704.79 Total header bytes
s_bodybytes 13580918034 1279890.49 Total body bytes
sess_closed 3831452 361.08 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 185362 17.47 Session Linger
sess_herd 411446 38.78 Session herd
shm_records 129034970 12160.49 SHM records
shm_writes 16142937 1521.34 SHM writes
shm_flushes 25 0.00 SHM flushes due to overflow
shm_cont 1069825 100.82 SHM MTX contention
shm_cycles 37 0.00 SHM cycles through buffer
sm_nreq 0 0.00 allocator requests
sm_nobj 0 . outstanding allocations
sm_balloc 0 . bytes allocated
sm_bfree 0 . bytes free
sma_nreq 555085 52.31 SMA allocator requests
sma_nobj 0 . SMA outstanding allocations
sma_nbytes 0 . SMA outstanding bytes
sma_balloc 24435160288 . SMA bytes allocated
sma_bfree 24435160288 . SMA bytes free
sms_nreq 3627904 341.90 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 1443905792 . SMS bytes allocated
sms_bfree 1443905792 . SMS bytes freed
backend_req 184726 17.41 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 1 . N total active purges
n_purge_add 1 0.00 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 0 0.00 N objects tested
n_purge_re_test 0 0.00 N regexps tested against
n_purge_dups 0 0.00 N duplicate purges removed
hcb_nolock 183710 17.31 HCB Lookups without lock
hcb_lock 185362 17.47 HCB Lookups with lock
hcb_insert 185360 17.47 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 207 0.02 Connection dropped late
uptime 10611 1.00 Client uptime
backend_retry 0 0.00 Backend conn. retry
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 0 0.00 Fetch no body (304)
}}}
--
Ticket URL: <http://varnish-cache.org/trac/ticket/897#comment:2>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator