[Varnish] #934: Memory leak in varnish_32e40a6ececf4a2ea65830e723c770d1ce261898
Varnish
varnish-bugs at varnish-cache.org
Mon Jun 6 22:50:30 CEST 2011
#934: Memory leak in varnish_32e40a6ececf4a2ea65830e723c770d1ce261898
--------------------+-------------------------------------------------------
Reporter: kdajka | Type: defect
Status: new | Priority: normal
Milestone: | Component: varnishd
Version: trunk | Severity: normal
Keywords: |
--------------------+-------------------------------------------------------
I'm having problems with memory leak in varnish trunk
32e40a6ececf4a2ea65830e723c770d1ce261898. I've been trying to use this
version in production since 30 May. I run varnish on machine with 12GB of
RAM, with 2.1.5 I use malloc 9G. With trunk I'm using only 1G.
Nonetheless, varnish from trunk eats all available memory and swap.
My system: Linux varnishic06 2.6.26-2-amd64 #1 SMP Thu Nov 25 04:30:55 UTC
2010 x86_64 GNU/Linux
{{{
/usr/local/inp/varnish/sbin/varnishd -P /var/tmp/foo.bar.pl_varnishd.pid
-a 123.123.123.123:8084 -i foo.bar_varnishic06 -n foo.bar_varnishic06 -f
/tmp/foo.bar.pl.vcl -T 123.123.123.123:2084 -h classic,20011 -p
thread_pools=4 -p ban_lurker_sleep=0.1 -w 200,4000,2 -t 0 -s malloc,1G -d
}}}
{{{
Child (14378) said INFLATE=-3 (incorrect header check)
[above probably not related, happened few hours earlier, I think
timestamps would be a nice feature in CLI]
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
Child (14378) not responding to CLI, killing it.
[about 10 times more CLI tries to kill 14378]
}}}
Sadly this time varnish couldn't be killed. In past few days even though
varnish ate all memory it oomkiller hadn't been used.
An excerpt from varnishstat:
{{{
0+14:17:24
foo.bar_varnishic06
Hitrate ratio: 10 100 1000
Hitrate avg: 0.9455 0.9519 0.9568
5907834 0.00 114.84 client_conn - Client connections
accepted
11851631 0.00 230.38 client_req - Client requests
received
8536708 0.00 165.94 cache_hit - Cache hits
667843 0.00 12.98 cache_miss - Cache misses
3270479 0.00 63.57 backend_conn - Backend conn.
success
2064 0.00 0.04 fetch_head - Fetch head
2099794 0.00 40.82 fetch_length - Fetch with Length
1108491 0.00 21.55 fetch_chunked - Fetch chunked
59114 0.00 1.15 fetch_close - Fetch wanted close
1 0.00 0.00 fetch_failed - Fetch failed
493 0.00 0.01 fetch_304 - Fetch no body (304)
1213 . . n_sess_mem - N struct sess_mem
1118 . . n_sess - N struct sess
315918 . . n_object - N struct object
316309 . . n_objectcore - N struct objectcore
313676 . . n_objecthead - N struct objecthead
829 . . n_waitinglist - N struct
waitinglist
60 . . n_vbc - N struct vbc
800 . . n_wrk - N worker threads
800 0.00 0.02 n_wrk_create - N worker threads
created
69 0.00 0.00 n_wrk_max - N worker threads
limited
1 0.00 0.00 n_wrk_queued - N queued work
requests
4 . . n_backend - N backends
14117 . . n_expired - N expired objects
334145 . . n_lru_nuked - N LRU nuked objects
5799488 . . n_lru_moved - N LRU moved objects
99 0.00 0.00 losthdr - HTTP header overflows
8908643 0.00 173.17 n_objwrite - Objects sent with
write
5907366 0.00 114.83 s_sess - Total Sessions
11851631 0.00 230.38 s_req - Total Requests
2645843 0.00 51.43 s_pass - Total pass
3269955 0.00 63.56 s_fetch - Total fetch
4292029827 0.00 83431.11 s_hdrbytes - Total header bytes
191349157693 0.00 3719562.20 s_bodybytes - Total body bytes
676208 0.00 13.14 sess_closed - Session Closed
54239 0.00 1.05 sess_pipeline - Session Pipeline
24901 0.00 0.48 sess_readahead - Session Read Ahead
11374083 0.00 221.10 sess_linger - Session Linger
10415564 0.00 202.46 sess_herd - Session herd
686757636 0.00 13349.62 shm_records - SHM records
47062742 0.00 914.83 shm_writes - SHM writes
60835 0.00 1.18 shm_cont - SHM MTX contention
300 0.00 0.01 shm_cycles - SHM cycles through
buffer
45019 0.00 0.88 sms_nreq - SMS allocator requests
0 . . sms_nobj - SMS outstanding
allocations
0 . . sms_nbytes - SMS outstanding bytes
347573810 . . sms_balloc - SMS bytes allocated
347573810 . . sms_bfree - SMS bytes freed
3270357 0.00 63.57 backend_req - Backend requests made
1 0.00 0.00 n_vcl - N vcl total
1 0.00 0.00 n_vcl_avail - N vcl available
3 . . n_ban - N total active bans
10 0.00 0.00 n_ban_add - N new bans added
7 0.00 0.00 n_ban_retire - N old bans deleted
821981 0.00 15.98 n_ban_obj_test - N objects tested
1527363 0.00 29.69 n_ban_re_test - N regexps tested
against
51444 0.00 1.00 uptime - Client uptime
2899745 0.00 56.37 n_gunzip - Gunzip operations
1 0.00 0.00 LCK.sms.creat - Created locks
135057 0.00 2.63 LCK.sms.locks - Lock Operations
2 0.00 0.00 LCK.sma.creat - Created locks
16298412 0.00 316.82 LCK.sma.locks - Lock Operations
20011 0.00 0.39 LCK.hcl.creat - Created locks
18091894 0.00 351.68 LCK.hcl.locks - Lock Operations
1 0.00 0.00 LCK.vcl.creat - Created locks
1341 0.00 0.03 LCK.vcl.locks - Lock Operations
1 0.00 0.00 LCK.stat.creat - Created locks
1213 0.00 0.02 LCK.stat.locks - Lock Operations
1 0.00 0.00 LCK.sessmem.creat - Created locks
}}}
{{{
top - 21:54:05 up 168 days, 12:26, 4 users, load average: 952.29,
871.34, 750.66
Tasks: 112 total, 1 running, 111 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 41.9%sy, 0.0%ni, 9.0%id, 48.9%wa, 0.0%hi, 0.1%si,
0.0%st
Mem: 12332260k total, 12269208k used, 63052k free, 496k buffers
Swap: 6291448k total, 6291448k used, 0k free, 5480k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14378 webadm 20 0 26.0g 11g 564 D 46 97.9 52:08.10 varnishd
26320 root 20 0 48868 1844 1344 D 46 0.0 0:12.44 sshd
220 root 15 -5 0 0 0 D 34 0.0 70:15.74 kswapd0
}}}
Last captured dmesg in attachment (jpg), I had seen few blocked pids in
last days:
{{{
[17434416.864353] INFO: task varnishd:27364 blocked for more than 120
seconds.
[17434416.864385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[17434416.864430] varnishd D 0000000000000003 0 27364 16201
[17434416.865945] ffff810300fe3c18 0000000000000082 ffff810300fe3bd8
ffffffff80229020
[17434416.865996] ffff810177574f20 ffff81025e082fa0 ffff8101775751a8
0000000101045880
[17434416.866045] 0031e12ab42a464e 00000000ffffffff 0000000000000000
ffffffff8022efec
[17434416.866080] Call Trace:
[17434416.866127] [<ffffffff80229020>] hrtick_start_fair+0xfb/0x144
[17434416.866157] [<ffffffff8022efec>] hrtick_set+0x9e/0xf7
[17434416.866186] [<ffffffff8042a531>] __down_write_nested+0x87/0xa1
[17434416.866215] [<ffffffff8029ff9f>] do_coredump+0x99/0x7c6
[17434416.866243] [<ffffffff8024f7a9>] futex_wait+0x2e9/0x394
[17434416.866271] [<ffffffff80238b75>] current_fs_time+0x1e/0x24
[17434416.866298] [<ffffffff802a14cc>] pipe_write+0x3e4/0x42d
[17434416.866327] [<ffffffff8023e2d6>] __dequeue_signal+0xff/0x15a
[17434416.866355] [<ffffffff8023de77>] recalc_sigpending+0xe/0x38
[17434416.866383] [<ffffffff80240154>] get_signal_to_deliver+0x2fb/0x324
[17434416.866413] [<ffffffff8020b2ae>] do_notify_resume+0xaf/0x7fc
[17434416.869429] [<ffffffff802506ff>] do_futex+0x81/0x78a
[17434416.869429] [<ffffffff802461a5>] autoremove_wake_function+0x0/0x2e
[17434416.869429] [<ffffffff80250f06>] sys_futex+0xfe/0x11c
[17434416.869429] [<ffffffff8029b910>] vfs_read+0x11e/0x152
[17434416.869429] [<ffffffff8020bf89>] sysret_signal+0x2b/0x45
[17434416.869429] [<ffffffff8020c267>] ptregscall_common+0x67/0xb0
}}}
--
Ticket URL: <http://varnish-cache.org/trac/ticket/934>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator
More information about the varnish-bugs
mailing list