[Varnish] #934: Memory leak in varnish_32e40a6ececf4a2ea65830e723c770d1ce261898

Varnish varnish-bugs at varnish-cache.org
Mon Jun 6 22:50:30 CEST 2011


#934: Memory leak in varnish_32e40a6ececf4a2ea65830e723c770d1ce261898
--------------------+-------------------------------------------------------
 Reporter:  kdajka  |        Type:  defect  
   Status:  new     |    Priority:  normal  
Milestone:          |   Component:  varnishd
  Version:  trunk   |    Severity:  normal  
 Keywords:          |  
--------------------+-------------------------------------------------------
 I'm having problems with memory leak in varnish trunk
 32e40a6ececf4a2ea65830e723c770d1ce261898. I've been trying to use this
 version in production since 30 May. I run varnish on machine with 12GB of
 RAM, with 2.1.5 I use malloc 9G. With trunk I'm using only 1G.
 Nonetheless, varnish from trunk eats all available memory and swap.

 My system: Linux varnishic06 2.6.26-2-amd64 #1 SMP Thu Nov 25 04:30:55 UTC
 2010 x86_64 GNU/Linux

 {{{
 /usr/local/inp/varnish/sbin/varnishd -P /var/tmp/foo.bar.pl_varnishd.pid
 -a 123.123.123.123:8084 -i foo.bar_varnishic06 -n foo.bar_varnishic06 -f
 /tmp/foo.bar.pl.vcl -T 123.123.123.123:2084 -h classic,20011 -p
 thread_pools=4 -p ban_lurker_sleep=0.1 -w 200,4000,2 -t 0 -s malloc,1G -d
 }}}

 {{{
 Child (14378) said INFLATE=-3 (incorrect header check)
 [above probably not related, happened few hours earlier, I think
 timestamps would be a nice feature in CLI]
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 Child (14378) not responding to CLI, killing it.
 [about 10 times more CLI tries to kill 14378]
 }}}

 Sadly this time varnish couldn't be killed. In past few days even though
 varnish ate all memory it oomkiller hadn't been used.

 An excerpt from varnishstat:
 {{{
 0+14:17:24
 foo.bar_varnishic06
 Hitrate ratio:       10      100     1000
 Hitrate avg:     0.9455   0.9519   0.9568

      5907834         0.00       114.84 client_conn - Client connections
 accepted
     11851631         0.00       230.38 client_req - Client requests
 received
      8536708         0.00       165.94 cache_hit - Cache hits
       667843         0.00        12.98 cache_miss - Cache misses
      3270479         0.00        63.57 backend_conn - Backend conn.
 success
         2064         0.00         0.04 fetch_head - Fetch head
      2099794         0.00        40.82 fetch_length - Fetch with Length
      1108491         0.00        21.55 fetch_chunked - Fetch chunked
        59114         0.00         1.15 fetch_close - Fetch wanted close
            1         0.00         0.00 fetch_failed - Fetch failed
          493         0.00         0.01 fetch_304 - Fetch no body (304)
         1213          .            .   n_sess_mem - N struct sess_mem
         1118          .            .   n_sess - N struct sess
       315918          .            .   n_object - N struct object
       316309          .            .   n_objectcore - N struct objectcore
       313676          .            .   n_objecthead - N struct objecthead
          829          .            .   n_waitinglist - N struct
 waitinglist
           60          .            .   n_vbc - N struct vbc
          800          .            .   n_wrk - N worker threads
          800         0.00         0.02 n_wrk_create - N worker threads
 created
           69         0.00         0.00 n_wrk_max - N worker threads
 limited
            1         0.00         0.00 n_wrk_queued - N queued work
 requests
            4          .            .   n_backend - N backends
        14117          .            .   n_expired - N expired objects
       334145          .            .   n_lru_nuked - N LRU nuked objects
      5799488          .            .   n_lru_moved - N LRU moved objects
           99         0.00         0.00 losthdr - HTTP header overflows
      8908643         0.00       173.17 n_objwrite - Objects sent with
 write
      5907366         0.00       114.83 s_sess - Total Sessions
     11851631         0.00       230.38 s_req - Total Requests
      2645843         0.00        51.43 s_pass - Total pass
      3269955         0.00        63.56 s_fetch - Total fetch
   4292029827         0.00     83431.11 s_hdrbytes - Total header bytes
 191349157693         0.00   3719562.20 s_bodybytes - Total body bytes
       676208         0.00        13.14 sess_closed - Session Closed
        54239         0.00         1.05 sess_pipeline - Session Pipeline
        24901         0.00         0.48 sess_readahead - Session Read Ahead
     11374083         0.00       221.10 sess_linger - Session Linger
     10415564         0.00       202.46 sess_herd - Session herd
    686757636         0.00     13349.62 shm_records - SHM records
     47062742         0.00       914.83 shm_writes - SHM writes
        60835         0.00         1.18 shm_cont - SHM MTX contention
          300         0.00         0.01 shm_cycles - SHM cycles through
 buffer
        45019         0.00         0.88 sms_nreq - SMS allocator requests
            0          .            .   sms_nobj - SMS outstanding
 allocations
            0          .            .   sms_nbytes - SMS outstanding bytes
    347573810          .            .   sms_balloc - SMS bytes allocated
    347573810          .            .   sms_bfree - SMS bytes freed
      3270357         0.00        63.57 backend_req - Backend requests made
            1         0.00         0.00 n_vcl - N vcl total
            1         0.00         0.00 n_vcl_avail - N vcl available
            3          .            .   n_ban - N total active bans
           10         0.00         0.00 n_ban_add - N new bans added
            7         0.00         0.00 n_ban_retire - N old bans deleted
       821981         0.00        15.98 n_ban_obj_test - N objects tested
      1527363         0.00        29.69 n_ban_re_test - N regexps tested
 against
        51444         0.00         1.00 uptime - Client uptime
      2899745         0.00        56.37 n_gunzip - Gunzip operations
            1         0.00         0.00 LCK.sms.creat - Created locks
       135057         0.00         2.63 LCK.sms.locks - Lock Operations
            2         0.00         0.00 LCK.sma.creat - Created locks
     16298412         0.00       316.82 LCK.sma.locks - Lock Operations
        20011         0.00         0.39 LCK.hcl.creat - Created locks
     18091894         0.00       351.68 LCK.hcl.locks - Lock Operations
            1         0.00         0.00 LCK.vcl.creat - Created locks
         1341         0.00         0.03 LCK.vcl.locks - Lock Operations
            1         0.00         0.00 LCK.stat.creat - Created locks
         1213         0.00         0.02 LCK.stat.locks - Lock Operations
            1         0.00         0.00 LCK.sessmem.creat - Created locks
 }}}

 {{{
 top - 21:54:05 up 168 days, 12:26,  4 users,  load average: 952.29,
 871.34, 750.66
 Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
 Cpu(s):  0.1%us, 41.9%sy,  0.0%ni,  9.0%id, 48.9%wa,  0.0%hi,  0.1%si,
 0.0%st
 Mem:  12332260k total, 12269208k used,    63052k free,      496k buffers
 Swap:  6291448k total,  6291448k used,        0k free,     5480k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 14378 webadm    20   0 26.0g  11g  564 D   46 97.9  52:08.10 varnishd
 26320 root      20   0 48868 1844 1344 D   46  0.0   0:12.44 sshd
   220 root      15  -5     0    0    0 D   34  0.0  70:15.74 kswapd0
 }}}

 Last captured dmesg in attachment (jpg), I had seen few blocked pids in
 last days:
 {{{
 [17434416.864353] INFO: task varnishd:27364 blocked for more than 120
 seconds.
 [17434416.864385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
 disables this message.
 [17434416.864430] varnishd      D 0000000000000003     0 27364  16201
 [17434416.865945]  ffff810300fe3c18 0000000000000082 ffff810300fe3bd8
 ffffffff80229020
 [17434416.865996]  ffff810177574f20 ffff81025e082fa0 ffff8101775751a8
 0000000101045880
 [17434416.866045]  0031e12ab42a464e 00000000ffffffff 0000000000000000
 ffffffff8022efec
 [17434416.866080] Call Trace:
 [17434416.866127]  [<ffffffff80229020>] hrtick_start_fair+0xfb/0x144
 [17434416.866157]  [<ffffffff8022efec>] hrtick_set+0x9e/0xf7
 [17434416.866186]  [<ffffffff8042a531>] __down_write_nested+0x87/0xa1
 [17434416.866215]  [<ffffffff8029ff9f>] do_coredump+0x99/0x7c6
 [17434416.866243]  [<ffffffff8024f7a9>] futex_wait+0x2e9/0x394
 [17434416.866271]  [<ffffffff80238b75>] current_fs_time+0x1e/0x24
 [17434416.866298]  [<ffffffff802a14cc>] pipe_write+0x3e4/0x42d
 [17434416.866327]  [<ffffffff8023e2d6>] __dequeue_signal+0xff/0x15a
 [17434416.866355]  [<ffffffff8023de77>] recalc_sigpending+0xe/0x38
 [17434416.866383]  [<ffffffff80240154>] get_signal_to_deliver+0x2fb/0x324
 [17434416.866413]  [<ffffffff8020b2ae>] do_notify_resume+0xaf/0x7fc
 [17434416.869429]  [<ffffffff802506ff>] do_futex+0x81/0x78a
 [17434416.869429]  [<ffffffff802461a5>] autoremove_wake_function+0x0/0x2e
 [17434416.869429]  [<ffffffff80250f06>] sys_futex+0xfe/0x11c
 [17434416.869429]  [<ffffffff8029b910>] vfs_read+0x11e/0x152
 [17434416.869429]  [<ffffffff8020bf89>] sysret_signal+0x2b/0x45
 [17434416.869429]  [<ffffffff8020c267>] ptregscall_common+0x67/0xb0
 }}}

-- 
Ticket URL: <http://varnish-cache.org/trac/ticket/934>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator




More information about the varnish-bugs mailing list