Varnish restarts when all memory is allocated

Marco Walraven m.walraven at terantula.com
Fri May 29 12:42:40 CEST 2009


On Tue, May 26, 2009 at 11:29:08PM +0200, Marco Walraven wrote:
> Hi,
> 
> We are testing a Varnish Cache in our production environment with a 500Gb storage file and
> 32Gb of RAM. Varnish performance is excellent when all of the 32Gb is not allocated yet.
> The rates I am seeing here are around 40-60Mbit/s, with roughly 2.2M objects in cache and
> hitting a ratio of ~0.65, even then Varnish can handle it easily. However it is still
> warming up since we have a lot of objects that need to be cached.
> 
> The problem I am facing is that as soon as RAM is exhausted Varnish restarts itself.

In the meantime I have been doing some tests tweaking the VM system under Linux, especially
vm.min_free_kbytes, leaving some memory for pdflush and kswapd. The results are slightly better, 
but still varnishd starts to hog the CPU's and restarts. Alternatively we disabled swap, running 
without it. But also tested with a swap file of 16Gb on a different disk. Again slightly better 
results but still the same effect in the end.

We also ran Varnish without the file storage type having just 8Gb assigned with malloc, this ran 
longer than the other tests we did. Varnishd did not crash but got extremely high CPU usages 700% 
and recoverd from that after a minute or 2.

The linux system run with the following sysctl config applied:

Linux varnish001 2.6.18-6-amd64 #1 SMP Tue May 5 08:01:28 UTC 2009 x86_64 GNU/Linux

/etc/systctl.conf

net.ipv4.ip_local_port_range = 1024 65536
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_tw_recycle = 1
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_no_metrics_save=1
net.core.somaxconn = 262144
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
vm.swappiness = 0
vm.min_free_kbytes = 4194304
vm.dirty_background_ratio = 25
vm.dirty_expire_centisecs = 1000
vm.dirty_writeback_centisecs = 100


So, yesterday I installed FreeBSD 7.2 STABLE with the lastest CVSup on the second Varnish box and
ran Varnish with the exact same config as on the Linux box. Exact same setup, 16 Gb swap file, same
arguments for varnishd, same vcl, same amount of traffic, connections etc etc. I did apply 
perfmance tuning as described on the wiki. Both systems ran Ok until the moment there was little 
RAM left. Linux showed the exact same behaviour as before, high CPU load, varnishd with high amounts 
of CPU usages and in the end it varnishd restarted with an ampty cache. FreeBSD kept on going as I 
expected it to work; however with a higher load but still serving images at 60Mbit/s. I did see that
it sometimes needed to recover. Meaning accepting no connections for a few moments and then starting
to go on again, but enough to notice. 

Both systems run varnishd as followed, I changed the amount of buckets to 450011 as opposed to the previous tests I ran. Same for the lru_interval which was 60 and maybe too low.

/usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 400,4000,60 -s file,500G -p obj_workspace 8192 -p sess_workspace 262144 -p lru_interval 600 -h classic,450011 -p sess_timeout 2 -p listen_depth 8192 -p log_hashstring off -p shm_workspace 32768 -p ping_interval 10 -p srcaddr_ttl 0 -p esi_syntax 1

Below some output of Linux when it started to hog the CPU and output of the FreeBSD system 15 minutes
later when it was still going. 

So is this kind of setup actually possible ? And if so how to get it running smoothly ? So far FreeBSD
comes pretty close but not yet there. 

Thanks for the help,

Marco



Linux:

Hitrate ratio:       10      100     1000
Hitrate avg:     0.6914   0.6843   0.6656

        5739         0.00         0.60 Client connections accepted
     5197130         0.00       542.72 Client requests received
     2929389         0.00       305.91 Cache hits
           0         0.00         0.00 Cache hits for pass
     2267089         0.00       236.75 Cache misses
     2267116         0.00       236.75 Backend connections success
           0         0.00         0.00 Backend connections failures
     2246281         0.00       234.57 Backend connections reuses
     2246300         1.00       234.58 Backend connections recycles
         120          .            .   N struct sess_mem
          57          .            .   N struct sess
     2224133          .            .   N struct object
     2222930          .            .   N struct objecthead
     4448208          .            .   N struct smf
           0          .            .   N small free smf
           0          .            .   N large free smf
          33          .            .   N struct vbe_conn
         114          .            .   N struct bereq
         400          .            .   N worker threads
         400         0.00         0.04 N worker threads created
        1601         0.00         0.17 N worker threads limited
           0         0.00         0.00 N queued work requests
           0         0.00         0.00 N overflowed work requests
           2          .            .   N backends
       42991          .            .   N expired objects
     1347513          .            .   N LRU moved objects
     5081074         1.00       530.61 Objects sent with write
        5725         0.00         0.60 Total Sessions
     5197083         1.00       542.72 Total Requests
           6         0.00         0.00 Total pipe
          21         0.00         0.00 Total pass
     2267086         1.00       236.75 Total fetch
  1859166711       366.91    194148.57 Total header bytes
 21933375988       802.81   2290452.80 Total body bytes
        2363         1.00         0.25 Session Closed
     5194871         0.00       542.49 Session herd
   283299668        45.99     29584.34 SHM records
     7496210         3.00       782.81 SHM writes
        5065         0.00         0.53 SHM flushes due to overflow
         142         0.00         0.01 SHM MTX contention
         122         0.00         0.01 SHM cycles through buffer
     4534821         0.00       473.56 allocator requests
     4448207          .            .   outstanding allocations
 37913182208          .            .   bytes allocated
498957729792          .            .   bytes free
         605         0.00         0.06 SMS allocator requests
      261420          .            .   SMS bytes allocated
      261420          .            .   SMS bytes freed
     2267110         0.00       236.75 Backend requests made
           1         0.00         0.00 N vcl total
           1         0.00         0.00 N vcl available
           1          .            .   N total active purges
           1         0.00         0.00 N new purges added

top - 18:56:21 up 1 day,  1:41,  3 users,  load average: 10.72, 2.88, 1.34
Tasks: 121 total,   5 running, 116 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.7%us, 87.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 12.3%si,  0.0%st
Cpu1  :  0.3%us, 99.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.3%us, 99.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32942712k total, 30712396k used,  2230316k free,      308k buffers
Swap: 16777208k total,    10692k used, 16766516k free, 28230576k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                    
11740 nobody    18   0  505g  28g  26g S  753 91.8  17:36.41 varnishd                                   
  249 root      20  -5     0    0    0 R   46  0.0  13:32.07 kswapd0                                    
12564 root      15   0  6628 1208  864 R    0  0.0   0:04.45 top                                        
    1 root      15   0  6120   80   48 S    0  0.0   0:14.35 init                                       
    2 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/0                                
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0       

FreeBSD:

last pid: 12964;  load averages:  2.11,  1.63,  0.94                            up 0+04:15:40  17:09:45
487 processes: 9 running, 462 sleeping, 16 waiting
CPU 0:  2.0% user,  0.0% nice, 25.5% system,  0.7% interrupt, 71.8% idle
CPU 1:  0.7% user,  0.0% nice, 27.5% system,  0.0% interrupt, 71.8% idle
CPU 2:  1.3% user,  0.0% nice, 65.1% system,  0.0% interrupt, 33.6% idle
CPU 3:  1.3% user,  0.0% nice, 36.9% system,  0.0% interrupt, 61.7% idle
CPU 4:  0.7% user,  0.0% nice, 12.8% system,  0.7% interrupt, 85.8% idle
CPU 5:  0.0% user,  0.0% nice,  0.7% system,  0.0% interrupt, 99.3% idle
CPU 6:  0.0% user,  0.0% nice,  2.7% system,  0.0% interrupt, 97.3% idle
CPU 7:  0.7% user,  0.0% nice,  0.7% system,  0.0% interrupt, 98.6% idle
Mem: 27G Active, 2865M Inact, 727M Wired, 884M Cache, 399M Buf, 120K Free
Swap: 16G Total, 1476K Used, 16G Free

  PID    UID    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12      0      1 171 ki31     0K    16K CPU6   6 252:38 89.99% idle: cpu6
   11      0      1 171 ki31     0K    16K CPU7   7 252:34 88.57% idle: cpu7
   14      0      1 171 ki31     0K    16K CPU4   4 252:21 88.18% idle: cpu4
   13      0      1 171 ki31     0K    16K CPU5   5 252:30 83.06% idle: cpu5
   45      0      1 -16    -     0K    16K psleep 1   5:18 81.88% pagedaemon
   15      0      1 171 ki31     0K    16K CPU3   3 250:19 75.88% idle: cpu3
   18      0      1 171 ki31     0K    16K CPU0   0 242:32 73.00% idle: cpu0
   16      0      1 171 ki31     0K    16K RUN    2 245:26 70.90% idle: cpu2
   17      0      1 171 ki31     0K    16K CPU1   1 243:08 60.89% idle: cpu1
12033     80    406  45    0   502G 24339M ucond  4   0:08 13.96% varnishd

Hitrate ratio:       10      100     1000
Hitrate avg:     0.6868   0.6891   0.6749

       15694         2.00         1.51 Client connections accepted
     5611010       651.39       540.72 Client requests received
     3211214       451.89       309.45 Cache hits
           1         0.00         0.00 Cache hits for pass
     2399137       198.51       231.20 Cache misses
     2399143       198.51       231.20 Backend connections success
     2377367       198.51       229.10 Backend connections reuses
     2377408       197.51       229.10 Backend connections recycles
           0          .            .   N struct srcaddr
           0          .            .   N active struct srcaddr
         316          .            .   N struct sess_mem
          48          .            .   N struct sess
     2349612          .            .   N struct object
     2348460          .            .   N struct objecthead
     4699173          .            .   N struct smf
           0          .            .   N small free smf
           0          .            .   N large free smf
          40          .            .   N struct vbe_conn
         115          .            .   N struct bereq
         400          .            .   N worker threads
         400         0.00         0.04 N worker threads created
           0         0.00         0.00 N worker threads limited
           0         0.00         0.00 N queued work requests
           0         0.00         0.00 N overflowed work requests
           2          .            .   N backends
       49564          .            .   N expired objects
     1516192          .            .   N LRU moved objects
     5486698       643.41       528.74 Objects sent with write
       15693         1.00         1.51 Total Sessions
     5611037       648.40       540.72 Total Requests
           8         0.00         0.00 Total pipe
          20         0.00         0.00 Total pass
     2399127       198.51       231.20 Total fetch
  2007812574    234091.74    193486.80 Total header bytes
 23594716103   2781420.12   2273751.19 Total body bytes
        2576         0.00         0.25 Session Closed
     5608555       647.40       540.48 Session herd
   304052678     32276.41     29300.63 SHM records
     8074243       868.86       778.09 SHM writes
        6122         2.99         0.59 SHM flushes due to overflow
       11697         5.99         1.13 SHM MTX contention
         131         0.00         0.01 SHM cycles through buffer
     4798894       396.02       462.45 allocator requests
     4699172          .            .   outstanding allocations
          20         0.00         0.00 Total pass
     2399127       198.51       231.20 Total fetch
  2007812574    234091.74    193486.80 Total header bytes
 23594716103   2781420.12   2273751.19 Total body bytes
        2576         0.00         0.25 Session Closed
     5608555       647.40       540.48 Session herd
   304052678     32276.41     29300.63 SHM records
     8074243       868.86       778.09 SHM writes
        6122         2.99         0.59 SHM flushes due to overflow
       11697         5.99         1.13 SHM MTX contention
         131         0.00         0.01 SHM cycles through buffer
     4798894       396.02       462.45 allocator requests
     4699172          .            .   outstanding allocations
 40178368512          .            .   bytes allocated
496692543488          .            .   bytes free
         575         0.00         0.06 SMS allocator requests
           0          .            .   SMS outstanding allocations
      249035          .            .   SMS bytes allocated
      249035          .            .   SMS bytes freed
     2399128       198.51       231.20 Backend requests made
           1         0.00         0.00 N vcl total
           1         0.00         0.00 N vcl available
           1          .            .   N total active purges
           1         0.00         0.00 N new purges added

-- 
 Terantula - Industrial Strength Open Source
 phone:+31 64 3232 400 / www: http://www.terantula.com / pgpkey: E7EE7A46
 pgp fingerprint: F2EE 122D 964C DE68 7380 6F95 3710 7719 E7EE 7A46 
 apt-get install freebsd



More information about the varnish-misc mailing list