Varnish restarts when all memory is allocated
Marco Walraven
m.walraven at terantula.com
Fri May 29 12:42:40 CEST 2009
On Tue, May 26, 2009 at 11:29:08PM +0200, Marco Walraven wrote:
> Hi,
>
> We are testing a Varnish Cache in our production environment with a 500Gb storage file and
> 32Gb of RAM. Varnish performance is excellent when all of the 32Gb is not allocated yet.
> The rates I am seeing here are around 40-60Mbit/s, with roughly 2.2M objects in cache and
> hitting a ratio of ~0.65, even then Varnish can handle it easily. However it is still
> warming up since we have a lot of objects that need to be cached.
>
> The problem I am facing is that as soon as RAM is exhausted Varnish restarts itself.
In the meantime I have been doing some tests tweaking the VM system under Linux, especially
vm.min_free_kbytes, leaving some memory for pdflush and kswapd. The results are slightly better,
but still varnishd starts to hog the CPU's and restarts. Alternatively we disabled swap, running
without it. But also tested with a swap file of 16Gb on a different disk. Again slightly better
results but still the same effect in the end.
We also ran Varnish without the file storage type having just 8Gb assigned with malloc, this ran
longer than the other tests we did. Varnishd did not crash but got extremely high CPU usages 700%
and recoverd from that after a minute or 2.
The linux system run with the following sysctl config applied:
Linux varnish001 2.6.18-6-amd64 #1 SMP Tue May 5 08:01:28 UTC 2009 x86_64 GNU/Linux
/etc/systctl.conf
net.ipv4.ip_local_port_range = 1024 65536
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_tw_recycle = 1
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_no_metrics_save=1
net.core.somaxconn = 262144
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
vm.swappiness = 0
vm.min_free_kbytes = 4194304
vm.dirty_background_ratio = 25
vm.dirty_expire_centisecs = 1000
vm.dirty_writeback_centisecs = 100
So, yesterday I installed FreeBSD 7.2 STABLE with the lastest CVSup on the second Varnish box and
ran Varnish with the exact same config as on the Linux box. Exact same setup, 16 Gb swap file, same
arguments for varnishd, same vcl, same amount of traffic, connections etc etc. I did apply
perfmance tuning as described on the wiki. Both systems ran Ok until the moment there was little
RAM left. Linux showed the exact same behaviour as before, high CPU load, varnishd with high amounts
of CPU usages and in the end it varnishd restarted with an ampty cache. FreeBSD kept on going as I
expected it to work; however with a higher load but still serving images at 60Mbit/s. I did see that
it sometimes needed to recover. Meaning accepting no connections for a few moments and then starting
to go on again, but enough to notice.
Both systems run varnishd as followed, I changed the amount of buckets to 450011 as opposed to the previous tests I ran. Same for the lru_interval which was 60 and maybe too low.
/usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -f /etc/varnish/default.vcl -T 127.0.0.1:6082 -t 3600 -w 400,4000,60 -s file,500G -p obj_workspace 8192 -p sess_workspace 262144 -p lru_interval 600 -h classic,450011 -p sess_timeout 2 -p listen_depth 8192 -p log_hashstring off -p shm_workspace 32768 -p ping_interval 10 -p srcaddr_ttl 0 -p esi_syntax 1
Below some output of Linux when it started to hog the CPU and output of the FreeBSD system 15 minutes
later when it was still going.
So is this kind of setup actually possible ? And if so how to get it running smoothly ? So far FreeBSD
comes pretty close but not yet there.
Thanks for the help,
Marco
Linux:
Hitrate ratio: 10 100 1000
Hitrate avg: 0.6914 0.6843 0.6656
5739 0.00 0.60 Client connections accepted
5197130 0.00 542.72 Client requests received
2929389 0.00 305.91 Cache hits
0 0.00 0.00 Cache hits for pass
2267089 0.00 236.75 Cache misses
2267116 0.00 236.75 Backend connections success
0 0.00 0.00 Backend connections failures
2246281 0.00 234.57 Backend connections reuses
2246300 1.00 234.58 Backend connections recycles
120 . . N struct sess_mem
57 . . N struct sess
2224133 . . N struct object
2222930 . . N struct objecthead
4448208 . . N struct smf
0 . . N small free smf
0 . . N large free smf
33 . . N struct vbe_conn
114 . . N struct bereq
400 . . N worker threads
400 0.00 0.04 N worker threads created
1601 0.00 0.17 N worker threads limited
0 0.00 0.00 N queued work requests
0 0.00 0.00 N overflowed work requests
2 . . N backends
42991 . . N expired objects
1347513 . . N LRU moved objects
5081074 1.00 530.61 Objects sent with write
5725 0.00 0.60 Total Sessions
5197083 1.00 542.72 Total Requests
6 0.00 0.00 Total pipe
21 0.00 0.00 Total pass
2267086 1.00 236.75 Total fetch
1859166711 366.91 194148.57 Total header bytes
21933375988 802.81 2290452.80 Total body bytes
2363 1.00 0.25 Session Closed
5194871 0.00 542.49 Session herd
283299668 45.99 29584.34 SHM records
7496210 3.00 782.81 SHM writes
5065 0.00 0.53 SHM flushes due to overflow
142 0.00 0.01 SHM MTX contention
122 0.00 0.01 SHM cycles through buffer
4534821 0.00 473.56 allocator requests
4448207 . . outstanding allocations
37913182208 . . bytes allocated
498957729792 . . bytes free
605 0.00 0.06 SMS allocator requests
261420 . . SMS bytes allocated
261420 . . SMS bytes freed
2267110 0.00 236.75 Backend requests made
1 0.00 0.00 N vcl total
1 0.00 0.00 N vcl available
1 . . N total active purges
1 0.00 0.00 N new purges added
top - 18:56:21 up 1 day, 1:41, 3 users, load average: 10.72, 2.88, 1.34
Tasks: 121 total, 5 running, 116 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.7%us, 87.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 12.3%si, 0.0%st
Cpu1 : 0.3%us, 99.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 99.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us,100.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32942712k total, 30712396k used, 2230316k free, 308k buffers
Swap: 16777208k total, 10692k used, 16766516k free, 28230576k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11740 nobody 18 0 505g 28g 26g S 753 91.8 17:36.41 varnishd
249 root 20 -5 0 0 0 R 46 0.0 13:32.07 kswapd0
12564 root 15 0 6628 1208 864 R 0 0.0 0:04.45 top
1 root 15 0 6120 80 48 S 0 0.0 0:14.35 init
2 root RT 0 0 0 0 S 0 0.0 0:00.01 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
FreeBSD:
last pid: 12964; load averages: 2.11, 1.63, 0.94 up 0+04:15:40 17:09:45
487 processes: 9 running, 462 sleeping, 16 waiting
CPU 0: 2.0% user, 0.0% nice, 25.5% system, 0.7% interrupt, 71.8% idle
CPU 1: 0.7% user, 0.0% nice, 27.5% system, 0.0% interrupt, 71.8% idle
CPU 2: 1.3% user, 0.0% nice, 65.1% system, 0.0% interrupt, 33.6% idle
CPU 3: 1.3% user, 0.0% nice, 36.9% system, 0.0% interrupt, 61.7% idle
CPU 4: 0.7% user, 0.0% nice, 12.8% system, 0.7% interrupt, 85.8% idle
CPU 5: 0.0% user, 0.0% nice, 0.7% system, 0.0% interrupt, 99.3% idle
CPU 6: 0.0% user, 0.0% nice, 2.7% system, 0.0% interrupt, 97.3% idle
CPU 7: 0.7% user, 0.0% nice, 0.7% system, 0.0% interrupt, 98.6% idle
Mem: 27G Active, 2865M Inact, 727M Wired, 884M Cache, 399M Buf, 120K Free
Swap: 16G Total, 1476K Used, 16G Free
PID UID THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 0 1 171 ki31 0K 16K CPU6 6 252:38 89.99% idle: cpu6
11 0 1 171 ki31 0K 16K CPU7 7 252:34 88.57% idle: cpu7
14 0 1 171 ki31 0K 16K CPU4 4 252:21 88.18% idle: cpu4
13 0 1 171 ki31 0K 16K CPU5 5 252:30 83.06% idle: cpu5
45 0 1 -16 - 0K 16K psleep 1 5:18 81.88% pagedaemon
15 0 1 171 ki31 0K 16K CPU3 3 250:19 75.88% idle: cpu3
18 0 1 171 ki31 0K 16K CPU0 0 242:32 73.00% idle: cpu0
16 0 1 171 ki31 0K 16K RUN 2 245:26 70.90% idle: cpu2
17 0 1 171 ki31 0K 16K CPU1 1 243:08 60.89% idle: cpu1
12033 80 406 45 0 502G 24339M ucond 4 0:08 13.96% varnishd
Hitrate ratio: 10 100 1000
Hitrate avg: 0.6868 0.6891 0.6749
15694 2.00 1.51 Client connections accepted
5611010 651.39 540.72 Client requests received
3211214 451.89 309.45 Cache hits
1 0.00 0.00 Cache hits for pass
2399137 198.51 231.20 Cache misses
2399143 198.51 231.20 Backend connections success
2377367 198.51 229.10 Backend connections reuses
2377408 197.51 229.10 Backend connections recycles
0 . . N struct srcaddr
0 . . N active struct srcaddr
316 . . N struct sess_mem
48 . . N struct sess
2349612 . . N struct object
2348460 . . N struct objecthead
4699173 . . N struct smf
0 . . N small free smf
0 . . N large free smf
40 . . N struct vbe_conn
115 . . N struct bereq
400 . . N worker threads
400 0.00 0.04 N worker threads created
0 0.00 0.00 N worker threads limited
0 0.00 0.00 N queued work requests
0 0.00 0.00 N overflowed work requests
2 . . N backends
49564 . . N expired objects
1516192 . . N LRU moved objects
5486698 643.41 528.74 Objects sent with write
15693 1.00 1.51 Total Sessions
5611037 648.40 540.72 Total Requests
8 0.00 0.00 Total pipe
20 0.00 0.00 Total pass
2399127 198.51 231.20 Total fetch
2007812574 234091.74 193486.80 Total header bytes
23594716103 2781420.12 2273751.19 Total body bytes
2576 0.00 0.25 Session Closed
5608555 647.40 540.48 Session herd
304052678 32276.41 29300.63 SHM records
8074243 868.86 778.09 SHM writes
6122 2.99 0.59 SHM flushes due to overflow
11697 5.99 1.13 SHM MTX contention
131 0.00 0.01 SHM cycles through buffer
4798894 396.02 462.45 allocator requests
4699172 . . outstanding allocations
20 0.00 0.00 Total pass
2399127 198.51 231.20 Total fetch
2007812574 234091.74 193486.80 Total header bytes
23594716103 2781420.12 2273751.19 Total body bytes
2576 0.00 0.25 Session Closed
5608555 647.40 540.48 Session herd
304052678 32276.41 29300.63 SHM records
8074243 868.86 778.09 SHM writes
6122 2.99 0.59 SHM flushes due to overflow
11697 5.99 1.13 SHM MTX contention
131 0.00 0.01 SHM cycles through buffer
4798894 396.02 462.45 allocator requests
4699172 . . outstanding allocations
40178368512 . . bytes allocated
496692543488 . . bytes free
575 0.00 0.06 SMS allocator requests
0 . . SMS outstanding allocations
249035 . . SMS bytes allocated
249035 . . SMS bytes freed
2399128 198.51 231.20 Backend requests made
1 0.00 0.00 N vcl total
1 0.00 0.00 N vcl available
1 . . N total active purges
1 0.00 0.00 N new purges added
--
Terantula - Industrial Strength Open Source
phone:+31 64 3232 400 / www: http://www.terantula.com / pgpkey: E7EE7A46
pgp fingerprint: F2EE 122D 964C DE68 7380 6F95 3710 7719 E7EE 7A46
apt-get install freebsd
More information about the varnish-misc
mailing list