Varnish crashing excessively
Erik Jansson
erik.jansson at bilddagboken.se
Wed Sep 10 10:47:35 CEST 2008
We've been having problems with squid for a while, so we started trying
varnish in our development environment.
Everything has been working out well. But after lots of problems with
squid last week we decided to give the 2.0 beta a test drive in the
production environment. At first everything seemed to be working out great.
Our cache machine is a Dell PowerEdge 2950 (X5355 @ 2.66GHz) with about
16GB of RAM and five 76GB SAS drives. We are running Ubuntu. The server
serves images from the backend storage servers. The image sizes vary
from about 10k to 200k. At peak hours we have about 4000 requests
per second.
I compiled varnish with no options and used this config file:
backend images01 {
    .host = "x.x.x.x";
    .port = "80";
}
backend images02 {
    .host = "x.x.x.x";
    .port = "80";
}
backend lighty {
    .host = "x.x.x.x";
    .port = "80";
}
sub vcl_recv {
    set req.grace = 30s;
    if (req.request != "GET") {
        error 507 "Method not allowed";
    }
    if (req.url ~ "^/((imgs)|(stat)|(b))/") {
        set req.backend = lighty;
        lookup;
    } else if (req.url ~ "^/(([0-9]{1,2}/)|(avs))(.*)\.jpg$") {
        if (req.url ~ "^/(([0-9]{1})|([1]{1}[0-9]{1})|([2]{1}[0-8]{1})|(avs))/") {
            set req.backend = images01;
        } else if (req.url ~ "^/(([2]{1}[9]{1})|([3]{1}[0-9]{1})|([4]{1}[0-2]{1}))/") {
            set req.backend = images02;
        } else {
            error 508 "Storage not found";
        }
    } else {
        error 404 "Not Found";
    }
    if (req.http.host ~ "^xy.com$") {
        set req.http.host = "x.se";
        lookup;
    } else {
        set req.http.host = "x.se";
        if (req.http.Cookie ~ "viewer=ok") {
            lookup;
        } else {
            error 506 "Please visit x.se to view this image";
        }
    }
}
sub vcl_fetch {
    set req.grace = 30s;
    if (!obj.cacheable) {
        pass;
    }
    if (obj.http.Set-Cookie) {
        pass;
    }
    set obj.prefetch = -30s;
    deliver;
}
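To make the intended routing in vcl_recv easier to check, here is a
minimal Python sketch that mirrors the URL regexes from the config
above. The function name pick_backend and the sample URLs are made up
for illustration, and Python's re module is only assumed to behave like
varnish's regex engine for these particular patterns.

```python
import re

# Patterns copied from vcl_recv; everything else here is illustrative.
LIGHTY = re.compile(r"^/((imgs)|(stat)|(b))/")
IMAGE = re.compile(r"^/(([0-9]{1,2}/)|(avs))(.*)\.jpg$")
IMAGES01 = re.compile(r"^/(([0-9]{1})|([1]{1}[0-9]{1})|([2]{1}[0-8]{1})|(avs))/")
IMAGES02 = re.compile(r"^/(([2]{1}[9]{1})|([3]{1}[0-9]{1})|([4]{1}[0-2]{1}))/")


def pick_backend(url):
    """Return the backend name or the error that the URL rules would select."""
    if LIGHTY.search(url):
        return "lighty"
    if IMAGE.search(url):
        if IMAGES01.search(url):
            return "images01"
        if IMAGES02.search(url):
            return "images02"
        return "error 508"
    return "error 404"


if __name__ == "__main__":
    # Hypothetical URLs, chosen to hit the boundaries between the ranges.
    for url in ("/imgs/logo.png", "/28/a.jpg", "/29/a.jpg", "/43/a.jpg", "/avs/u.jpg"):
        print(url, "->", pick_backend(url))
```

Under these assumptions, directories 0 to 28 and avs go to images01,
29 to 42 go to images02, and any other one- or two-digit directory ends
in the 508 error.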
I started varnish with the following options:
ulimit -n 500000
/usr/local/sbin/varnishd -a x.x.x.x:80 \
-f /usr/local/etc/varnish/raptor.vcl \
-T 127.0.0.1:2000 \
-s file,/mnt/cache1/varnish_storage1.bin,80% \
-s file,/mnt/cache2/varnish_storage2.bin,80% \
-s file,/mnt/cache3/varnish_storage3.bin,80% \
-s file,/mnt/cache4/varnish_storage4.bin,80% \
-s file,/mnt/cache5/varnish_storage5.bin,80% \
-p thread_pool_max=4000 \
-p listen_depth=4096 \
-p lru_interval=3600 \
-h classic,800011 \
-t 600
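For a rough sense of scale, the five -s file arguments can be added up.
The sketch below assumes that each 80% is taken of a whole, otherwise
empty 76GB disk. That is an assumption, since the actual filesystem
sizes are not given here, so the result is only an estimate.

```python
# Disk count, disk size and RAM come from the hardware description above.
# Treating each filesystem as a full, empty 76 GB is an assumption.
DISK_GB = 76
DISKS = 5
FRACTION = 0.80
RAM_GB = 16

per_file_gb = DISK_GB * FRACTION      # storage file size per disk
total_gb = per_file_gb * DISKS        # total configured cache storage
ratio = total_gb / RAM_GB             # storage relative to physical RAM

print(f"per file: {per_file_gb:.1f} GB, total: {total_gb:.1f} GB, "
      f"about {ratio:.0f}x RAM")
```

With these numbers the configured storage comes to roughly 300GB, far
more than the 16GB of RAM, so most of the cache would have to be backed
by disk.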
Then we ran into problems. Watching varnishstat, the server seemed to
stall every now and then (roughly every 30 seconds). We soon
established that it had to do with disk writes hogging all resources.
We've tried tweaking some of the /proc/sys/vm variables, but with no
luck so far.
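To record which VM settings are in effect when the stalls happen,
something like the following Python sketch could be used. The list of
tunables is an assumption (the writeback-related ones under
/proc/sys/vm), not necessarily the ones that were changed, and the
function name read_vm_tunables is made up for illustration.

```python
import os

# Assumed set of writeback-related tunables; adjust to whatever was changed.
WRITEBACK_TUNABLES = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_tunables(base="/proc/sys/vm", names=WRITEBACK_TUNABLES):
    """Return {name: int value}, with None for tunables that cannot be read."""
    values = {}
    for name in names:
        path = os.path.join(base, name)
        try:
            with open(path) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

On many kernels dirty_expire_centisecs defaults to 3000, i.e. 30
seconds, which may or may not be related to the interval seen in
varnishstat.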
Problem #2 is that varnish segfaults every now and then, sometimes many
times in a short period, but sometimes it runs for a couple of days
without problems. The only lead I have on this is the following line
from dmesg:
[105474.200474] varnishd[24170]: segfault at 00000000000004a0 rip
000000000041ced0 rsp 00002aef65046af8 error 4
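The fields in that dmesg line can be decoded a little further. The
Python sketch below is only an illustration: the regex and the function
name decode_segfault are made up here, and the bit meanings are the
standard x86 page-fault error code bits (bit 0: protection violation
rather than page not present, bit 1: write, bit 2: user mode).

```python
import re

# Illustrative pattern for the x86-64 "segfault at ... rip ... rsp ... error N" format.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields and decode the error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "instruction_pointer": int(m.group("rip"), 16),
        "protection_violation": bool(err & 1),  # False: page was not present
        "write_access": bool(err & 2),          # False: the access was a read
        "user_mode": bool(err & 4),             # True: fault came from user space
    }


# The dmesg line from this report, joined back onto one line.
line = ("[105474.200474] varnishd[24170]: segfault at 00000000000004a0 rip "
        "000000000041ced0 rsp 00002aef65046af8 error 4")
info = decode_segfault(line)

if __name__ == "__main__":
    print(info)
```

Error 4 decodes to a user-mode read of an unmapped page, and a fault
address as low as 0x4a0 is what one would typically see when a struct
field is read through a NULL pointer. That is only a hint, not a
diagnosis. If the varnishd binary has debug symbols, running
addr2line -e /usr/local/sbin/varnishd 0x41ced0 might map the rip value
to a source line.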
We switched over to -s malloc,300G, formatted the disks as swap, and
added them all with the same priority.
It ran pretty well for a while, but then the segfaults began again, and
when it didn't segfault it used a lot of system CPU time (sys %).
The VM hangs were gone, though.
I tried running with -d -d but didn't get any information about the
sys % usage. The symptom was a fairly unresponsive cache server under
high load.
I am aware that this is probably not even close to all the information
you need, so I need your help to collect more data about my problems. I
would really like to replace squid as soon as possible.