varnish crashing

Angelo Höngens a.hongens at netmatch.nl
Sat Jan 23 11:12:32 CET 2010


Hey,

I am having some problems with Varnish. Unfortunately (depending on how
you look at it), I had to replace our Squid cluster with Varnish in a
single day, and now we are finding that we have some issues with it:
sometimes Varnish just stops working.

We have 4 balancers, each running FreeBSD 7.2 with 'device carp'
compiled in. I haven't dared upgrade to 8.0 yet, because I had problems
on my test machine earlier with IPv6 and carp interfaces on 8.0.

[angelo@nmt-nlb-06 ~]$ uname -a
FreeBSD nmt-nlb-06.netmatchcolo1.local 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Mon Jun 15 19:25:03 CEST 2009 root@nmt-nlb-06.netmatchcolo1.local:/usr/obj/usr/src/sys/NMT-NLB-06  amd64

Here's an example of varnishd crashing, taken from /var/log/messages:

Jan 23 09:49:39 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
Jan 23 10:49:43 nmt-nlb-06 kernel: pid 47479 (varnishd), uid 80: exited on signal 3
Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: child (54810) Started
Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Pushing vcls failed: CLI communication error
Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Closed fds: 4 5 6 7 11 12 14 15
Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Child starts
Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said managed to mmap 2319266349056 bytes of 2319266349056
Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said Ready
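
For anyone who wants to summarize a longer log in the same way, here is a
minimal Python sketch. It assumes each syslog entry sits on a single line
(unwrapped), and the regular expressions are my own guess at the message
layout based only on the lines above, not an official Varnish log format.
It counts the "not responding to ping" kills per child pid and converts
the mmap size to TiB.

```python
import re
from collections import Counter

# Three lines copied from the /var/log/messages excerpt (unwrapped).
LOG = """\
Jan 23 09:49:39 nmt-nlb-06 varnishd[47478]: Child (47479) not responding to ping, killing it.
Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: child (54810) Started
Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said managed to mmap 2319266349056 bytes of 2319266349056
"""

# Assumed layout: "varnishd[<manager pid>]: Child (<child pid>) <message>"
LINE_RE = re.compile(r"varnishd\[(\d+)\]: [Cc]hild \((\d+)\) (.*)$")
MMAP_RE = re.compile(r"managed to mmap (\d+) bytes of (\d+)")


def summarize(log_text):
    """Return (kills per child pid, list of mmap sizes in bytes)."""
    kills = Counter()
    mmaps = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not a varnishd manager line about a child
        child_pid = int(match.group(2))
        message = match.group(3)
        if message.startswith("not responding to ping"):
            kills[child_pid] += 1
        mmap_match = MMAP_RE.search(message)
        if mmap_match:
            mmaps.append(int(mmap_match.group(1)))
    return kills, mmaps


kills, mmaps = summarize(LOG)
print(dict(kills))
print([round(size / 2**40, 2) for size in mmaps])  # sizes in TiB
```

On the sample above, the mmap size works out to roughly 2.11 TiB, far more
than the physical memory in the box, so the storage file is presumably
backed by disk.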

Does anyone know what could cause this?

Some more info below. Thanks in advance for your thoughts..


[angelo@nmt-nlb-06 ~]$ cat /etc/sysctl.conf
net.carp.log=2
net.inet.icmp.icmplim=0
kern.ipc.nmbclusters=65536
kern.ipc.somaxconn=16384
kern.maxfiles=131072
kern.maxfilesperproc=104856
kern.threads.max_threads_per_proc=4096
net.inet.tcp.rfc1323=1
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.inflight.enable=0
net.inet.tcp.hostcache.expire=1
net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
net.inet.ip.portrange.hifirst=49152
net.inet.ip.portrange.hilast=65535
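
As a quick sanity check on these values, a small Python sketch like the
one below can parse the file and compare related limits. The function
name parse_sysctl_conf is made up for this example, and the check only
tests internal consistency (per-process file limit not above the global
one); it does not say whether the values are right for Varnish.

```python
def parse_sysctl_conf(text):
    """Parse 'name=value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


# Three of the settings from the sysctl.conf listing.
SAMPLE = """\
kern.maxfiles=131072
kern.maxfilesperproc=104856
kern.threads.max_threads_per_proc=4096
"""

conf = parse_sysctl_conf(SAMPLE)
per_proc = int(conf["kern.maxfilesperproc"])
total = int(conf["kern.maxfiles"])
print("per-process file limit within global limit:", per_proc <= total)
```

The same dictionary can be used to compare kern.threads.max_threads_per_proc
against the thread count that top reports for varnishd.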

Here's the output from top:

last pid: 56537;  load averages:  0.06,  0.12,  0.13                 up 24+21:37:27  11:09:41
270 processes: 2 running, 268 sleeping
CPU:  2.7% user,  0.0% nice,  1.2% system,  1.5% interrupt, 94.6% idle
Mem: 765M Active, 22M Inact, 293M Wired, 280K Cache, 399M Buf, 6830M Free
Swap: 4096M Total, 78M Used, 4018M Free, 1% Inuse

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
56440 haproxy     1   4    0 27136K 20584K kqread 3   0:23  3.76% haproxy
31203 root        1  44    0 29616K  2328K select 3  11:15  0.00% snmpd
 1912 root        1  44    0 10484K   808K select 0   1:36  0.00% ntpd
 2036 root        1  44    0 83468K 13884K select 0   0:47  0.00% httpd
37934 root        1   4    0 64196K  5392K kqread 0   0:13  0.00% squid
 1815 root        1  44    0  5692K   660K select 0   0:13  0.00% syslogd
 2056 root        1   8    0  6748K   404K nanslp 0   0:05  0.00% cron
 2023 root        1   4    0  5808K   460K kqread 3   0:05  0.00% master
 2031 postfix     1   4    0  5808K   436K kqread 0   0:03  0.00% qmgr
56479 www       218  44    0/('.-..-,(K   726M ucond  3   0:00  0.00% varnishd
22829 www         1  20    0 82408K   428K lockf  2   0:00  0.00% httpd
38741 www         1  20    0 83468K   432K lockf  2   0:00  0.00% httpd

My config file is too long to post (760kB), because of 350 backend
declarations and 7600 host names pointing to those backends. But I can
make an extract if someone thinks it would help them understand my issue.

-- 


With kind regards,


Angelo Höngens
systems administrator

MCSE on Windows 2003
MCSE on Windows 2000
MS Small Business Specialist
------------------------------------------
NetMatch
tourism internet software solutions

Ringbaan Oost 2b
5013 CA Tilburg
+31 (0)13 5811088
+31 (0)13 5821239

A.Hongens at netmatch.nl
www.netmatch.nl
------------------------------------------




