varnish crashes

Sun Jan 24 16:23:17 CET 2010

On 23-1-2010 20:57, Michael Fischer wrote:
> On Sat, Jan 23, 2010 at 2:20 AM, Angelo Höngens <a.hongens at netmatch.nl> wrote:
> 
> 
>     (second try, I found out I was subscribed using a wrong email address)
> 
>     Hey,
> 
>     I am having some problems with Varnish. Unfortunately (depending on how
>     you look at it), I had to replace our Squid cluster with Varnish in a
>     day. Now we are finding that we have some issues with it: sometimes
>     Varnish just stops working.
> 
>     We have 4 balancers, each running FreeBSD 7.2 with 'device carp'
>     compiled in. I haven't dared upgrade to 8.0 yet, because I had problems
>     on my test machine earlier with IPv6 and CARP interfaces on 8.0.
> 
>     [angelo at nmt-nlb-06 ~]$ uname -a
>     FreeBSD nmt-nlb-06.netmatchcolo1.local 7.2-RELEASE FreeBSD 7.2-RELEASE
>     #0: Mon Jun 15 19:25:03 CEST 2009
>     root at nmt-nlb-06.netmatchcolo1.local:/usr/obj/usr/src/sys/NMT-NLB-06
>      amd64
> 
>     Here's an example of varnishd crashing, taken from /var/log/messages:
> 
>     Jan 23 09:49:39 nmt-nlb-06 varnishd[47478]: Child (47479) not responding
>     to ping, killing it.
>     Jan 23 10:49:43 nmt-nlb-06 kernel: pid 47479 (varnishd), uid 80: exited
>     on signal 3
>     Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding
>     to ping, killing it.
>     Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: Child (47479) not responding
>     to ping, killing it.
>     Jan 23 09:49:43 nmt-nlb-06 varnishd[47478]: child (54810) Started
>     Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Pushing vcls failed: CLI
>     communication error
>     Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Closed
>     fds: 4 5 6 7 11 12 14 15
>     Jan 23 09:49:48 nmt-nlb-06 varnishd[47478]: Child (54810) said Child
>     starts
>     Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said managed
>     to mmap 2319266349056 bytes of 2319266349056
>     Jan 23 09:51:15 nmt-nlb-06 varnishd[47478]: Child (54810) said Ready
> 
>     Does anyone know what could cause this?
> 
> 
> What is thread_pool_max set to?  Have you tried lowering it?   We have
> found that on systems with very high cache-hit ratios, 16 threads per
> CPU is the sweet spot to avoid context-switch saturation.

[angelo at nmt-nlb-03 ~]$ varnishadm -T localhost:81 param.show | grep thread_pool

thread_pool_add_delay      20 [milliseconds]
thread_pool_add_threshold  2 [requests]
thread_pool_fail_delay     200 [milliseconds]
thread_pool_max            500 [threads]
thread_pool_min            5 [threads]
thread_pool_purge_delay    1000 [milliseconds]
thread_pool_timeout        300 [seconds]
thread_pools               2 [pools]
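
For anyone who wants to compare these values across several balancers, here is a minimal Python sketch that turns the output above into a dictionary. The function name parse_params is made up for illustration, and the sketch assumes every line has the "name value [unit]" shape shown here; other param.show output may look different.

```python
import re

# Sample text copied from the param.show output above.
SAMPLE = """\
thread_pool_add_delay      20 [milliseconds]
thread_pool_add_threshold  2 [requests]
thread_pool_fail_delay     200 [milliseconds]
thread_pool_max            500 [threads]
thread_pool_min            5 [threads]
thread_pool_purge_delay    1000 [milliseconds]
thread_pool_timeout        300 [seconds]
thread_pools               2 [pools]
"""

# Assumed line shape: parameter name, integer value, unit in square brackets.
LINE = re.compile(r"^(\w+)\s+(\d+)\s+\[(\w+)\]$")


def parse_params(text):
    """Return {name: (value, unit)} for every line matching the assumed shape."""
    params = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            params[match.group(1)] = (int(match.group(2)), match.group(3))
    return params


params = parse_params(SAMPLE)
for name, (value, unit) in sorted(params.items()):
    print(f"{name} = {value} {unit}")
```

Lines that do not match the pattern are skipped silently, so anything grep lets through that is not a parameter line is simply ignored.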

thread_pool_max is set to 500 threads, but I just increased it to 4000
(as per http://varnish.projects.linpro.no/wiki/Performance), because 'top'
shows varnishd using around 480 to 490 threads right now.

You suggest lowering it. What would be the effect of that? I would expect
Varnish to run out of threads. Well, we'll see what happens with the
increased limit.

I've also just increased thread_pools from 2 to 4 (4 cores).
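
To put the numbers in this thread side by side, here is a small Python sketch. It takes Michael's figure of 16 threads per CPU at face value and assumes the 4 cores mentioned above; the function name suggested_threads is made up, and whether that rule of thumb fits this workload is exactly the open question.

```python
THREADS_PER_CPU = 16  # rule of thumb quoted by Michael, not a Varnish default
CORES = 4             # from "4 cores" above


def suggested_threads(cores, per_cpu=THREADS_PER_CPU):
    """Thread count implied by the per-CPU rule of thumb."""
    return cores * per_cpu


suggested = suggested_threads(CORES)

# Compare the old and new thread_pool_max values against the rule of thumb.
for label, value in [("old thread_pool_max", 500), ("new thread_pool_max", 4000)]:
    ratio = value / suggested
    print(f"{label}: {value} threads, {ratio:.1f}x the rule-of-thumb {suggested}")
```

Both the old and the new limit sit far above what the rule of thumb would give for 4 cores, which is why the two suggestions point in opposite directions.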

-- 

With kind regards,

Angelo Höngens
systems administrator
