varnish crashes

Sat Jan 23 12:08:32 CET 2010

On 23-1-2010 11:27, Poul-Henning Kamp wrote:
> In message <4B5ACD81.3000903 at netmatch.nl>, Angelo Höngens writes:
> 
>> We have 4 balancers, each running FreeBSD 7.2 with 'device carp'
>> compiled in. I haven't dared upgrade to 8.0 yet, because I had problems
>> on my test machine earlier with IPv6 and carp interfaces on 8.0.
> 
> It sounds mostly like a resource issue, but I can't say exactly from
> what you have provided.

I get that feeling as well, but I can't seem to find anything wrong.
Thanks for your reply; I hope you can give me some more pointers.

By the way: the balancers handle a total of 2000 req/sec now, but when
stress testing I can easily get 9000 cache hits per second. So I don't
think it's running into the upper limits of its performance.

Even worse, I just had to reboot one of the balancers because it (almost)
completely locked up: it still answered ping, but ssh died, and the local
console on the machine did not respond either (it showed no messages).
These machines ran a heavy Squid load for over a year and never hung. Grrr..

> You can consider increasing the "cli_timeout" parameter a bit
> and see if it is simply a matter of a busy machine.

ok, will try.

Until now I had this in my /etc/rc.conf:

varnishd_enable="YES"
varnishd_listen=":80"
varnishd_storage="file,/cache,80%"
varnishd_config="/usr/local/etc/varnish/default.vcl"

I just changed this (after reading the tuning page some more) to:

varnishd_enable="YES"
varnishd_flags="-P /var/run/varnishd.pid -a :80 -T localhost:81 \
-f /usr/local/etc/varnish/default.vcl -s file,/cache,80% -u www -g www \
-p cli_timeout=30 -p lru_interval=20"

Let's see what happens..
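For anyone reading along, here is a small sketch that splits the new varnishd_flags value into option/argument pairs and pulls out the -p run-time parameters. The one-line glosses are my own short descriptions of the varnishd options used here, and the parser assumes every option in this particular line takes exactly one argument (true here, but not for varnishd options in general).

```python
import shlex

# The varnishd_flags value from the new rc.conf, written as one string.
VARNISHD_FLAGS = (
    "-P /var/run/varnishd.pid -a :80 -T localhost:81 "
    "-f /usr/local/etc/varnish/default.vcl -s file,/cache,80% "
    "-u www -g www -p cli_timeout=30 -p lru_interval=20"
)

# Short glosses for the varnishd options that appear above.
OPTION_MEANING = {
    "-P": "PID file",
    "-a": "client listen address",
    "-T": "management (CLI) interface",
    "-f": "VCL file",
    "-s": "storage backend",
    "-u": "user to run as",
    "-g": "group to run as",
    "-p": "run-time parameter",
}


def parse_flags(flags):
    """Return (option, argument) pairs.

    Assumption: every option in this line takes exactly one argument.
    """
    argv = shlex.split(flags)
    return list(zip(argv[0::2], argv[1::2]))


def runtime_params(pairs):
    """Collect the '-p name=value' pairs into a dict."""
    return dict(arg.split("=", 1) for opt, arg in pairs if opt == "-p")


if __name__ == "__main__":
    pairs = parse_flags(VARNISHD_FLAGS)
    for opt, arg in pairs:
        print(f"{opt} {arg:40} {OPTION_MEANING.get(opt, '?')}")
    print(runtime_params(pairs))
```

The two -p entries are the only settings that differ from the defaults in this config: the raised cli_timeout Poul-Henning suggested, plus lru_interval.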

> 
> Are you running on 32 bit or 64 bit machines ?

64-bit..

> Use FreeBSD's gstat to see what your disk-activity is like,
> pay particular attention to the service times (ms/r & ms/w cols)

wow, that's a nice tool, I only knew iostat ;)

The disk system is a gmirror of 2 SATA disks. On average it does
33 IOPS, with service times of 8.3 ms/r and 4.2 ms/w. Not really
shocking.
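As a rough sanity check on "not really shocking", the utilization law (busy fraction = operations per second x time per operation) brackets how busy the mirror is from these figures. This is only a back-of-the-envelope sketch: it assumes the 33 IOPS are all reads for the upper bound and all writes for the lower bound, and it treats gstat's ms/r and ms/w as per-operation service time (they include any queueing, so the result leans high). gstat's own %busy column gives a direct reading.

```python
# Figures reported from gstat in this message.
IOPS = 33.0          # average operations per second on the mirror
MS_PER_READ = 8.3    # gstat ms/r column
MS_PER_WRITE = 4.2   # gstat ms/w column


def busy_fraction(ops_per_sec, ms_per_op):
    """Utilization law: U = X * S (throughput times time per operation)."""
    return ops_per_sec * ms_per_op / 1000.0


upper = busy_fraction(IOPS, MS_PER_READ)   # as if every operation were a read
lower = busy_fraction(IOPS, MS_PER_WRITE)  # as if every operation were a write

if __name__ == "__main__":
    print(f"estimated busy time: {lower:.1%} .. {upper:.1%}")
```

That puts the disks somewhere around 14% to 27% busy, which supports the view that disk I/O is not the bottleneck.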

> 
> Also check your varnishlog and varnishstat for signs of trouble...

Did that; everything looks peachy. varnishlog produces too much output
(and I would not know what to filter on), and varnishstat looks OK as well.
Are there specific counters that could indicate trouble?

0+01:16:24
nmt-nlb-04.netmatchcolo1.local
Hitrate ratio:       10      100      199
Hitrate avg:     0.7894   0.8059   0.8054

      362508       133.35        79.08 Client connections accepted
     1807294       366.95       394.26 Client requests received
     1261936       260.68       275.29 Cache hits
      130026        30.08        28.37 Cache hits for pass
      323451        64.17        70.56 Cache misses
      545215       106.28       118.94 Backend conn. success
         995         0.00         0.22 Fetch head
      543052        98.26       118.47 Fetch with Length
         209         0.00         0.05 Fetch chunked
         453         0.00         0.10 Fetch wanted close
        1215          .            .   N struct sess_mem
         510          .            .   N struct sess
      121156          .            .   N struct object
      120051          .            .   N struct objecthead
      242414          .            .   N struct smf
         809          .            .   N small free smf
           1          .            .   N large free smf
         122          .            .   N struct vbe_conn
         438          .            .   N struct bereq
         467          .            .   N worker threads
         699         0.00         0.15 N worker threads created
           0         0.00         0.00 N queued work requests
        8000         0.00         1.75 N overflowed work requests
         317          .            .   N backends
      202676          .            .   N expired objects
      698992          .            .   N LRU moved objects
     1356134       264.69       295.84 Objects sent with write
       47428        38.10        10.35 Total Sessions
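To make a snapshot like this easier to scan, here is a minimal sketch that parses a few of the lines above (first column = current value, description at the end), computes a cumulative hit ratio as hits / (hits + misses), and lists a small watchlist of counters that are nonzero. The watchlist (queued and overflowed work requests, i.e. requests that had to wait for a worker thread) is my own assumption about what is worth a look, not an official list from the Varnish documentation.

```python
import re

# A few lines copied from the varnishstat snapshot above
# (columns: current value, per-second rate, average rate, description).
SNAPSHOT = """\
      362508       133.35        79.08 Client connections accepted
     1807294       366.95       394.26 Client requests received
     1261936       260.68       275.29 Cache hits
      130026        30.08        28.37 Cache hits for pass
      323451        64.17        70.56 Cache misses
         467          .            .   N worker threads
           0         0.00         0.00 N queued work requests
        8000         0.00         1.75 N overflowed work requests
"""

LINE_RE = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+(.+?)\s*$")

# Hypothetical watchlist: counters that, when nonzero, suggest requests
# had to wait for a worker thread. This selection is an assumption.
WATCH = ("N queued work requests", "N overflowed work requests")


def parse_counters(text):
    """Map each counter description to its current (first-column) value."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def hit_ratio(counters):
    """Hits / (hits + misses); 'hits for pass' are deliberately left out."""
    hits = counters["Cache hits"]
    misses = counters["Cache misses"]
    return hits / (hits + misses)


def nonzero_watch(counters):
    """Return the watchlist counters that are present and nonzero."""
    return {name: counters[name] for name in WATCH if counters.get(name, 0) > 0}


if __name__ == "__main__":
    c = parse_counters(SNAPSHOT)
    print(f"cumulative hit ratio: {hit_ratio(c):.4f}")
    print("nonzero watch counters:", nonzero_watch(c))
```

On these numbers the cumulative hit ratio comes out at about 0.796, roughly in line with the 0.79 to 0.81 "Hitrate avg" figures at the top, and the only flagged counter is "N overflowed work requests" at 8000.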

-- 

With kind regards,

Angelo Höngens
systems administrator

MCSE on Windows 2003
MCSE on Windows 2000
MS Small Business Specialist
------------------------------------------
NetMatch
tourism internet software solutions

Ringbaan Oost 2b
5013 CA Tilburg
+31 (0)13 5811088
+31 (0)13 5821239

A.Hongens at netmatch.nl
www.netmatch.nl
------------------------------------------