Varnish stuck on most served content

Traian Bratucu traian.bratucu at eea.europa.eu
Wed Mar 30 10:59:08 CEST 2011


Not sure what you mean by "freeze", but the way to debug this is to trace the request with "varnishlog".
You need to see exactly what happens when Varnish receives the GET request: whether it is served from cache or whether Varnish tries to fetch it from the backends.

Try "varnishlog -o | grep -A 50 'your.css'" (or something like that) on one of the Varnish servers.
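
As a rough sketch of the same idea, the filter can be wrapped in two small shell helpers that read from standard input, so they work on live varnishlog output or on a saved capture. The names show_requests and count_calls, and the PATTERN and CONTEXT variables, are made up for illustration. The fixed-string match (grep -F) is a deliberate change from the plain grep above, so that the dot in the file name is not treated as a wildcard. count_calls assumes that the log marks cache lookups with VCL_call lines that name "hit" or "miss"; check a sample of your own log before relying on it.

```shell
#!/bin/sh
# PATTERN and CONTEXT are illustrative placeholders, not values from the thread.
PATTERN="${PATTERN:-your.css}"
CONTEXT="${CONTEXT:-50}"

# Print each line that mentions PATTERN, plus the CONTEXT lines after it.
# Reads stdin, so it can be fed live output or a saved capture.
show_requests() {
    grep -F -A "$CONTEXT" -- "$PATTERN"
}

# Count VCL_call lines that name the given subroutine ("hit" or "miss").
# Assumption: lookups show up in the log as "VCL_call ... hit" / "... miss".
count_calls() {
    # grep -c exits non-zero when nothing matches; keep the printed 0 anyway.
    grep -c "VCL_call.*[[:space:]]$1" || true
}
```

After sourcing the file on one of the Varnish servers, "varnishlog -o | show_requests" shows the requests for the stuck file, and piping that output into "count_calls miss" gives a quick idea of how often Varnish goes to the backends for it.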

Traian

-----Original Message-----
From: varnish-misc-bounces at varnish-cache.org [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Diego Roccia
Sent: Wednesday, March 30, 2011 10:51 AM
To: varnish-misc at varnish-cache.org
Subject: Varnish stuck on most served content

Hi Guys,
This is my first message to this list. I began working for a new company some months ago and found this infrastructure:

+---------+  +---------+  +---------+  +---------+
| VARNISH |  | VARNISH |  | VARNISH |  | VARNISH |
+---------+  +---------+  +---------+  +---------+
      |            |            |            |
      +------------+------------+------------+
                   |            |
            +------+-+       +--+-----+
            | APACHE |       | APACHE |
            +--------+       +--------+

The Varnish servers are HP DL360 G6 machines with 66 GB of RAM and 4 quad-core Xeon CPUs, running Varnish 2.1.6 (updated from 2.0.5 one month ago). They serve up to 450 Mbit/s during peaks.

They often freeze while serving content, and I noticed a common pattern: the content that gets stuck is always among the most served, like a CSS or JS file or some component of the page layout. It never happens to an image that is part of the content.

It's really weird, because CSS should always be cached.

I'm running CentOS 5.5 64-bit, and here are my Varnish startup parameters:

DAEMON_OPTS="-a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \
             -f ${VARNISH_VCL_CONF} \
             -T 0.0.0.0:6082 \
             -t 604800 \
             -u varnish -g varnish \
             -s malloc,54G \
             -p thread_pool_add_delay=2 \
             -p thread_pools=16 \
             -p thread_pool_min=50 \
             -p thread_pool_max=4000 \
             -p listen_depth=4096 \
             -p lru_interval=600 \
             -h classic,500009 \
             -p log_hashstring=off \
             -p shm_workspace=16384 \
             -p ping_interval=2 \
             -p default_grace=3600 \
             -p pipe_timeout=10 \
             -p sess_timeout=6 \
             -p send_timeout=10"

Attached are my VCL and the varnishstat -1 output after a 24-hour run on one of the servers. Do you notice anything wrong?

In the meantime I'm working through the documentation, but this is a high-priority issue for us: it affects the production environment, and there is no time to wait until I fully understand how Varnish works and find a solution myself.

Hope someone can help me
Thanks in advance
Diego