[Varnish] #845: Health checks get duplicated when loading a new config
Varnish
varnish-bugs at varnish-cache.org
Thu Jan 13 16:17:52 CET 2011
#845: Health checks get duplicated when loading a new config
------------------------------------+---------------------------------------
Reporter: johnnyh | Type: defect
Status: new | Priority: normal
Milestone: | Component: varnishd
Version: trunk | Severity: normal
Keywords: health check duplicate |
------------------------------------+---------------------------------------
Summary: When you reload a config and then do vcl.discard on the old
config, the health checks sometimes get broken. This is related to
ticket 834 ( http://www.varnish-cache.org/trac/ticket/834 ), but the
final word on 834 was that doing a vcl.discard fixes the problem.
Apparently it does not always do so. A workaround for this issue is
to restart Varnish, but that is a really nasty solution because the cache
gets flushed, and it is also not as 'safe' as just doing a 'reload' of
Varnish.
Now first some system details:
Varnish v2.1.3
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz (in a VMware virtual machine)
64-bit
4G RAM
Linux kernel 2.6.18-194.26.1.el5
RHEL 5.5 completely up-to-date
Custom VCL (described below)
Here is how to try and reproduce it:
- Start with a working config, with a single backend, with a health check
that returns 'healthy' all the time.
- Now change the IP address of the backend to something that is certainly
not a healthy backend, like 1.2.3.4.
- Load the new config and start using it:
# DATE=`date +%s` ; \
  varnishadm -T 127.0.0.1:6082 vcl.load reload${DATE} /etc/varnish/test-config.vcl ; \
  varnishadm -T 127.0.0.1:6082 vcl.use reload${DATE}
- We now have a situation where varnishlog shows our backend as healthy
and sick at the same time:
# varnishlog
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.012475 0.012311 HTTP/1.1 200 OK
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.008161 0.011274 HTTP/1.1 200 OK
0 Backend_health - test_site Still sick ------- 0 4 5 0.000000 0.000000
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.011735 0.011389 HTTP/1.1 200 OK
- Let's check what configs varnish thinks it knows about:
# varnishadm -T 127.0.0.1:6082 vcl.list
available 105 boot
active 1 reload1294928381
- According to ticket 834 we must now discard the old configurations,
which is only one in this case:
# varnishadm -T 127.0.0.1:6082 vcl.discard boot
- This is where the problem appears: sometimes the discarded configuration
does not disappear from the list of available configurations, but remains
there in the state 'discarded':
# varnishadm -T 127.0.0.1:6082 vcl.list
discarded 105 boot
active 1 reload1294928381
- The real problem is that the backend health checks are now broken:
varnishlog shows the backend as healthy and sick at the same time.
# varnishlog
0 Backend_health - test_site Still sick ------- 0 4 5 0.000000 0.000000
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.007989 0.010539 HTTP/1.1 200 OK
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.014861 0.011620 HTTP/1.1 200 OK
0 Backend_health - test_site Still sick ------- 0 4 5 0.000000 0.000000
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.011745 0.011651 HTTP/1.1 200 OK
- There appears to be no way to fix this situation other than restarting
varnishd.
- I have been able to reproduce this problem a few times, but not
consistently. It seems to show up when you run vcl.load, vcl.use and
vcl.discard in rapid succession. If you work really slowly while
doing the reload/discard cycle, you will probably not hit this bug. I
reload and discard my configs with the following functions in my init.d
script, so that all I have to do is call "/etc/init.d/varnish
reload ; /etc/init.d/varnish discard". Here is the code I use in the init
script:
vcl_reload() {
    echo "Reloading Varnish VCL..."
    DATE=$(date +%s)
    varnishadm -T $HOSTPORT vcl.load reload${DATE} $VARNISH_VCL_CONF ||
        vcl_exit 1 "Error compiling config $VARNISH_VCL_CONF"
    varnishadm -T $HOSTPORT vcl.use reload${DATE} ||
        vcl_exit 1 "Error loading config $VARNISH_VCL_CONF"
    vcl_exit 0 "VCL reloaded successfully."
}
vcl_discard_all() {
    echo "Discarding old configurations..."
    # Count the active line plus the line before it; a count of 1 means
    # that no configuration is listed before the active one.
    COUNT=$(varnishadm -T $HOSTPORT vcl.list | grep -v '^$' | grep active -B1 | wc -l)
    if [ $COUNT -le 1 ] ; then
        vcl_exit 1 "Error: There are no old configurations to discard."
    fi
    # Discard each "available" configuration, stopping at the first line
    # in any other state (normally the active configuration).
    varnishadm -T $HOSTPORT vcl.list | grep -v '^$' | while read CONFIG ; do
        if [ "$(echo "$CONFIG" | awk '{print $1}')" = "available" ] ; then
            varnishadm -T $HOSTPORT vcl.discard $(echo "$CONFIG" | awk '{print $3}')
        else
            break
        fi
    done
    vcl_exit 0 "Old configurations were successfully discarded."
}
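
The two symptoms described above (a configuration stuck in the 'discarded'
state, and one backend reported as healthy and sick at the same time) could
be checked for mechanically. The following is only a rough sketch: the
function names stuck_vcls and conflicting_backends are made up for
illustration and are not part of Varnish, and the field positions are
assumed from the vcl.list and varnishlog excerpts in this report (one
record per line).

```shell
#!/bin/sh
# Diagnostic sketch; both function names are hypothetical helpers.

# Print the name of every configuration that vcl.list reports as
# "discarded". Expects vcl.list output on stdin, with the assumed
# layout: status, reference count, name.
stuck_vcls() {
    awk '$1 == "discarded" { print $3 }'
}

# Print every backend that has both a "healthy" and a "sick"
# Backend_health record in the varnishlog text read from stdin.
# Assumes the backend name is field 4 and the state word is field 6,
# as in the log excerpts in this report.
conflicting_backends() {
    awk '$2 == "Backend_health" {
             if ($6 == "healthy") healthy[$4] = 1
             if ($6 == "sick")    sick[$4] = 1
         }
         END {
             for (b in healthy) if (b in sick) print b
         }'
}

# Possible usage (needs a running varnishd, so left commented out):
#   varnishadm -T 127.0.0.1:6082 vcl.list | stuck_vcls
#   varnishlog | head -n 200 | conflicting_backends
```

A backend that genuinely changes state during the sampled window would
also be printed by conflicting_backends, so the check only means something
on a short sample taken after the old configuration has been discarded.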
--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/845>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator