[Varnish] #845: Health checks get duplicated when loading a new config
Varnish
varnish-bugs at varnish-cache.org
Mon Jan 17 11:38:04 CET 2011
#845: Health checks get duplicated when loading a new config
----------------------+-----------------------------------------------------
Reporter: johnnyh | Owner: martin
Type: defect | Status: new
Priority: normal | Milestone:
Component: varnishd | Version: trunk
Severity: normal | Keywords: health check duplicate
----------------------+-----------------------------------------------------
Changes (by martin):
* owner: => martin
Old description:
> Summary: When you reload a config, and the do vcl.discard on the old
> config, the health checks sometimes get broken. It is related to report
> ticket 834 ( http://www.varnish-cache.org/trac/ticket/834 ) , but the
> final word on 834 was that doing a vcl.discard fixes the problem.
> Apparently it does not do so all the time. A workaround for this issue is
> to restart varnish, but that is a really nasty solution because the cache
> gets flushed and it is also not as 'safe' as just doing a 'reload' of
> Varnish.
>
> Now first some system details:
>
> Varnish v2.1.3
> Intel(R) Xeon(R) CPUX5670 @ 2.93GHz (In a VMware virtual machine)
> 64-bit
> 4G RAM
> Linux kernel 2.6.18-194.26.1.el5
> RHEL 5.5 completely up-to-date
> Custom VCL (described below)
>
> Here is how to try and reproduce it:
>
> - Start with a working config, with a single backend, with a health check
> that returns 'healthy' all the time.
> - Now change the IP address of the backend to something that is certainly
> not a healthy backend, like 1.2.3.4.
> - Load the new config and start using it:
>
> # DATE=`date +%s` ; varnishadm -T 127.0.0.1:6082 vcl.load reload${DATE}
> /etc/varnish/test-config.vcl; varnishadm -T 127.0.0.1:6082 vcl.use
> reload${DATE}
>
> - We now have a situation where Varnishlog shows our backend as healthy
> and sick at the same time:
>
> # varnishlog
> 0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.012475
> 0.012311 HTTP/1.1 200 OK
> 0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.008161
> 0.011274 HTTP/1.1 200 OK
> 0 Backend_health - test_site Still sick ------- 0 4 5 0.000000
> 0.000000
> 0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.011735
> 0.011389 HTTP/1.1 200 OK
>
> - Let's check what configs varnish thinks it knows about:
>
> # varnishadm -T 127.0.0.1:6082 vcl.list
> available 105 boot
> active 1 reload1294928381
>
> - According to ticket 834 we must now discard the old configurations,
> which is only one in this case:
>
> # varnishadm -T 127.0.0.1:6081 vcl.discard boot
>
> - The problem now exists here: Sometimes, the discarded configuration
> does not 'disappear' from the list of available configurations, but it
> remains there in the state 'discarded'
>
> # varnishadm -T 127.0.0.1:6082 vcl.list
> discarded 105 boot
> active 1 reload1294928381
>
> - The real problem lies here: The backend checks are now kaput.
> Varnishlogs shows the backend as healthy and sick at the same time.
>
> # varnishlog
> 0 Backend_health - test_site Still sick ------- 0 4 5 0.000000
> 0.000000
> 0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.007989
> 0.010539 HTTP/1.1 200 OK
> 0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.014861
> 0.011620 HTTP/1.1 200 OK
> 0 Backend_health - test_site Still sick ------- 0 4 5 0.000000
> 0.000000
> 0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.011745
> 0.011651 HTTP/1.1 200 OK
>
> - There appears to be no way to fix this situation other than restarting
> Varnishd.
>
> - I have been able to reproduce this problem a few times, but not
> consistently. It seems this problem shows up when you use vcl.load-vcl
> .use-vcl.discard in rapid succession. If you work really slowly while
> doing the reload/discard cycle, you will probably not find this bug. The
> way I reload and discard my configs, is by having the following script in
> my init.d script, so that all I have to do is call "/etc/init.d/varnish
> reload ; /etc/init.d/varnish discard". Here is the code I use in the init
> script:
>
> vcl_reload() {
> echo "Reloading Varnish VCL..."
> DATE=`date +%s`
> varnishadm -T $HOSTPORT vcl.load reload${DATE} $VARNISH_VCL_CONF ||
> vcl_exit 1 "Error compiling config $VARNISH_VCL_CONF"
> varnishadm -T $HOSTPORT vcl.use reload${DATE} || vcl_exit 1 "Error
> loading config $VARNISH_VCL_CONF"
> vcl_exit 0 "VCL reloaded succesfuly."
> }
>
> vcl_discard_all() {
> echo "Discarding old configurations..."
> COUNT=`varnishadm -T $HOSTPORT vcl.list | grep -v ^$ | grep active
> -B1 | wc -l`
> if [ $COUNT -le 1 ] ; then vcl_exit 1 "Error: There are no old
> configurations to discard." ; fi
> varnishadm -T $HOSTPORT vcl.list | grep -v ^$ | while read CONFIG ;
> do
> if [ `echo "$CONFIG" | awk '{print $1}'` == "available" ] ; then
> varnishadm -T $HOSTPORT vcl.discard `echo "$CONFIG" | awk
> '{print $3}'`
> else
> break
> fi
> done
> vcl_exit 0 "Old configurations were succesfully discarded."
> }
New description:
Summary: When you reload a config, and the do vcl.discard on the old
config, the health checks sometimes get broken. It is related to report
ticket 834 ( http://www.varnish-cache.org/trac/ticket/834 ) , but the
final word on 834 was that doing a vcl.discard fixes the problem.
Apparently it does not do so all the time. A workaround for this issue is
to restart varnish, but that is a really nasty solution because the cache
gets flushed and it is also not as 'safe' as just doing a 'reload' of
Varnish.
Now first some system details:
Varnish v2.1.3
Intel(R) Xeon(R) CPUX5670 @ 2.93GHz (In a VMware virtual machine)
64-bit
4G RAM
Linux kernel 2.6.18-194.26.1.el5
RHEL 5.5 completely up-to-date
Custom VCL (described below)
Here is how to try and reproduce it:
- Start with a working config, with a single backend, with a health check
that returns 'healthy' all the time.
- Now change the IP address of the backend to something that is certainly
not a healthy backend, like 1.2.3.4.
- Load the new config and start using it:
# DATE=`date +%s` ; varnishadm -T 127.0.0.1:6082 vcl.load reload${DATE}
/etc/varnish/test-config.vcl; varnishadm -T 127.0.0.1:6082 vcl.use
reload${DATE}
- We now have a situation where Varnishlog shows our backend as healthy
and sick at the same time:
# varnishlog
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.012475
0.012311 HTTP/1.1 200 OK
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.008161
0.011274 HTTP/1.1 200 OK
0 Backend_health - test_site Still sick ------- 0 4 5 0.000000
0.000000
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.011735
0.011389 HTTP/1.1 200 OK
- Let's check what configs varnish thinks it knows about:
# varnishadm -T 127.0.0.1:6082 vcl.list
available 105 boot
active 1 reload1294928381
- According to ticket 834 we must now discard the old configurations,
which is only one in this case:
# varnishadm -T 127.0.0.1:6081 vcl.discard boot
- The problem now exists here: Sometimes, the discarded configuration does
not 'disappear' from the list of available configurations, but it remains
there in the state 'discarded'
# varnishadm -T 127.0.0.1:6082 vcl.list
discarded 105 boot
active 1 reload1294928381
- The real problem lies here: The backend checks are now kaput.
Varnishlogs shows the backend as healthy and sick at the same time.
# varnishlog
0 Backend_health - test_site Still sick ------- 0 4 5 0.000000
0.000000
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.007989
0.010539 HTTP/1.1 200 OK
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.014861
0.011620 HTTP/1.1 200 OK
0 Backend_health - test_site Still sick ------- 0 4 5 0.000000
0.000000
0 Backend_health - test_site Still healthy 4--X-RH 5 4 5 0.011745
0.011651 HTTP/1.1 200 OK
- There appears to be no way to fix this situation other than restarting
Varnishd.
- I have been able to reproduce this problem a few times, but not
consistently. It seems this problem shows up when you use vcl.load-vcl
.use-vcl.discard in rapid succession. If you work really slowly while
doing the reload/discard cycle, you will probably not find this bug. The
way I reload and discard my configs, is by having the following script in
my init.d script, so that all I have to do is call "/etc/init.d/varnish
reload ; /etc/init.d/varnish discard". Here is the code I use in the init
script:
vcl_reload() {
echo "Reloading Varnish VCL..."
DATE=`date +%s`
varnishadm -T $HOSTPORT vcl.load reload${DATE} $VARNISH_VCL_CONF ||
vcl_exit 1 "Error compiling config $VARNISH_VCL_CONF"
varnishadm -T $HOSTPORT vcl.use reload${DATE} || vcl_exit 1 "Error
loading config $VARNISH_VCL_CONF"
vcl_exit 0 "VCL reloaded succesfuly."
}
vcl_discard_all() {
echo "Discarding old configurations..."
COUNT=`varnishadm -T $HOSTPORT vcl.list | grep -v ^$ | grep active -B1
| wc -l`
if [ $COUNT -le 1 ] ; then vcl_exit 1 "Error: There are no old
configurations to discard." ; fi
varnishadm -T $HOSTPORT vcl.list | grep -v ^$ | while read CONFIG ; do
if [ `echo "$CONFIG" | awk '{print $1}'` == "available" ] ; then
varnishadm -T $HOSTPORT vcl.discard `echo "$CONFIG" | awk
'{print $3}'`
else
break
fi
done
vcl_exit 0 "Old configurations were succesfully discarded."
}
--
--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/845#comment:2>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator
More information about the varnish-bugs
mailing list