Varnish CPU thrashing: Failing to serve requests
Neil Saunders
n.j.saunders at gmail.com
Thu Jul 19 11:34:19 CEST 2012
Hi -
We run 3 Varnish instances in EC2 behind a load balancer. This setup has
performed solidly since its installation.
This morning I came in to find 2 of the 3 instances thrashing the
CPU and failing to serve requests. Our monitoring shows that there was
a notable (20%) CPU steal in availability zones A & B starting at
6:30, but I note that this has also occurred in the past and has not
caused us any issues previously. We've restarted one of the problem
instances and dropped the other out of the load balancer to perform root
cause analysis.
The dropped host is now not serving any requests but is still maxing
out the CPU. There are 50 varnish threads running, and a ps thread dump
reveals a single thread spinning at >90%.
root 1235 1 1235 0 1 May03 ? 00:00:00 /bin/bash /etc/rc2.d/S20varnishlog-backend start
root 1236 1235 1236 0 1 May03 ? 00:59:11 varnishlog -u -i Backend_health
root 1240 1235 1240 0 1 May03 ? 00:00:26 logger -t varnishlog
root 12688 1 12688 0 1 Jul17 ? 00:00:09 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
<snip>
nobody 13015 12688 21589 0 48 06:30 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
nobody 13015 12688 21611 0 48 06:30 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
nobody 13015 12688 21612 93 48 06:30 ? 03:39:06 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
nobody 13015 12688 21614 0 48 06:30 ? 00:00:00 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 -f /etc/varnish/varnish.vcl -s malloc,6800M
<snip>
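For anyone wanting to pick the spinning thread out of a dump like this
automatically, here is a rough Python sketch. It assumes the dump is in
`ps -eLf` layout (UID PID PPID LWP C NLWP STIME TTY TIME CMD), which is a
guess based on the columns shown, and that each entry sits on one line.
The names `ThreadRow`, `parse_row` and `busiest_thread` and the 50%
threshold are made up for illustration.

```python
from typing import Iterable, NamedTuple, Optional


class ThreadRow(NamedTuple):
    """One thread entry, assuming `ps -eLf` column order."""
    uid: str
    pid: int
    ppid: int
    lwp: int    # thread (light-weight process) id
    cpu: int    # the C column: integer CPU utilisation
    nlwp: int   # number of threads in the process
    cmd: str


def parse_row(line: str) -> Optional[ThreadRow]:
    """Parse one line; return None for headers, <snip> markers, etc."""
    # Nine whitespace-separated columns, then the command with its spaces.
    parts = line.split(None, 9)
    if len(parts) < 10:
        return None
    try:
        pid, ppid, lwp, cpu, nlwp = (int(p) for p in parts[1:6])
    except ValueError:
        return None  # non-numeric columns: header or malformed line
    return ThreadRow(parts[0], pid, ppid, lwp, cpu, nlwp, parts[9])


def busiest_thread(lines: Iterable[str], min_cpu: int = 50) -> Optional[ThreadRow]:
    """Return the thread with the highest C value at or above min_cpu."""
    rows = [r for r in map(parse_row, lines) if r is not None and r.cpu >= min_cpu]
    return max(rows, key=lambda r: r.cpu, default=None)


# Lines taken from the dump in this post, with wrapped lines joined.
VARNISHD = ("/usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -T :8082 "
            "-f /etc/varnish/varnish.vcl -s malloc,6800M")
SAMPLE = [
    "root 1236 1235 1236 0 1 May03 ? 00:59:11 varnishlog -u -i Backend_health",
    "<snip>",
    "nobody 13015 12688 21611 0 48 06:30 ? 00:00:00 " + VARNISHD,
    "nobody 13015 12688 21612 93 48 06:30 ? 03:39:06 " + VARNISHD,
    "nobody 13015 12688 21614 0 48 06:30 ? 00:00:00 " + VARNISHD,
]

hot = busiest_thread(SAMPLE)
```

On the sample lines this singles out LWP 21612 in PID 13015. Knowing the
LWP id is mainly useful as a way to match the thread against whatever a
debugger or other per-thread tool reports.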
Can anyone recommend next steps in terms of diagnosing what's going on
here? I'm at a loss!
Thanks in advance,
Neil Saunders
More information about the varnish-misc
mailing list