Varnish hangs / requests time out
ross at trademe.co.nz
Fri Mar 6 03:17:34 CET 2009
> Have you checked dmesg? Do you have any estimate of how simultaneous these freezes are? (seconds, minutes or tens of minutes apart for instance?).
There was nothing in dmesg to indicate any sort of problem at the time of the incident. We weren't actively watching the servers when the failures happened, but the syslog messages from the load balancer show that both failures occurred less than one second apart.
> Your hit rate is quite low (78%ish) and it doesn't seem like you have grace enabled, which I strongly recommend.
> If dmesg doesn't reveal any troubles, I'd start by setting up grace (req.grace = 30s; and obj.grace = 30s; will
> get you far) and focusing on getting that hit rate up. If all you're serving is images, chances are that you should
> be able to top 99% which would make Varnish considerably more resilient to hiccoughs from backends.
OK, I have added those grace settings. I'd be surprised if we ever get near 99% though, since images are added to the backend all the time and the set of live images changes as auctions expire.
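For anyone following along, the grace settings suggested above would look roughly like this in VCL. This is only a minimal sketch, assuming Varnish 2.0.x syntax (where the fetched object is still addressed as obj in vcl_fetch); it is not necessarily the exact config in use here, and the 30s values are simply the ones suggested.

```vcl
# Sketch only: assumes Varnish 2.0.x VCL syntax.
sub vcl_recv {
    # Accept objects up to 30s past their TTL while a fresh copy is fetched.
    set req.grace = 30s;
}

sub vcl_fetch {
    # Keep fetched objects around for 30s beyond their TTL so they can be served in grace.
    set obj.grace = 30s;
}
```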
> You should also consider starting with -p cli_timeout=20 or similar, as the default can be far too aggressive on a busy site.
Since making the changes above, we have noticed that the number of worker threads has already peaked at 264 and has been quite choppy (when viewing the data in Cacti); previously the peak was 133 and the graph was a lot smoother. I'd have imagined the cli_timeout change would have made things smoother?
We're also running one of the servers in debug mode to catch any error messages. We'll report back with whatever we capture if we see the freeze again.
Thanks for the suggestions.