Tracking down sporadic 503 errors

Justin Pasher justinp at
Tue Mar 30 17:02:19 CEST 2010


I'm currently running Varnish r4633 (I can upgrade if absolutely 
necessary), and we've been receiving very sporadic 503 errors listed in 
the log files generated by varnishncsa. It's a very small percentage, 
but nonetheless, when it happens on an important page, it's noticeable.

Stats from yesterday's log file show 116 "503" errors out of about 4.1 
million hits. About 80% of the failed requests are POST requests, which 
are set up in my VCL as a "pass through". If I look in the Apache logs 
(the backend server), I only see one 503 error returned by Apache 
itself, so maybe there's a timeout issue somewhere. I'm trying to figure 
out the best way to troubleshoot this, since it's too inconsistent to 
just sit watching the output of varnishlog.
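
For reference, numbers like the ones above can be pulled from the 
varnishncsa log with a short script. This is only a minimal sketch: it 
assumes the log is in the usual NCSA combined format, and the function 
name summarize_503s and the regular expression are illustrative choices, 
not part of Varnish.

```python
import re
from collections import Counter

# Assumed layout (NCSA combined):
# host ident user [time] "METHOD url PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) '
)


def summarize_503s(lines):
    """Return (parsed_line_count, 503s per method, 503s per URL)."""
    total = 0
    by_method = Counter()
    by_url = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Skip lines that do not look like NCSA entries.
            continue
        total += 1
        if match.group("status") == "503":
            by_method[match.group("method")] += 1
            by_url[match.group("url")] += 1
    return total, by_method, by_url
```

Passing an open log file to summarize_503s (for example, 
summarize_503s(open("varnishncsa.log")), where the file name is a 
placeholder) gives the total hit count plus the 503 breakdown by request 
method and by URL, which makes it easier to see whether the failures 
cluster on the POST "pass through" requests or on particular pages.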

Perhaps it's hitting the default value for between_bytes_timeout (60 
seconds)? If processing the data on the backend takes too long, then 
Varnish would time out after 60 seconds of no data, even if the backend 
is still churning, right? I guess the real question is: in what 
situations does Varnish return a 503 on its own, aside from when the 
backend itself returns a 503?

I can post details of my VCL if needed, but it's pretty simple (mostly 
taken from examples on the site).

Justin Pasher

