Tracking down sporadic 503 errors

This is what I have seen as well, and I could not pin it down. What made it 
somewhat better was adding

     if (obj.status == 503 && req.restarts < 4) {

to the vcl_error subroutine. It makes Varnish re-request the document; 
however, even with that, the behavior still happens, though much less 
often :-(. I have attached some graphs of 500 responses from Varnish and 
the corresponding Apache responses. Units are hits per second.
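
For anyone who wants to try the same workaround, here is a minimal sketch 
of how that condition might sit inside a complete subroutine. The 
`return (restart);` action is my assumption about what goes inside the 
braces (older 2.0-style configs may spell it as a bare `restart;`), so 
check it against the VCL syntax of the Varnish version you run.

```vcl
sub vcl_error {
    # Retry the request when Varnish generated a 503 and we have
    # restarted fewer than four times so far.
    if (obj.status == 503 && req.restarts < 4) {
        # Assumed action; the original snippet elides it.
        return (restart);
    }
}
```

This only masks the symptom: each restart sends the request to the 
backend again, so the underlying cause of the 503 still needs finding.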

I even looked at the corresponding responses from Apache: Apache would 
claim that the request succeeded, yet Varnish would throw a 500.


On Tue, 30 Mar 2010, Justin Pasher wrote:

> I'm currently running Varnish r4633 (I can upgrade if absolutely necessary), 
> and we've been receiving very sporadic 503 errors listed in the log files 
> generated by varnishncsa. It's a very small percentage, but nonetheless, when 
> it happens on an important page, it's noticeable.
> Stats from yesterday's log file show 116 "503" errors out of about 4.1 
> million hits. About 80% of the failed requests are POST requests, which are 
> set up in my VCL as a "pass through". If I look in the apache logs (the 
> backend server), I only see one 503 error returned by apache itself, so maybe 
> there's a timeout issue somewhere. I'm trying to figure out the best way to 
> troubleshoot this, since it's too inconsistent to just sit watching the 
> output of varnishlog.
> Perhaps it's hitting the default value for between_bytes_timeout (60 
> seconds)? If processing the data on the backend takes too long, then varnish 
> would time out after 60 seconds of no data, even if the backend is still 
> churning, right? I guess the question is what situations cause Varnish to 
> return a 503 aside from when the backend itself returns a 503.
> I can post details of my VCL if needed, but it's pretty simple (mostly taken 
> from examples on the site).
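
On the timeout theory above, one way to test it would be to raise the 
per-backend timeouts and see whether the 503 rate changes. The following 
is only a sketch: the backend name, host, port and all timeout values 
are hypothetical placeholders, not taken from the configuration being 
discussed.

```vcl
backend default {
    .host = "127.0.0.1";             # hypothetical backend address
    .port = "8080";                  # hypothetical backend port
    # Time allowed for the TCP connection to the backend.
    .connect_timeout = 1s;
    # Time allowed before the backend sends the first byte of its reply.
    .first_byte_timeout = 120s;
    # Time allowed between consecutive bytes once the reply has started.
    .between_bytes_timeout = 120s;
}
```

If the backend is slow to start answering a POST, `.first_byte_timeout` 
is probably the more relevant of the two, since `.between_bytes_timeout` 
only applies once the response has begun.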
