Does a return(restart) prevent first_byte_timeout from terminating a connection?

Mon Jan 13 19:24:42 CET 2014

Bump. Any idea what might be happening here? Otherwise can someone suggest
to me how I can enforce a *real* timeout on a per-request basis?

I'd like to keep the return(restart) part of my vcl, but not if it means
that some request will take 2x longer than what is specific in
.first_byte_timeout.

On Fri, Jan 10, 2014 at 4:39 PM, Stephen Wood <smwood4 at gmail.com> wrote:

> I am experiencing some strange behavior. Periodically requests will take
> 2x the time specified in first_byte_timeout before they actually are killed
> and the server returns a 503. This is quite annoying and it means that our
> users receive a 504 gateway timeout instead of our error message.
>
> My gut tells me that it has something to do with the way return(restart)
> works. The response code returned from the backend is a 503.
>
> Let me describe my vcl config briefly. The backend definitions look like
> this:
>
> backend test1 {
>   .host = "10.0.0.1";
>   .port = "80";
>   .connect_timeout = 1s;
>   .first_byte_timeout = 65s;
>   .between_bytes_timeout = 1s;
>   .saintmode_threshold = 5;
>   .probe = {
>     .request =
>       "GET /status HTTP/1.1"
>       "Host: test.com"
>       "Connection: close";
>     .interval = 60s;
>     .timeout = 10s;
>     .window = 5;
>     .threshold = 3;
>   }
> }
>
> (similar for backend test2)
>
> I'm using the round-robin directory. Under vcl_fetch I have this:
>
> sub vcl_fetch {
>   if (beresp.status == 503) {
>     set beresp.saintmode = 60s;
>     if (req.request != "POST") {
>       return(restart);
>     } else {
>       error 500 "Failed";
>     }
>   }
>
>   set beresp.grace = 10s;
> }
>
> Lastly, here's me catching the varnish server in the act of essentially
> making two requests and not timing out.
>
>   169 BackendXID   b 1547011740
>   169 TxRequest    b GET
>   169 TxURL        b test?
>   169 TxProtocol   b HTTP/1.1
>   169 TxHeader     b User-Agent: curl/7.30.0
>   169 TxHeader     b Host: xxx.xxx.xxx
>   169 TxHeader     b Accept: */*
>   169 TxHeader     b X-Varnish: 1547011740
>   169 TxHeader     b Accept-Encoding: gzip
>   169 BackendClose b test1
>
>   107 BackendXID   b 1547011740
>   107 TxRequest    b GET
>   107 TxURL        b /test?
>   107 TxProtocol   b HTTP/1.1
>   107 TxHeader     b User-Agent: curl/7.30.0
>   107 TxHeader     b Host: xxx.xxx.xxx
>   107 TxHeader     b Accept: */*
>   107 TxHeader     b X-Varnish: 1547011740
>   107 TxHeader     b Accept-Encoding: gzip
>   107 BackendClose b test2
>
>
>    47 SessionOpen  c 162.208.42.130 59288 :80
>    47 ReqStart     c 162.208.42.130 59288 1547011740
>    47 RxRequest    c GET
>    47 RxURL        c /test?
>    47 RxProtocol   c HTTP/1.1
>    47 RxHeader     c User-Agent: curl/7.30.0
>     47 RxHeader     c Accept: */*
>    47 RxHeader     c Connection: keep-alive
>    47 VCL_call     c recv lookup
>    47 VCL_call     c hash
>    47 Hash         c
>    47 Hash         c /test?
>    47 Hash         c
>    47 VCL_return   c hash
>    47 VCL_call     c miss fetch
>    47 Backend      c 169 main_cluster test1
>    47 FetchError   c http first read error: -1 11 (Resource temporarily
> unavailable)
>    47 Backend      c 107 main_cluster test2
>    47 FetchError   c http first read error: -1 11 (Resource temporarily
> unavailable)
>    47 VCL_call     c error deliver
>    47 VCL_call     c deliver deliver
>    47 TxProtocol   c HTTP/1.1
>    47 TxStatus     c 503
>    47 TxResponse   c Service Unavailable
>    47 TxHeader     c Server: Varnish
>    47 TxHeader     c Content-Type: application/json; charset=utf-8
>    47 TxHeader     c Content-Length: 65
>    47 TxHeader     c Accept-Ranges: bytes
>    47 TxHeader     c Date: Fri, 10 Jan 2014 23:24:52 GMT
>    47 TxHeader     c X-Varnish: 1547011740
>    47 TxHeader     c Age: 130
>    47 TxHeader     c Via: 1.1 varnish
>    47 TxHeader     c Connection: close
>    47 Length       c 65
>    47 ReqEnd       c 1547011740 1389396162.194909096 1389396292.196509361
> 0.000040293 130.000771999 0.000828266
>
> What it looks to me like what's happening is the first server is returning
> a bad status code after 60 seconds (the first_byte_timeout is 65), which
> satisfies the requirement of first_byte_timeout. The request triggers a
> return(restart) and it's tried again on another host, which takes another
> 60 or so seconds. Finally the request just dies by only after it's been
> hanging around for twice the time allotted in first_byte_timeout
>
> Does anyone know what I'm doing wrong? I do want to trigger a
> return(restart) for spurious status codes, but I want first_byte_timeout to
> serve as the time I'm willing to let a request go through. Is there a
> better parameter I can use than this one?
>
> It looks very similar to this bug<https://www.varnish-cache.org/trac/ticket/1156> but
> it looks like that's been fixed. I'm using varnish 3.0.5.
>
> Any help is greatly appreciated.
> --
> Stephen Wood
> www.heystephenwood.com
>

-- 
Stephen Wood
www.heystephenwood.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20140113/6c1639bc/attachment-0001.html>