Does a return(restart) prevent first_byte_timeout from terminating a connection?
Stephen Wood
smwood4 at gmail.com
Mon Jan 13 19:24:42 CET 2014
Bump. Any idea what might be happening here? Otherwise can someone suggest
to me how I can enforce a *real* timeout on a per-request basis?
I'd like to keep the return(restart) part of my vcl, but not if it means
that some request will take 2x longer than what is specific in
.first_byte_timeout.
On Fri, Jan 10, 2014 at 4:39 PM, Stephen Wood <smwood4 at gmail.com> wrote:
> I am experiencing some strange behavior. Periodically requests will take
> 2x the time specified in first_byte_timeout before they actually are killed
> and the server returns a 503. This is quite annoying and it means that our
> users receive a 504 gateway timeout instead of our error message.
>
> My gut tells me that it has something to do with the way return(restart)
> works. The response code returned from the backend is a 503.
>
> Let me describe my vcl config briefly. The backend definitions look like
> this:
>
> backend test1 {
> .host = "10.0.0.1";
> .port = "80";
> .connect_timeout = 1s;
> .first_byte_timeout = 65s;
> .between_bytes_timeout = 1s;
> .saintmode_threshold = 5;
> .probe = {
> .request =
> "GET /status HTTP/1.1"
> "Host: test.com"
> "Connection: close";
> .interval = 60s;
> .timeout = 10s;
> .window = 5;
> .threshold = 3;
> }
> }
>
> (similar for backend test2)
>
> I'm using the round-robin directory. Under vcl_fetch I have this:
>
> sub vcl_fetch {
> if (beresp.status == 503) {
> set beresp.saintmode = 60s;
> if (req.request != "POST") {
> return(restart);
> } else {
> error 500 "Failed";
> }
> }
>
> set beresp.grace = 10s;
> }
>
> Lastly, here's me catching the varnish server in the act of essentially
> making two requests and not timing out.
>
> 169 BackendXID b 1547011740
> 169 TxRequest b GET
> 169 TxURL b test?
> 169 TxProtocol b HTTP/1.1
> 169 TxHeader b User-Agent: curl/7.30.0
> 169 TxHeader b Host: xxx.xxx.xxx
> 169 TxHeader b Accept: */*
> 169 TxHeader b X-Varnish: 1547011740
> 169 TxHeader b Accept-Encoding: gzip
> 169 BackendClose b test1
>
> 107 BackendXID b 1547011740
> 107 TxRequest b GET
> 107 TxURL b /test?
> 107 TxProtocol b HTTP/1.1
> 107 TxHeader b User-Agent: curl/7.30.0
> 107 TxHeader b Host: xxx.xxx.xxx
> 107 TxHeader b Accept: */*
> 107 TxHeader b X-Varnish: 1547011740
> 107 TxHeader b Accept-Encoding: gzip
> 107 BackendClose b test2
>
>
> 47 SessionOpen c 162.208.42.130 59288 :80
> 47 ReqStart c 162.208.42.130 59288 1547011740
> 47 RxRequest c GET
> 47 RxURL c /test?
> 47 RxProtocol c HTTP/1.1
> 47 RxHeader c User-Agent: curl/7.30.0
> 47 RxHeader c Accept: */*
> 47 RxHeader c Connection: keep-alive
> 47 VCL_call c recv lookup
> 47 VCL_call c hash
> 47 Hash c
> 47 Hash c /test?
> 47 Hash c
> 47 VCL_return c hash
> 47 VCL_call c miss fetch
> 47 Backend c 169 main_cluster test1
> 47 FetchError c http first read error: -1 11 (Resource temporarily
> unavailable)
> 47 Backend c 107 main_cluster test2
> 47 FetchError c http first read error: -1 11 (Resource temporarily
> unavailable)
> 47 VCL_call c error deliver
> 47 VCL_call c deliver deliver
> 47 TxProtocol c HTTP/1.1
> 47 TxStatus c 503
> 47 TxResponse c Service Unavailable
> 47 TxHeader c Server: Varnish
> 47 TxHeader c Content-Type: application/json; charset=utf-8
> 47 TxHeader c Content-Length: 65
> 47 TxHeader c Accept-Ranges: bytes
> 47 TxHeader c Date: Fri, 10 Jan 2014 23:24:52 GMT
> 47 TxHeader c X-Varnish: 1547011740
> 47 TxHeader c Age: 130
> 47 TxHeader c Via: 1.1 varnish
> 47 TxHeader c Connection: close
> 47 Length c 65
> 47 ReqEnd c 1547011740 1389396162.194909096 1389396292.196509361
> 0.000040293 130.000771999 0.000828266
>
> What it looks to me like what's happening is the first server is returning
> a bad status code after 60 seconds (the first_byte_timeout is 65), which
> satisfies the requirement of first_byte_timeout. The request triggers a
> return(restart) and it's tried again on another host, which takes another
> 60 or so seconds. Finally the request just dies by only after it's been
> hanging around for twice the time allotted in first_byte_timeout
>
> Does anyone know what I'm doing wrong? I do want to trigger a
> return(restart) for spurious status codes, but I want first_byte_timeout to
> serve as the time I'm willing to let a request go through. Is there a
> better parameter I can use than this one?
>
> It looks very similar to this bug<https://www.varnish-cache.org/trac/ticket/1156> but
> it looks like that's been fixed. I'm using varnish 3.0.5.
>
> Any help is greatly appreciated.
> --
> Stephen Wood
> www.heystephenwood.com
>
--
Stephen Wood
www.heystephenwood.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20140113/6c1639bc/attachment-0001.html>
More information about the varnish-misc
mailing list