Does a return(restart) prevent first_byte_timeout from terminating a connection?

Tue Jan 14 09:13:39 CET 2014

Hi,

I think it's the normal behavior from Varnish point of view. Return(restart) restarts the whole request processing so first_byte_timeout is applied each time. Depending on what you want to achieve, either using half the value of first_byte_timeout or write a small patch for Varnish should do the trick.

Regards,
Thierry

De : varnish-misc-bounces+thierry.magnien=sfr.com at varnish-cache.org [mailto:varnish-misc-bounces+thierry.magnien=sfr.com at varnish-cache.org] De la part de Stephen Wood
Envoyé : lundi 13 janvier 2014 19:25
À : varnish-misc at varnish-cache.org
Objet : Re: Does a return(restart) prevent first_byte_timeout from terminating a connection?

Bump. Any idea what might be happening here? Otherwise can someone suggest to me how I can enforce a real timeout on a per-request basis?

I'd like to keep the return(restart) part of my vcl, but not if it means that some request will take 2x longer than what is specific in .first_byte_timeout.

On Fri, Jan 10, 2014 at 4:39 PM, Stephen Wood <smwood4 at gmail.com<mailto:smwood4 at gmail.com>> wrote:
I am experiencing some strange behavior. Periodically requests will take 2x the time specified in first_byte_timeout before they actually are killed and the server returns a 503. This is quite annoying and it means that our users receive a 504 gateway timeout instead of our error message.

My gut tells me that it has something to do with the way return(restart) works. The response code returned from the backend is a 503.

Let me describe my vcl config briefly. The backend definitions look like this:

backend test1 {
  .host = "10.0.0.1";
  .port = "80";
  .connect_timeout = 1s;
  .first_byte_timeout = 65s;
  .between_bytes_timeout = 1s;
  .saintmode_threshold = 5;
  .probe = {
    .request =
      "GET /status HTTP/1.1"
      "Host: test.com<http://test.com>"
      "Connection: close";
    .interval = 60s;
    .timeout = 10s;
    .window = 5;
    .threshold = 3;
  }
}

(similar for backend test2)

I'm using the round-robin directory. Under vcl_fetch I have this:

sub vcl_fetch {
  if (beresp.status == 503) {
    set beresp.saintmode = 60s;
    if (req.request != "POST") {
      return(restart);
    } else {
      error 500 "Failed";
    }
  }

  set beresp.grace = 10s;
}

Lastly, here's me catching the varnish server in the act of essentially making two requests and not timing out.

  169 BackendXID   b 1547011740
  169 TxRequest    b GET
  169 TxURL        b test?
  169 TxProtocol   b HTTP/1.1
  169 TxHeader     b User-Agent: curl/7.30.0
  169 TxHeader     b Host: xxx.xxx.xxx
  169 TxHeader     b Accept: */*
  169 TxHeader     b X-Varnish: 1547011740
  169 TxHeader     b Accept-Encoding: gzip
  169 BackendClose b test1

  107 BackendXID   b 1547011740
  107 TxRequest    b GET
  107 TxURL        b /test?
  107 TxProtocol   b HTTP/1.1
  107 TxHeader     b User-Agent: curl/7.30.0
  107 TxHeader     b Host: xxx.xxx.xxx
  107 TxHeader     b Accept: */*
  107 TxHeader     b X-Varnish: 1547011740
  107 TxHeader     b Accept-Encoding: gzip
  107 BackendClose b test2

   47 SessionOpen  c 162.208.42.130 59288 :80
   47 ReqStart     c 162.208.42.130 59288 1547011740
   47 RxRequest    c GET
   47 RxURL        c /test?
   47 RxProtocol   c HTTP/1.1
   47 RxHeader     c User-Agent: curl/7.30.0
    47 RxHeader     c Accept: */*
   47 RxHeader     c Connection: keep-alive
   47 VCL_call     c recv lookup
   47 VCL_call     c hash
   47 Hash         c
   47 Hash         c /test?
   47 Hash         c
   47 VCL_return   c hash
   47 VCL_call     c miss fetch
   47 Backend      c 169 main_cluster test1
   47 FetchError   c http first read error: -1 11 (Resource temporarily unavailable)
   47 Backend      c 107 main_cluster test2
   47 FetchError   c http first read error: -1 11 (Resource temporarily unavailable)
   47 VCL_call     c error deliver
   47 VCL_call     c deliver deliver
   47 TxProtocol   c HTTP/1.1
   47 TxStatus     c 503
   47 TxResponse   c Service Unavailable
   47 TxHeader     c Server: Varnish
   47 TxHeader     c Content-Type: application/json; charset=utf-8
   47 TxHeader     c Content-Length: 65
   47 TxHeader     c Accept-Ranges: bytes
   47 TxHeader     c Date: Fri, 10 Jan 2014 23:24:52 GMT
   47 TxHeader     c X-Varnish: 1547011740
   47 TxHeader     c Age: 130
   47 TxHeader     c Via: 1.1 varnish
   47 TxHeader     c Connection: close
   47 Length       c 65
   47 ReqEnd       c 1547011740 1389396162.194909096 1389396292.196509361 0.000040293 130.000771999 0.000828266

What it looks to me like what's happening is the first server is returning a bad status code after 60 seconds (the first_byte_timeout is 65), which satisfies the requirement of first_byte_timeout. The request triggers a return(restart) and it's tried again on another host, which takes another 60 or so seconds. Finally the request just dies by only after it's been hanging around for twice the time allotted in first_byte_timeout

Does anyone know what I'm doing wrong? I do want to trigger a return(restart) for spurious status codes, but I want first_byte_timeout to serve as the time I'm willing to let a request go through. Is there a better parameter I can use than this one?

It looks very similar to this bug<https://www.varnish-cache.org/trac/ticket/1156> but it looks like that's been fixed. I'm using varnish 3.0.5.

Any help is greatly appreciated.
--
Stephen Wood
www.heystephenwood.com<http://www.heystephenwood.com>

--
Stephen Wood
www.heystephenwood.com<http://www.heystephenwood.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20140114/4020916a/attachment.html>