backend timeouts/503s vs grace cache

Andrei lagged at gmail.com
Wed Nov 15 15:03:10 UTC 2017


I'm afraid I'll have to skip that step, since the tunnel is restarted based
on connection errors from probes sent every 3s, and those "pending
requests" shouldn't be completing correctly anyway, as they are half-open
connections at that point. Not to mention I want to minimize
visibility/downtime, and waiting on the lingering requests also means more
timeouts to worry about. Thanks again for your input though. I should
definitely see better results from request tagging/restarts and marking the
backends accordingly during the tunnel restarts.
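
Roughly what I have in mind, as a sketch only (the X-Serve-Stale header
name is a placeholder, and this assumes VCL 4.0 where vcl_hit may still
return (miss) and a failed fetch surfaces in vcl_deliver as a synthetic
503):

  sub vcl_deliver {
    # A failed backend fetch ends up here as a synthetic 503; restart
    # once with a marker asking vcl_hit to serve whatever grace we have.
    if (resp.status == 503 && req.restarts == 0) {
      set req.http.X-Serve-Stale = "1";
      return (restart);
    }
  }

  sub vcl_hit {
    # Honor the marker before any health check: deliver any object
    # still within its grace window.
    if (req.http.X-Serve-Stale && obj.ttl + obj.grace > 0s) {
      return (deliver);
    }
    # ... existing grace logic continues below ...
  }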

On Nov 15, 2017 16:56, "Guillaume Quintard" <guillaume at varnish-software.com>
wrote:

Once you set the backend to sick, you will probably still have requests
in-flight to this backend, so you don't want to restart your SSH tunnel
just yet. Instead, monitor the VBE.*.BACKENDNAME.conn lines, and wait for
them to drop to 0, then you can reset the SSH tunnel.
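
Something along these lines, assuming the backend is called "tunnel" in
your VCL (adjust the name/pattern to match yours):

  # Take the backend out of rotation.
  varnishadm backend.set_health tunnel sick

  # Check the open-connection gauge until it reads 0.
  varnishstat -1 -f 'VBE.*.tunnel.conn'

  # ... restart the SSH tunnel here ...

  # Hand health decisions back to the probes.
  varnishadm backend.set_health tunnel auto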

-- 
Guillaume Quintard

On Wed, Nov 15, 2017 at 3:48 PM, Andrei <lagged at gmail.com> wrote:

> What do you mean exactly when you say "drain the connections"? :D
>
> On Wed, Nov 15, 2017 at 8:46 AM, Guillaume Quintard <
> guillaume at varnish-software.com> wrote:
>
>> Oh, then your life should be easier! Don't forget to drain the
>> connections; varnishstat will give you the number of open connections
>> to any backend.
>>
>> --
>> Guillaume Quintard
>>
>> On Wed, Nov 15, 2017 at 3:42 PM, Andrei <lagged at gmail.com> wrote:
>>
>>> Thanks for the pointers! The tunnel setup is pretty flexible so I'll go
>>> ahead and mark the backend sick before restarting the tunnel, then healthy
>>> once confirmed up.
>>>
>>>
>>> On Wed, Nov 15, 2017 at 8:34 AM, Guillaume Quintard <
>>> guillaume at varnish-software.com> wrote:
>>>
>>>> You can wait until vcl_deliver and do a restart, possibly adding a
>>>> marker saying "don't bother with the backend, serve from cache".
>>>>
>>>> The actual solution would be to mark the backend as sick before
>>>> restarting the ssh tunnel, and draining the connections, but I guess that's
>>>> not an option here, is it?
>>>>
>>>> --
>>>> Guillaume Quintard
>>>>
>>>> On Wed, Nov 15, 2017 at 3:29 PM, Andrei <lagged at gmail.com> wrote:
>>>>
>>>>> Hi Guillaume,
>>>>>
>>>>> Thanks for getting back to me.
>>>>>
>>>>> On Wed, Nov 15, 2017 at 8:11 AM, Guillaume Quintard <
>>>>> guillaume at varnish-software.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Why bother with the complex vcl_hit? Since you are saying that the
>>>>>> cache is regularly primed, I don't really see the added value.
>>>>>>
>>>>> I was mainly going by an example from a while back, and not all
>>>>> sites/URLs are primed in the same manner. It has just stuck in the
>>>>> conf ever since.
>>>>>
>>>>>
>>>>>>
>>>>>> (note, after a quick glance at it, I think it could just be a race
>>>>>> condition where the backend appears up in vcl_hit and is down by the
>>>>>> time you ask it for the content)
>>>>>>
>>>>> How would you suggest "restarting" the request to try and force a
>>>>> grace cache object to be returned if present in that case?
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> --
>>>>>> Guillaume Quintard
>>>>>>
>>>>>> On Wed, Nov 15, 2017 at 6:02 AM, Andrei <lagged at gmail.com> wrote:
>>>>>>
>>>>>>> bump
>>>>>>>
>>>>>>> On Sun, Nov 5, 2017 at 2:12 AM, Andrei <lagged at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> One of the backends we have configured runs through an SSH tunnel
>>>>>>>> which occasionally gets restarted. When the tunnel is restarted,
>>>>>>>> Varnish returns a 503 since it can't reach the backend, even for
>>>>>>>> pages which would normally be cached (we force cache on the front
>>>>>>>> page of the related site). I believe our grace implementation might
>>>>>>>> be incorrect, as we would expect a grace-period cache return instead
>>>>>>>> of a 503.
>>>>>>>>
>>>>>>>> Our grace ttl is set to 21600 seconds based on a global variable:
>>>>>>>>
>>>>>>>> sub vcl_backend_response {
>>>>>>>>   set beresp.grace = std.duration(variable.global_get("ttl_grace") + "s", 6h);
>>>>>>>> }
>>>>>>>>
>>>>>>>> Our grace implementation in sub vcl_hit is:
>>>>>>>>
>>>>>>>>   sub vcl_hit {
>>>>>>>>     # We have no fresh fish. Let's look at the stale ones.
>>>>>>>>     if (std.healthy(req.backend_hint)) {
>>>>>>>>       # Backend is healthy. Limit age to 10s.
>>>>>>>>       if (obj.ttl + 10s > 0s) {
>>>>>>>>         #set req.http.grace = "normal(limited)";
>>>>>>>>         std.log("OKHITDELIVER: obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (deliver);
>>>>>>>>       } else {
>>>>>>>>         # No candidate for grace. Fetch a fresh object.
>>>>>>>>         std.log("No candidate for grace. Fetch a fresh object. obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (miss);
>>>>>>>>       }
>>>>>>>>     } else {
>>>>>>>>       # Backend is sick - use full grace.
>>>>>>>>       if (obj.ttl + obj.grace > 0s) {
>>>>>>>>         #set req.http.grace = "full";
>>>>>>>>         std.log("SICK DELIVERY: obj.hits: " + obj.hits + " obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (deliver);
>>>>>>>>       } else {
>>>>>>>>         # No graced object.
>>>>>>>>         std.log("No graced object. obj.ttl:" + obj.ttl + " obj.keep: " + obj.keep + " obj.grace: " + obj.grace);
>>>>>>>>         return (miss);
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     # Fetch & deliver once we get the result.
>>>>>>>>     return (miss); # Dead code, kept as a safeguard.
>>>>>>>>   }
>>>>>>>>
>>>>>>>>
>>>>>>>> Occasionally we see:
>>>>>>>> -   VCL_Log        No candidate for grace. Fetch a fresh object. obj.ttl:-1369.659 obj.keep: 0.000 obj.grace: 21600.000
>>>>>>>>
>>>>>>>> For the most part, it's:
>>>>>>>> -   VCL_Log        OKHITDELIVER: obj.ttl:26.872 obj.keep: 0.000 obj.grace: 21600.000
>>>>>>>>
>>>>>>>> Are we setting the grace ttl too low perhaps?
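>>>>>>>>
>>>>>>>> For reference, we can watch these grace decisions live with
>>>>>>>> something like (the query just matches on our std.log output):
>>>>>>>>
>>>>>>>>   varnishlog -g request -q 'VCL_Log ~ "grace"'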
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>