About saintmode_threshold behavior

Mon Jul 12 13:12:38 CEST 2010

Thanks Kristian.
My point is this: if, over a one-hour period, a backend returns ten "5XX"
errors and those objects are added to the list of bad objects, Varnish
considers it a sick backend, correct? But it isn't really a sick backend.
I'm asking because my server architecture looks like this:

LoadBalancer ---> Varnish's --->  LoadBalancer ---> Backends

So, for instance, a backend called "IMAGES" has just one server (the
load balancer). If it is marked down, every request not in cache will
return an error.

Don't you think the thresholds could have a "time perspective"?
For instance:
    thresholds_items = 10;
    thresholds_time  = 3600 (seconds);

And after 1 hour the entire list of bad objects is cleared.
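
To make the proposal concrete, here is a rough Python sketch of that idea.
thresholds_items and thresholds_time are the hypothetical parameters from
above, not existing Varnish settings, and the class and method names are
invented purely for illustration:

```python
import time


class BadObjectList:
    """Hypothetical per-backend list that is wiped once the time window elapses."""

    def __init__(self, thresholds_items=10, thresholds_time=3600,
                 clock=time.monotonic):
        self.thresholds_items = thresholds_items  # bad objects needed to mark sick
        self.thresholds_time = thresholds_time    # window length in seconds
        self.clock = clock                        # injectable clock, eases testing
        self.window_start = clock()
        self.items = set()

    def _maybe_reset(self):
        # Clear the whole list once the window has elapsed.
        if self.clock() - self.window_start >= self.thresholds_time:
            self.items.clear()
            self.window_start = self.clock()

    def add(self, obj):
        self._maybe_reset()
        self.items.add(obj)

    def backend_is_sick(self):
        self._maybe_reset()
        return len(self.items) >= self.thresholds_items
```

With this model, ten errors spread over several hours would never add up to
a sick backend, which is the behaviour I am after.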

On Mon, Jul 12, 2010 at 5:44 AM, Kristian Lyngstol <
kristian at varnish-software.com> wrote:

> On Fri, Jul 09, 2010 at 01:39:10PM -0300, Rodrigo K. Ferreira wrote:
> > About the error counter that is compared with saintmode_threshold: when
> > is that counter reset to zero? Only when the backend server is
> > penalized? Or always after each backend probe?
> > I ask because it is fairly normal for dynamic backend servers to return
> > a few 5XX errors, for malformed client requests or other reasons. If the
> > counter is never reset to zero, the backend servers will eventually be
> > labeled sick.
>
> Ok, I'm not entirely sure I understand what you're asking, but I'll explain
> saintmode_threshold anyway.
>
> Every time you use the "saintmode" command/directive in VCL, you add an
> entry to a list of bad objects, hooked up to the backend. So one list for
> each backend.
>
> When Varnish is trying to find a healthy backend, it will check if the
> objecthead it's looking for is represented on the list. While checking, it
> will count how many valid entries are present on the list. The only
> condition required for an entry to be valid is that it has not timed out.
> If it either finds the objecthead on the list OR finds saintmode_threshold
> items on the list, the backend is considered sick. This is not affected by
> health check polling at all. The only way to re-enable a backend that is
> considered sick because of too many saintmode-items, is time.
>
> Do keep in mind, though, that new entries are not added to the list after
> saintmode_threshold is reached. You might get a couple extra on account of
> parallel requests going to the backend, but once the list is large
> enough, the backend won't be used, and thus can't get new items added to
> the blacklist. So if you use a 20s timer on saintmode, the maximum time
> until Varnish retries the backend is 20 seconds.
>
> Consider saintmode a combination of a buffer until the real health checks
> detect the problem, and a way to blacklist just one item on one backend.
>
> You will need _different_ items on the saintmode blacklist to mark the
> backend as completely down. Even if a single page returns 500 constantly,
> that will not bring down the entire backend - it will just make varnish not
> ask that backend for that specific page.
>
> Hope this cleared up some questions, though it might add a few new ones I
> suppose.
>
> - Kristian
>
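
For reference, the lookup Kristian describes can be modelled roughly as
below. This is a Python sketch of the described behaviour only, not
Varnish's actual C code. The class and method names are made up for
illustration, and keying the list by objecthead (so that repeated failures
of one page occupy a single entry) is an assumption:

```python
import time


class SaintmodeList:
    """Illustrative model of one backend's saintmode list."""

    def __init__(self, threshold, clock=time.monotonic):
        self.threshold = threshold  # stands in for saintmode_threshold
        self.clock = clock          # injectable clock, eases testing
        self.entries = {}           # objecthead -> expiry timestamp

    def mark_bad(self, objhead, ttl):
        # Models setting saintmode for this object with a timer of `ttl` seconds.
        self.entries[objhead] = self.clock() + ttl

    def is_healthy_for(self, objhead):
        now = self.clock()
        valid = 0
        for head, expiry in self.entries.items():
            if expiry <= now:
                continue            # timed-out entries no longer count
            if head == objhead:
                return False        # this object is blacklisted on this backend
            valid += 1
            if valid >= self.threshold:
                return False        # too many distinct bad objects: backend sick
        return True
```

In this model a single page that keeps failing only blocks requests for that
page, while `threshold` different pages make the backend unusable for
everything until their timers run out.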