BAN - Memory consumption exploding under load
Hugues Alary
hugues at betabrand.com
Tue Jun 19 20:54:46 UTC 2018
In case someone runs into a similar issue with BANs accumulating: I found
an experimental `ban_cutoff` runtime parameter (documented below) that
exists specifically for this problem.
A search for `ban_cutoff` in this mailing list archive also yields a good
thread about this (thread: "Varnish Lurker is getting slower / Ban lists
keeps increasing")
ban_cutoff
        Value is: 0 [bans]
        Default is: 0
        Minimum is: 0
Expurge long tail content from the cache to keep the number of
bans below this value. 0 disables.
When this parameter is set to a non-zero value, the ban lurker
continues to work the ban list as usual top to bottom, but when
it reaches the ban_cutoff-th ban, it treats all objects as if
they matched a ban and expurges them from cache. As actively
used objects get tested against the ban list at request time
and thus are likely to be associated with bans near the top of
the ban list, with ban_cutoff, least recently accessed objects
(the "long tail") are removed.
This parameter is a safety net to avoid bad response times due
to bans being tested at lookup time. Setting a cutoff trades
response time for cache efficiency. The recommended value is
proportional to rate(bans_lurker_tests_tested) / n_objects
while the ban lurker is working, which is the number of bans
the system can sustain. The additional latency due to request
ban testing is in the order of ban_cutoff /
rate(bans_lurker_tests_tested). For example, for
rate(bans_lurker_tests_tested) = 2M/s and a tolerable latency
of 100ms, a good value for ban_cutoff may be 200K.
NB: We do not know yet if it is a good idea to change this
parameter, or if the default value is even sensible. Caution is
advised, and feedback is most welcome.
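To make the rule of thumb above concrete, here is a quick back-of-the-envelope
calculation (a Python sketch using the example figures from the parameter
description, not measurements from a real system):

```python
def recommended_ban_cutoff(lurker_tests_per_sec, tolerable_latency_sec):
    """Approximate a ban_cutoff value such that request-time ban testing
    stays within the tolerable extra latency: cutoff / rate <= latency."""
    return int(lurker_tests_per_sec * tolerable_latency_sec)

# Example values from the docs: 2M lurker tests/s, 100 ms tolerable latency.
print(recommended_ban_cutoff(2_000_000, 0.100))  # -> 200000
```

The resulting value could then be applied at runtime with
`varnishadm param.set ban_cutoff 200000`, substituting a rate measured on
your own system from rate(bans_lurker_tests_tested).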
On Thu, May 31, 2018 at 3:49 PM Hugues Alary <hugues at betabrand.com> wrote:
> Hi there,
>
> Re-opening this thread from 2016!
>
> A few days ago, I ran into an issue where varnish (varnish-6.0.0 revision
> a068361dff0d25a0d85cf82a6e5fdaf315e06a7d) became completely unresponsive.
>
> While investigating, I noticed a lot of `VSL: store overflow` errors in
> varnishlog, but most importantly, I noticed our BAN list was
> accumulating, with varnishstat reporting somewhere around 130K BANs.
>
> The load on the website was pretty low, and restarting Varnish fixed the
> issue. It is possible I had been running a script that generated a lot of
> BANs, but it still seems odd that it would generate so many BANs that
> varnish would become unresponsive.
>
> After restarting Varnish, I watched `varnishstat -1 | grep bans` and could
> see MAIN.bans going up and down, but staying fairly stable on average.
>
> Today, I decided to check again and I noticed:
>
> MAIN.bans                                  66863          .   Count of bans
> MAIN.bans_completed                        66834          .   Number of bans marked 'completed'
> MAIN.bans_obj                              66849          .   Number of bans using obj.*
> MAIN.bans_req                                 14          .   Number of bans using req.*
> MAIN.bans_added                           170939       1.18   Bans added
> MAIN.bans_deleted                         104076       0.72   Bans deleted
> MAIN.bans_tested                         3767852      26.08   Bans tested against objects (lookup)
> MAIN.bans_obj_killed                       62024       0.43   Objects killed by bans (lookup)
> MAIN.bans_lurker_tested               3344073359   23148.62   Bans tested against objects (lurker)
> MAIN.bans_tests_tested                  94711644     655.62   Ban tests tested against objects (lookup)
> MAIN.bans_lurker_tests_tested         4573910595   31661.91   Ban tests tested against objects (lurker)
> MAIN.bans_lurker_obj_killed                16627       0.12   Objects killed by bans (lurker)
> MAIN.bans_lurker_obj_killed_cutoff             0       0.00   Objects killed by bans for cutoff (lurker)
> MAIN.bans_dups                            113845       0.79   Bans superseded by other bans
> MAIN.bans_lurker_contention                    3       0.00   Lurker gave way for lookup
> MAIN.bans_persisted_bytes              163879330          .   Bytes used by the persisted ban lists
> MAIN.bans_persisted_fragmentation      162795194          .   Extra bytes in persisted ban lists due to fragmentation
>
>
> 1- MAIN.bans is almost at 70k, and varnish has only been running for 1 day
> 16 hours
> 2- MAIN.bans_req is > 0, but nowhere in my configuration do I ever submit
> bans using req.*
>
> So I have two questions:
>
> - is 70k+ bans unreasonable, and should I find a way to reduce them?
> - why am I seeing 14 MAIN.bans_req when my config never emits such
> bans?
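A quick consistency check on the counters above (a Python sketch using the
numbers pasted from varnishstat, not an official tool): the length of the
ban list should equal bans added minus bans deleted, and the pasted counters
do agree, so the list itself is consistent; the open question is why
deletions lag additions.

```python
# Counters copied from the varnishstat output above.
bans_added = 170939    # MAIN.bans_added
bans_deleted = 104076  # MAIN.bans_deleted
bans = 66863           # MAIN.bans

outstanding = bans_added - bans_deleted
print(outstanding)          # -> 66863
print(outstanding == bans)  # -> True
```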
>
> Thanks,
> -Hugues
>
> On Thu, Nov 3, 2016 at 3:04 AM, Dridi Boukelmoune <dridi at varni.sh> wrote:
>
>> On Thu, Oct 27, 2016 at 6:02 PM, Nils Goroll <slink at schokola.de> wrote:
>> > Hi,
>> >
>> > we've added a bunch of ban and ban lurker improvements which did not get
>> > backported to 4.1.3
>> >
>> > Please upgrade to 5.0 or master.
>>
>> Hi Nils,
>>
>> I was thinking (against the too-many-knobs trend) that maybe we should
>> have some sort of ban_queue_limit parameter to avoid piling up too
>> many bans. It would also play nice with your std.ban() initiative
>> where hitting the queue limit could be one of the reasons returned by
>> the function when it fails.
>>
>> The default could be 0 for unlimited, or a fairly large number.
>>
>> Dridi
>>
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>
>
>