High LRU nuked with plenty of cache space available

Dan Crosta dan at magnetic.com
Mon Jul 28 18:55:39 CEST 2014


OK, after our service had been hitting varnish for about 30 minutes, I
see a total of 2 "could not get storage" errors in the logs, and disk
await and utilization are both low (well, await is around 15 ms, but
utilization is only about 5-7%).

We can try switching to malloc storage. I'll update here when we have
a chance to do that.
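For reference, the planned change is just the storage argument on the
varnishd command line -- keeping the size at 12G to mirror our current
file store is an assumption on my part:

```shell
# current (file-backed, mmap'd):
#   -s file,/srv/varnish/varnish_storage.bin,12G
# planned (RAM-backed):
#   -s malloc,12G
```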
MAGNE+IC
Dan Crosta | Director, Engineering


On Mon, Jul 28, 2014 at 11:22 AM, MAGNIEN, Thierry
<thierry.magnien at sfr.com> wrote:
> You're right, file storage is mmap'd, but once the data actually has to be written to disk, this can result in a very large performance penalty, depending on your disks.
>
> With 12G, I would suggest switching to RAM (malloc) storage, unless you don't have enough memory.
>
> Regards,
> Thierry
>
> -----Original Message-----
> From: varnish-misc-bounces+thierry.magnien=sfr.com at varnish-cache.org [mailto:varnish-misc-bounces+thierry.magnien=sfr.com at varnish-cache.org] On behalf of Dan Crosta
> Sent: Monday, July 28, 2014 17:18
> To: varnish-misc at varnish-cache.org
> Subject: Re: High LRU nuked with plenty of cache space available
>
> On Mon, Jul 28, 2014 at 9:35 AM, MAGNIEN, Thierry
> <thierry.magnien at sfr.com> wrote:
>> Hi Dan,
>>
>> Things you should check first are:
>> - varnishlog for "could not get storage" errors
>
> OK -- I'm grepping varnishlog, will let you know if I see anything there.
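The grep in question is roughly this (the exact log tag the message
appears under varies by Varnish version, so this just matches on the
message text):

```shell
# tail the shared memory log and watch for allocation failures
varnishlog | grep -i "could not get storage"
```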
>
>
>> - disk activity/performance: maybe varnish cannot get storage because your disks are responding too slowly
>
> I thought the "file" storage class was mmap'd, and so writes should
> basically hit memory and be flushed to disk periodically in the
> background -- is that not true? I'll keep an eye on await in iostat
> and let you know how it looks. Is there a particular threshold I
> should be watching out for? Is the threshold tunable?
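A sketch of that iostat watch (sda is a placeholder -- substitute the
device backing /srv/varnish/varnish_storage.bin):

```shell
# extended per-device stats (await, %util) every 5 seconds
iostat -dx 5 sda
```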
>
>
> Thanks,
> - Dan
> MAGNE+IC
> Dan Crosta | Director, Engineering
>
>> -----Original Message-----
>> From: varnish-misc-bounces+thierry.magnien=sfr.com at varnish-cache.org [mailto:varnish-misc-bounces+thierry.magnien=sfr.com at varnish-cache.org] On behalf of Dan Crosta
>> Sent: Saturday, July 26, 2014 14:58
>> To: varnish-misc at varnish-cache.org
>> Subject: High LRU nuked with plenty of cache space available
>>
>> We're running varnish (3.0.4) like:
>>
>> /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -f
>> /etc/varnish/default.vcl -T 127.0.0.1:8080 -t 2628000 -w 50,1000,120
>> -u varnish -g varnish -S /etc/varnish/secret -s
>> file,/srv/varnish/varnish_storage.bin,12G -p thread_pools=4 -p
>> http_gzip_support=off -p saintmode_threshold=0 -p
>> http_range_support=off
>>
>> varnishstat shows the following for SMF:
>>
>> SMF.s0.c_req            45213612       317.34 Allocator requests
>> SMF.s0.c_fail           14781854       103.75 Allocator failures
>> SMF.s0.c_bytes      2056164147200  14431753.97 Bytes allocated
>> SMF.s0.c_freed      2052654858240  14407123.06 Bytes freed
>> SMF.s0.g_alloc            856760          .   Allocations outstanding
>> SMF.s0.g_bytes        3509288960          .   Bytes outstanding
>> SMF.s0.g_space        9375612928          .   Bytes available
>> SMF.s0.g_smf              950080          .   N struct smf
>> SMF.s0.g_smf_frag          93319          .   N small free smf
>> SMF.s0.g_smf_large             1          .   N large free smf
>>
>> which I interpret to mean that ~9G of the 12G are available ("bytes
>> available") for use by the cache.
>>
>> However, when our application serves API traffic through varnish at
>> a rate of about 150 QPS, we see n_lru_nuked increase at about the
>> same rate, with the result that the n_object counter stays more or
>> less constant. The cache has only been running for a few days, so I
>> don't believe TTL expiry is to blame here, since we pass -t with
>> about 30 days' worth of seconds (assuming the value there is in
>> seconds, which the docs seem to imply).
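(Checking that arithmetic: the 2628000 we pass to -t does come out to
about 30 days if the unit is seconds:)

```shell
# default TTL passed via -t, expressed in days
awk 'BEGIN { printf "%.1f days\n", 2628000 / 86400 }'
```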
>>
>> I have tried setting a granularity on the storage, which did not seem
>> to have any impact. I'm looking for other suggestions or things to
>> try, as intuition and the stats seem to suggest we should be able to
>> store a lot more objects before things start to get nuked.
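One extra observation from the SMF counters above -- the arithmetic is
straightforward, though the interpretation is mine: g_bytes divided by
g_alloc comes out to exactly 4096 bytes per outstanding allocation,
which together with the 93319 small free fragments (g_smf_frag) might
point at fragmentation rather than true space exhaustion:

```shell
# outstanding bytes / outstanding allocations, from the SMF counters
awk 'BEGIN { printf "%d bytes per allocation\n", 3509288960 / 856760 }'
```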
>>
>> Thanks,
>> - Dan
>>
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>


