Hit ratio dropped significantly after recent upgrades

Justin Lloyd justinl at arena.net
Wed Dec 14 14:58:08 CET 2016


So after increasing the TTL for thumbnail images to 4 hours, the hit ratio reached 65-70%. Objects in memory climbed to around 150k before tapering off (as expected, since lots of expirations start after 4 hours), slowly dipped back down to around 100k, and then started on the upswing again. I'm continuing to test longer TTLs and have now set it to 24h; we'll see whether any problems get reported in case that's too long. At any rate, we definitely appear to have found the smoking gun, and I know where to tinker to better optimize things.

I'm not sure why this changed with the upgrades, whether it was something in MediaWiki or Varnish, but at least I know where to spend cycles on optimizations.

Thank you all very much for the help!

Justin
 

-----Original Message-----
From: varnish-misc-bounces+justinl=arena.net at varnish-cache.org [mailto:varnish-misc-bounces+justinl=arena.net at varnish-cache.org] On Behalf Of Justin Lloyd
Sent: Tuesday, December 13, 2016 12:19 PM
To: Florian Tham <fgtham at gmail.com>
Cc: varnish-misc at varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

Based on this conversation, I added a 1h TTL to thumbnail images in vcl_backend_response. That has gotten my hit ratio up to about 55-60%, depending on how you calculate it (hit/miss values vs. frontend/backend connections). There are now up to about 72k objects in memory, up from about 60k max before, though before the upgrades it was more like 600-700k objects.
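
The two ways of calculating the hit ratio mentioned above can give different answers, because passed requests count as neither hits nor misses but still reach the backend. A minimal Python sketch of both calculations follows. It is an illustration only: the function names are hypothetical, the sample numbers are made up, and which varnishstat counters feed each function (for example MAIN.cache_hit and MAIN.cache_miss for the first, MAIN.client_req and MAIN.backend_req for the second) is an assumption, not something stated in this thread.

```python
def hit_ratio_from_lookups(cache_hit: int, cache_miss: int) -> float:
    """Share of cache lookups answered from cache: hit / (hit + miss)."""
    lookups = cache_hit + cache_miss
    return cache_hit / lookups if lookups else 0.0


def hit_ratio_from_traffic(frontend: int, backend: int) -> float:
    """Share of frontend traffic that never reached the backend."""
    return 1.0 - backend / frontend if frontend else 0.0


# Made-up counter deltas for one sampling interval (not taken from this thread).
sample = {"cache_hit": 5800, "cache_miss": 4200, "frontend": 11000, "backend": 5000}

lookup_ratio = hit_ratio_from_lookups(sample["cache_hit"], sample["cache_miss"])
traffic_ratio = hit_ratio_from_traffic(sample["frontend"], sample["backend"])

print(f"hit/miss: {lookup_ratio:.1%}, frontend/backend: {traffic_ratio:.1%}")
```

In this made-up sample the frontend/backend figure comes out lower than the hit/miss figure, since the backend count also includes passes.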

It's been an hour now and I'm seeing a spike in expired objects and a drop in the number of objects, so I'll probably increase the TTL until I find a sweet spot. I don't think there's any risk since thumbnails don't change often, so even a max of 48h may be reasonable. I'll do more testing today and see how things go.

Thanks!


-----Original Message-----
From: Florian Tham [mailto:fgtham at gmail.com]
Sent: Tuesday, December 13, 2016 12:13 PM
To: Justin Lloyd <justinl at arena.net>
Cc: varnish-misc at varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

The log shows that the fetched object is introduced into the cache with both TTL and grace time set to 120s each:

    --  VCL_call       BACKEND_RESPONSE
    --  TTL            VCL 120 120 0 1481637557
    --  VCL_return     deliver
    --  Storage        malloc s0

It would be interesting to see if a subsequent request to the same URL within less than 4 minutes would yield another miss or not.
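
The four-minute window is the TTL plus the grace time from the record above. A small Python sketch of that arithmetic, assuming the Varnish 4.x field order for a TTL record (source, TTL, grace, keep, reference time); the function and variable names are illustrative only:

```python
from datetime import datetime, timezone


def parse_ttl_record(record: str) -> dict:
    """Split a TTL log record such as 'VCL 120 120 0 1481637557'."""
    source, ttl, grace, keep, t_origin = record.split()[:5]
    return {
        "source": source,           # "VCL" or "RFC"
        "ttl": int(ttl),            # seconds the object counts as fresh
        "grace": int(grace),        # extra seconds it may be served stale
        "keep": int(keep),          # extra seconds kept for revalidation
        "t_origin": int(t_origin),  # reference time as a Unix timestamp
    }


rec = parse_ttl_record("VCL 120 120 0 1481637557")

# Seconds after the reference time during which a lookup can still find the object.
window = rec["ttl"] + rec["grace"]
inserted = datetime.fromtimestamp(rec["t_origin"], tz=timezone.utc)

print(f"fresh for {rec['ttl']}s, usable for {window}s after {inserted.isoformat()}")
```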

Regards,

Florian


Am 13. Dezember 2016 15:27:16 schrieb Justin Lloyd <justinl at arena.net>:

> Here’s a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps.
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372
>
>
> From: Guillaume Quintard [mailto:guillaume at varnish-software.com]
> Sent: Tuesday, December 13, 2016 12:17 AM
> To: Justin Lloyd <justinl at arena.net>
> Cc: Dridi Boukelmoune <dridi at varni.sh>; varnish-misc at varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> Can you pastebin the req+bereq transactions in varnishlog, related to 
> such a miss?
>
> --
> Guillaume Quintard
>
> On Tue, Dec 13, 2016 at 3:37 AM, Justin Lloyd <justinl at arena.net> wrote:
> To follow up on my last email from Friday, at this point the problem 
> boils down to one thing that I've not been able to determine: Why are 
> far fewer things being cached now than before the upgrade?
>
> 1. Cookies don't seem to be the problem. Most appear to be Google 
> Analytics (as opposed to session), which are being unset by vcl_recv.
>
> 2. varnishlog/varnishtop shows many thumbnail URLs being missed and 
> virtually none are requested with a no-cache cache-control header. Is 
> it possible to use these tools to determine if they (or any URLs for
> that matter) are being cached following a miss-deliver sequence? There are 
> about 1.5m thumbnail files totaling around 30 GB, which prior to the 
> upgrades wasn't an issue, and I don't think it is now since there are 
> only a few expires and purges per minute and no nukes at all. Varnish 
> is only using about 2 GB out of the 8 GB allocated to it, where it 
> used to use all 8 GB and have lots of nukes and far fewer expires, so it's not a memory constraint.
>
> Could there be some other resource limitation I'm hitting without 
> knowing it (nothing in any logs I've seen)? Everything else I could 
> think of so far seems fine, e.g. open files, threads, tcp connections.
>
>
> -----Original Message-----
> From: varnish-misc-bounces+justinl=arena.net at varnish-cache.org
> [mailto:varnish-misc-bounces+justinl=arena.net at varnish-cache.org]
> On Behalf Of Justin Lloyd
> Sent: Friday, December 9, 2016 11:19 AM
> To: Dridi Boukelmoune <dridi at varni.sh>
> Cc: varnish-misc at varnish-cache.org
> Subject: RE: Hit ratio dropped significantly after recent upgrades
>
> I really am looking at what's happening as well. I have been looking 
> at both varnishlog and varnishtop and I see a lot of thumbnail image 
> requests being sent to the backend when there is still plenty of room 
> for them in the cache, so even though there are a lot of thumbnail 
> images, I shouldn't see so many backend requests for them. As I 
> previously mentioned, I give Varnish 8 GB and it used to stay full 
> (based on RSS usage and looking at nukes vs. expires) but now it 
> hovers around only about 2 GB used. A related statistic is that there 
> used to be 600-700k objects in Varnish (based on our graphs of 
> MAIN.n_object via Collectd's varnish-default-struct.objects-object
> metric) but now there are only roughly 40-70k objects in Varnish at 
> any given time. So it's definitely caching a lot fewer things than it 
> was before the upgrade, and most of the requests that carry cookies 
> are for images and thumbnails. Images 
> shouldn't be cached due to size and overall volume but thumbnails 
> should, which is why I strip cookies from the thumbnails. These 
> varnishtop commands break out /images and /images/thumb client 
> requests, showing IMHO too many regular images being cached and 
> nowhere near enough
> thumbnails:
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~ "/images/thumb"'
>
>    349.47 VCL_call       HASH
>    349.47 VCL_call       RECV
>    349.47 VCL_call       DELIVER
>    207.22 VCL_call       HIT
>    116.40 VCL_call       MISS
>    116.30 VCL_call       PASS
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'
>
>   1859.60 VCL_call       HASH
>   1859.60 VCL_call       RECV
>   1859.60 VCL_call       DELIVER
>   1424.83 VCL_call       MISS
>    422.84 VCL_call       HIT
>    218.82 VCL_call       PASS
>
> I'm still poking around trying to correlate caching of other types of 
> URLs based on whether or not the requests have cookies, if 
> Cache-Control gets returned, etc. but I just wanted to reply with this 
> info. I do appreciate the responses I'm getting! :)
>
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi at varni.sh]
> Sent: Friday, December 9, 2016 10:11 AM
> To: Justin Lloyd <justinl at arena.net>
> Cc: Dag Haavi Finstad <daghf at varnish-software.com>;
> varnish-misc at varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
>> To reiterate on a point in another of my responses in this thread, I 
>> think it may be something about MediaWiki thumbnail images not being 
>> cached properly, even though our current VCL in that regard has not 
>> changed from how it worked prior to the upgrade, when we were seeing 
>> a very high (86%-ish) hit ratio from the same formula.
>
> To reiterate on a point I made on a couple occasions, it's time to 
> give varnishlog a spin. Too much focus on VCL, and not enough on what's happening.
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>

