Varnish nightmare after upgrading: epilogue?

Fri Dec 22 20:33:34 UTC 2017

On 23/11/2017 21:57, Raphael Mazelier wrote:
> 
> A short follow-up on the situation and on what seems to mitigate the 
> problem and make this platform work.
> After a lot of testing and A/B testing, the solution for us was to run 
> more, smaller instances.
> We basically doubled the number of servers (VMs), but on the other hand 
> halved (or more) the RAM and the memory allocated to Varnish.
> We also reverted to malloc storage with little RAM (4G), with 12G on the 
> VM(s). We also set up a scheduled task to flush the cache (by restarting 
> Varnish).
> This is completely counterintuitive: nuking some entries seems to work 
> better than keeping a big cache with no nuking.
> In my understanding it means that our hot content remains in the cache, 
> and nuking objects is OK. This may also mean that the TTLs on our objects 
> are completely wrong.
> 
> Anyway, it seems to be working. Thanks a lot to the people who helped us 
> (and I'm sure we can find a way to give something back).
> 

Another follow-up for posterity :)

I think we have finally succeeded in restoring nominal service for our 
application. The main problem on the Varnish side was the use of a 
two-stage caching pattern for non-cacheable requests. We completely 
misunderstood the hit-for-pass concept, which resulted in many requests 
being kept on the waiting list in the two-stage setup, especially at peak. 
Since these requests cannot be cached, it seems that piping them at level 1 
is more than enough. To be fair, we also fixed a few small things in our 
application code :)
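
For illustration, a minimal VCL sketch of piping non-cacheable requests 
at level 1 could look like the following. The backend address, the URL 
pattern, and the rule for deciding what counts as non-cacheable are 
assumptions made up for the example, not the actual configuration.

```vcl
vcl 4.0;

# Hypothetical second-level Varnish (or origin) behind level 1.
backend level2 {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_recv {
    # Assumption: anything other than GET/HEAD is never cacheable,
    # so hand the connection straight to the backend.
    if (req.method != "GET" && req.method != "HEAD") {
        return (pipe);
    }

    # Assumption: a made-up URL prefix for dynamic, per-user content.
    if (req.url ~ "^/account/") {
        return (pipe);
    }

    # Everything else falls through to the built-in vcl_recv logic.
}
```

Piped requests bypass cache lookup entirely, so they never end up on a 
waiting list behind another request for the same object.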

Happy Holidays.

--
Raphael Mazelier