Varnish nightmare after upgrading: need help

Guillaume Quintard guillaume at varnish-software.com
Tue Nov 14 22:41:53 UTC 2017


Hi,

Let's look at the usual suspects first: can we get the output of "ps aux
| grep varnish" and a pastebin of "varnishstat -1"?
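
For the queuing question, the thread-pool counters in a "varnishstat -1"
dump are probably the first thing to look at. A minimal sketch, assuming the
usual text layout (name, value, rate, description) and the Varnish 4/5
counter names listed below; the function names are made up for illustration,
and the counter list should be checked against your version:

```python
# Counter names as found in Varnish 4/5 (an assumption; verify on your version).
QUEUE_COUNTERS = (
    "MAIN.threads",           # worker threads currently alive
    "MAIN.threads_limited",   # times the thread pool maximum was hit
    "MAIN.thread_queue_len",  # sessions currently waiting for a thread
    "MAIN.sess_queued",       # sessions that had to wait for a thread
    "MAIN.sess_dropped",      # sessions dropped because the queue was full
)


def parse_varnishstat(text):
    """Map counter name -> integer value from `varnishstat -1` text output."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: NAME VALUE RATE DESCRIPTION...
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def queue_report(text):
    """Return only the thread/queue counters that are present in the dump."""
    counters = parse_varnishstat(text)
    return {name: counters[name] for name in QUEUE_COUNTERS if name in counters}
```

Calling queue_report() on two dumps taken a minute apart during a peak and
comparing them should show whether sess_queued or threads_limited is growing,
which would point at thread pool sizing rather than at the VCL.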

Are you using any vmod?

man varnishncsa will help craft a format line with the response time (on
mobile now, I don't have access to it)
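
A minimal sketch of such a format line, assuming %D (time taken to serve the
request, in microseconds) and %{Varnish:handling}x (hit, miss, pass, ...)
behave as described in man varnishncsa:

    varnishncsa -F '%D %{Varnish:handling}x %s %U'

The Python helper below (names are hypothetical) summarizes that output per
handling type, so slow hits can be told apart from slow backend fetches:

```python
import math


def parse_line(line):
    """Parse '<micros> <handling> <status> <url>' into (micros, handling)."""
    fields = line.split(None, 3)
    if len(fields) < 2 or not fields[0].isdigit():
        return None  # skip blank or malformed lines
    return int(fields[0]), fields[1]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(lines):
    """Return {handling: (count, p50_ms, p99_ms)} for the given log lines."""
    by_handling = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        micros, handling = parsed
        by_handling.setdefault(handling, []).append(micros)
    summary = {}
    for handling, values in by_handling.items():
        values.sort()
        summary[handling] = (
            len(values),
            percentile(values, 50) / 1000.0,  # microseconds -> milliseconds
            percentile(values, 99) / 1000.0,
        )
    return summary
```

Feeding it a capture taken during a peak (for example the lines of a file
written with the command above) gives a rough median and 99th percentile per
handling type; a high p99 on hits would suggest queuing inside Varnish rather
than slow backends.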

Cheers,

-- 
Guillaume Quintard

On Nov 14, 2017 23:25, "Raphael Mazelier" <raph at futomaki.net> wrote:

> Hello list,
>
> First of all, despite my mail subject, I really appreciate Varnish.
> We use it a lot at work (hundreds of instances) with success and,
> unfortunately, some pain lately.
>
> TL;DR: upgrading from Varnish 2 to Varnish 4 and 5 on one of our
> infrastructures brought us serious trouble and instability on this
> platform.
> And we are a bit desperate/frustrated.
>
>
> Long story.
>
> A bit of context :
>
> This is a very complex platform serving an IPTV service with some traffic
> (8k req/s at peak, even more when it works well).
> It is composed of a two-stage reverse proxy cache (3 x 2 Varnish for stage
> 1, 2 Varnish for stage 2, so 8 in total) and a lot of different backends
> (PHP applications, Node.js apps, remote backends *sigh*, and even piped
> ones).
> This is a big historical spaghetti app. We plan to rebuild it from scratch
> in 2018.
> The first-stage Varnish instances are split into two pools handling
> different topologies of clients.
>
> A lot of the logic is in Varnish/VCL itself: lots of URL rewrites, lots of
> header manipulation, backend selection, and even ESI processing...
> The VCL of the stage 1 Varnish is almost 3000 lines long.
>
> But for now we have to live/deal with it.
>
> History of the problem :
>
> At the beginning all Varnish instances were on version 2.x. Things worked
> almost well.
> This summer we needed to upgrade Varnish to handle very long
> headers (a product requirement).
> So after a short battle porting our VCL to VCL 4.0 we started using
> Varnish 4.
> Shortly after, things began to go very badly.
>
> The first issue we hit was memory exhaustion on both stages, and the
> oom-killer...
> We tested a lot of things, and in the battle we upgraded to Varnish 5.
> We fixed it by resizing the pool and switching to the file storage backend
> (memory before).
> Memory is now stable (we have a large pool, 32G, and strangely we never
> see objects being nuked, which may be good or bad).
> We have also fixed a lot of things in our VCL.
>
> The problem we are fighting now is only on the stage 1 Varnish, and
> specifically on one pool (the busiest one).
> When everything goes well the average CPU usage is 30%, memory stabilizes
> around 12G, and the cache hit ratio is around 0.85.
> The problem happens randomly (not every day) but during our peaks. The CPU
> usage quickly rises to 350% (4 cores) and the load goes above 3.
> While the problem is occurring Varnish still delivers requests (we didn't
> see dropped or rejected connections) but our application begins to lose
> users, and with them a lot of business. I suspect this is because timeouts
> are very aggressive on the client side and Varnish must be answering slowly.
>
> - First question: how can we see the response time of requests on the
> Varnish server? (varnishncsa something?)
>
> I also suspect some kind of request queuing; stracing Varnish when it
> happens shows a lot of futex waits.
> The frustrating part is that restarting Varnish fixes the problem
> immediately, and the CPU usage remains normal afterwards, even if the
> traffic peak is not finished.
> So there is clearly something piling up in Varnish which causes our problem.
>
> - Second question: how can we see the number of queued connections,
> long-lived connections and so on?
>
> At this stage we welcome any kind of help / hints for debugging (and
> given the business impact we can consider paying for professional
> support).
>
> PS: I always have the option to scale out, spinning up a lot of new
> Varnish instances, but this seems very frustrating...
>
> Best,
>
> --
> Raphael Mazelier
>
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>