strange temporary varnish outage

Hu Bert revirii at googlemail.com
Tue Feb 19 08:01:46 UTC 2019


Good morning,
I think we have solved the problem: we ran into a systemd limit (4915 tasks):

https://github.com/varnishcache/varnish-cache/issues/2822
https://github.com/varnishcache/pkg-varnish-cache/blob/6c90eb775857573564dc1fe38424267143bb6b34/systemd/varnish.service#L19

It seems we hit that limit. I updated the (loooong outdated) v5 to v6
LTS and set TasksMax=infinity; "systemctl status varnish.service" now
shows "Tasks: 7136", well above the old limit. So, yeah, solved :-)
Thx for reading ;-)
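To see how close a unit is to its task limit before forks start being rejected, one could compare the TasksCurrent and TasksMax properties reported by "systemctl show". Below is a minimal Python sketch under stated assumptions: the helper names (parse_show, task_headroom, query_unit) are hypothetical, task accounting is assumed to be enabled for the unit, and how an unlimited TasksMax is printed ("infinity" or the raw uint64 maximum) is assumed to vary by systemd version.

```python
from typing import Dict, Optional
import subprocess

# Assumption: depending on the systemd version, an unlimited TasksMax is
# printed either as "infinity" or as the raw uint64 maximum.
UNLIMITED = {"infinity", str(2**64 - 1)}


def parse_show(output: str) -> Dict[str, str]:
    """Turn `systemctl show` KEY=VALUE lines into a dict."""
    props = {}
    for line in output.splitlines():
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key] = value
    return props


def task_headroom(output: str) -> Optional[int]:
    """Return TasksMax - TasksCurrent, or None if the unit has no task limit.

    Assumes task accounting is on; otherwise TasksCurrent is not a number
    and int() raises ValueError.
    """
    props = parse_show(output)
    limit = props["TasksMax"]
    if limit in UNLIMITED:
        return None
    return int(limit) - int(props["TasksCurrent"])


def query_unit(unit: str = "varnish.service") -> str:
    """Run `systemctl show` for the two task-related properties of a unit."""
    result = subprocess.run(
        ["systemctl", "show", unit, "-p", "TasksCurrent", "-p", "TasksMax"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

On a host running the unit, task_headroom(query_unit()) gives the remaining number of tasks; a value near zero would mean the pids controller is about to reject forks, as in the kernel message quoted below.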

Hubert

On Mon, Feb 18, 2019 at 10:58 AM Hu Bert <revirii at googlemail.com> wrote:
>
> Hello,
>
> we're using Varnish v5 (Debian stretch) for image caching. Yesterday
> there was a strange outage, and I can't find the reason because there
> are almost no log entries, apart from this one:
>
> Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected
> by pids controller in /system.slice/varnish.service
>
> But the problems started a couple of minutes before that, so this
> message could simply be a consequence of the earlier problems. Some
> Munin graphs:
>
> Backend traffic: a strange spike in backend connection retry/success
> and a decrease in recycle/reuse:
> https://abload.de/img/varnish_backend_traffqwj74.png
>
> Expunge: a similar spike in "Number of expired objects"
> https://abload.de/img/varnish_expunge-day5kk0l.png
>
> Threads: the thread count had been lower since the restart on Feb 14th
> and suddenly went up at that time.
> day: https://abload.de/img/varnish_threads-dayzoken.png
> week: https://abload.de/img/varnish_threads-week7qjoo.png
> Backend graph: https://abload.de/img/nginx_status-day54jkd.png
>
> /etc/systemd/system/varnish.service: https://pastebin.com/aAhMHn4p
> Here's the (shortened) VCL file: https://pastebin.com/nVu5vVaa
>
> Does anyone have an idea how to dig into this? Is something horribly
> wrong in the VCL file?
>
>
> Thx,
> Hubert

