From guillaume at varnish-software.com  Fri Feb  1 18:56:11 2019
From: guillaume at varnish-software.com (Guillaume Quintard)
Date: Fri, 1 Feb 2019 10:56:11 -0800
Subject: Locating connection reset issue / h2 vs http/1.1
In-Reply-To: <678C443B-1889-40EA-9835-9A2C7EA3091A@shee.org>
References: <E017B753-462D-456E-97F0-054E9840B4F4@shee.org>
 <CAJ6ZYQyWnbSfY91Yxs+cdb4yHtv4UO5jN7Y3ffU+T05TtFqKbg@mail.gmail.com>
 <678C443B-1889-40EA-9835-9A2C7EA3091A@shee.org>
Message-ID: <CAJ6ZYQz-=LOZoTCvzX02MaH6imrGn5KWCZw==7up5tdvntnegA@mail.gmail.com>

Are you able to find some logs in varnishlog (-g session, filtering by
port) to see what varnish is doing?

-- 
Guillaume Quintard


On Thu, Jan 31, 2019 at 9:45 AM <info+varnish at shee.org> wrote:

> Am 31.01.2019 um 17:38 schrieb Guillaume Quintard <
> guillaume at varnish-software.com>:
> >
> > On Thu, Jan 31, 2019 at 6:22 AM <info+varnish at shee.org> wrote:
> >>
> >> I have following stack: hitch-1.5 - varnish-5.2.0 - httpd-2.2/2.4
> >>
> >> On a high traffic node I am observing a lot of "Socket error:
> Connection reset by peer" log entries coming from hitch.
> >>
> >> I am trying to locate the cause of the issue (hitch or varnish site).
> >>
> >> So far I can say; that disabling h2 on hitch the "Connection resets"
> doesn't appear anymore.
> >>
> >> Does this have to do with varnish-5.2.'s h2 implementation?
> >>
> >> Jan 30 19:02:37 srv-s01 hitch[4006]: ww.xx.yy.zz:59395 :0 10:11
> NPN/ALPN protocol: h2
> >> Jan 30 19:02:37 srv-s01 hitch[4006]: ww.xx.yy.zz:59395 :0 10:11 ssl end
> handshake
> >> Jan 30 19:02:37 srv-s01 hitch[4006]: ww.xx.yy.zz:59395 :42884 10:11
> backend connected
> >> Jan 30 19:02:39 srv-s01 hitch[4006]: {backend} Socket error: Connection
> reset by peer
> >> Jan 30 19:02:39 srv-s01 hitch[4006]: ww.xx.yy.zz:59395 :42884 10:11
> proxy shutdown req=SHUTDOWN_CLEAR
> >> Jan 30 19:02:39 srv-s01 hitch[4006]: {backend} Socket error: Broken pipe
> >> Jan 30 19:02:39 srv-s01 hitch[4006]: ww.xx.yy.zz:59395 :42884 10:11
> proxy shutdown req=SHUTDOWN_CLEAR
> >> Jan 30 19:02:39 srv-s01 hitch[4006]: ww.xx.yy.zz:59399 :0 10:11 proxy
> connect
> >
> > Have you activated h2 support in Varnish? (it's not on by default)
> >
>
> Sure, DAEMON_OPTS has -p feature=+http2 passed. The content is delivered
> via h2 (verified in browsers)
> but sometimes lot of assets (client view) produce ERR_CONNECTION_CLOSED
> errors in the browser and on
> server site the mentioned "connection reset by peer" log entries appears
> ...
>
> --
> Leon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20190201/03138b1c/attachment-0001.html>

From revirii at googlemail.com  Tue Feb  5 10:50:54 2019
From: revirii at googlemail.com (Hu Bert)
Date: Tue, 5 Feb 2019 11:50:54 +0100
Subject: varnish 5.0: varnish slow when backends do not respond?
Message-ID: <CAAV-98_0yc_WoT2ip0S5=KP8Y6UPuFuVgEoorCZ9Je=OLYoevg@mail.gmail.com>

Hey there,

i hope i'm right here... i have the following setup to deliver images:

nginx: https -> forward request to varnish 5.0
if image is not in cache -> forward request to backend nginx
backend nginx: delivers file to varnish if found on harddisk
if backend nginx doesn't find: forward request to 2 backend tomcats to
calculate the desired image

The 2 backend tomcats do deliver another webapp (and are a varnish
backend as well); at the moment they're quite busy and stop working
due to heavy load (->restart), the result is that varnish sees/thinks
that the backends are sick. Somehow then even the cached images are
delivered after a quite long waiting period, e.g. a 5 KB image takes
more than 7 seconds.

Is this the normal behaviour that varnish does answer slowly if some
backends are sick?

If any other information is need i can provide the necessary stuff.

Thx in advance
Hubert

From guillaume at varnish-software.com  Tue Feb  5 11:32:49 2019
From: guillaume at varnish-software.com (Guillaume Quintard)
Date: Tue, 5 Feb 2019 12:32:49 +0100
Subject: varnish 5.0: varnish slow when backends do not respond?
In-Reply-To: <CAAV-98_0yc_WoT2ip0S5=KP8Y6UPuFuVgEoorCZ9Je=OLYoevg@mail.gmail.com>
References: <CAAV-98_0yc_WoT2ip0S5=KP8Y6UPuFuVgEoorCZ9Je=OLYoevg@mail.gmail.com>
Message-ID: <CAJ6ZYQzDAevpBgJ3T5sK7DE2bwHwd2i=VpRTqcKKOdxXb21aog@mail.gmail.com>

Hi,

Do you have probes set up? If you do, the backend will be declared sick and
varnish will reply instantly without even trying to contact it.

It sounds like that at the moment, varnish just tries to get whatever it
can, waiting for as long as authorized.

Cheers,

On Tue, Feb 5, 2019, 11:51 Hu Bert <revirii at googlemail.com wrote:

> Hey there,
>
> i hope i'm right here... i have the following setup to deliver images:
>
> nginx: https -> forward request to varnish 5.0
> if image is not in cache -> forward request to backend nginx
> backend nginx: delivers file to varnish if found on harddisk
> if backend nginx doesn't find: forward request to 2 backend tomcats to
> calculate the desired image
>
> The 2 backend tomcats do deliver another webapp (and are a varnish
> backend as well); at the moment they're quite busy and stop working
> due to heavy load (->restart), the result is that varnish sees/thinks
> that the backends are sick. Somehow then even the cached images are
> delivered after a quite long waiting period, e.g. a 5 KB image takes
> more than 7 seconds.
>
> Is this the normal behaviour that varnish does answer slowly if some
> backends are sick?
>
> If any other information is need i can provide the necessary stuff.
>
> Thx in advance
> Hubert
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20190205/23b5049f/attachment.html>

From revirii at googlemail.com  Tue Feb  5 11:55:31 2019
From: revirii at googlemail.com (Hu Bert)
Date: Tue, 5 Feb 2019 12:55:31 +0100
Subject: varnish 5.0: varnish slow when backends do not respond?
In-Reply-To: <CAJ6ZYQzDAevpBgJ3T5sK7DE2bwHwd2i=VpRTqcKKOdxXb21aog@mail.gmail.com>
References: <CAAV-98_0yc_WoT2ip0S5=KP8Y6UPuFuVgEoorCZ9Je=OLYoevg@mail.gmail.com>
 <CAJ6ZYQzDAevpBgJ3T5sK7DE2bwHwd2i=VpRTqcKKOdxXb21aog@mail.gmail.com>
Message-ID: <CAAV-989cGtFf6pm1dVj1vXNtV46_DNDCgO8ac6whhZP8e2qoKQ@mail.gmail.com>

Hi Guillaume,

the backend config looks like this (just questioning a simple file
from tomcat); maybe params are wrong? :

backend tomcat_backend1 {
   .host = "192.168.0.126";
   .port = "8082";
   .connect_timeout = 15s;
   .first_byte_timeout = 60s;
   .between_bytes_timeout = 15s;
   .probe = {
       .url = "/portal/info.txt";
       .timeout = 10s;
       .interval = 1m;
       .window = 3;
       .threshold = 1;
   }
}

The backend is shown as 'sick', but the time until you get an answer
from nginx/varnish differs, from below a second to 7 or more seconds -
but the requested image is already in cache (hits >= 1).

Imho the cache should work and deliver a cached file, independent from
a (non) working backend. Maybe beresp.ttl messed up?

    else if (beresp.status<300) {
[lots of rules]
    } else {
       # Use very short caching time for error messages - giving the
system the chance to recover
       set beresp.ttl = 10s;
       unset beresp.http.Cache-Control;
       return(deliver);
   }

Thx
Hubert

Am Di., 5. Feb. 2019 um 12:33 Uhr schrieb Guillaume Quintard
<guillaume at varnish-software.com>:
>
> Hi,
>
> Do you have probes set up? If you do, the backend will be declared sick and varnish will reply instantly without even trying to contact it.
>
> It sounds like that at the moment, varnish just tries to get whatever it can, waiting for as long as authorized.
>
> Cheers,
>
> On Tue, Feb 5, 2019, 11:51 Hu Bert <revirii at googlemail.com wrote:
>>
>> Hey there,
>>
>> i hope i'm right here... i have the following setup to deliver images:
>>
>> nginx: https -> forward request to varnish 5.0
>> if image is not in cache -> forward request to backend nginx
>> backend nginx: delivers file to varnish if found on harddisk
>> if backend nginx doesn't find: forward request to 2 backend tomcats to
>> calculate the desired image
>>
>> The 2 backend tomcats do deliver another webapp (and are a varnish
>> backend as well); at the moment they're quite busy and stop working
>> due to heavy load (->restart), the result is that varnish sees/thinks
>> that the backends are sick. Somehow then even the cached images are
>> delivered after a quite long waiting period, e.g. a 5 KB image takes
>> more than 7 seconds.
>>
>> Is this the normal behaviour that varnish does answer slowly if some
>> backends are sick?
>>
>> If any other information is need i can provide the necessary stuff.
>>
>> Thx in advance
>> Hubert
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

From guillaume at varnish-software.com  Tue Feb  5 15:10:09 2019
From: guillaume at varnish-software.com (Guillaume Quintard)
Date: Tue, 5 Feb 2019 16:10:09 +0100
Subject: varnish 5.0: varnish slow when backends do not respond?
In-Reply-To: <CAAV-98-LS0i1aCB7Z1ZXE1qR+DgDh_DB2DV34scP-0Fx1TXn_w@mail.gmail.com>
References: <CAAV-98_0yc_WoT2ip0S5=KP8Y6UPuFuVgEoorCZ9Je=OLYoevg@mail.gmail.com>
 <CAJ6ZYQzDAevpBgJ3T5sK7DE2bwHwd2i=VpRTqcKKOdxXb21aog@mail.gmail.com>
 <CAAV-989cGtFf6pm1dVj1vXNtV46_DNDCgO8ac6whhZP8e2qoKQ@mail.gmail.com>
 <CAJ6ZYQxPM42moDLk5ZJzgA22xqSzm3DE7Ytruf8N=9G6XMvyjg@mail.gmail.com>
 <CAAV-988ZTb=RVJCvtB8vcLkKt5VOdWzGN2WbDoL1YW0AyHoEfQ@mail.gmail.com>
 <CAJ6ZYQxXAd07b726TCKoA_cZmosJdPJ0yoYa==aH0vNKwbgo0w@mail.gmail.com>
 <CAAV-98-LS0i1aCB7Z1ZXE1qR+DgDh_DB2DV34scP-0Fx1TXn_w@mail.gmail.com>
Message-ID: <CAJ6ZYQw4iCHcOzDy_ZCVxTsaNuhYHzCqqOhPChyDho1qgVZZ1w@mail.gmail.com>

re-adding the list

> I will reduce probe param 'interval' to, let's say, 10s. That sounds
reasonable?

I would definitely make for more reactive decision. Shameless plug: I would
recommend reading on that topic:
https://info.varnish-software.com/blog/backends-load-balancing (man vcl,
the probes section is of course a must-read)

-- 
Guillaume Quintard


On Tue, Feb 5, 2019 at 3:05 PM Hu Bert <revirii at googlemail.com> wrote:

> Hi,
> i'll try these commands. No output so far, but i'll see.
>
> I will reduce probe param 'interval' to, let's say, 10s. That sounds
> reasonable?
>
>
> Hubert
>
> Am Di., 5. Feb. 2019 um 14:35 Uhr schrieb Guillaume Quintard
> <guillaume at varnish-software.com>:
> >
> > Try something like that: varnishlog -q "Timestamp:Resp[2] > 7" -g request
> > (man vsl-query for more info)
> >
> > I just think your probe definition is pretty bad (1 minute interval is
> going to yield some wonky results) and you varnish sees the backend as
> healthy, tries to fetch, fakes a long time, then the probe finally kicks in.
> > --
> > Guillaume Quintard
> >
> >
> > On Tue, Feb 5, 2019 at 1:47 PM Hu Bert <revirii at googlemail.com> wrote:
> >>
> >> Hi,
> >> sry i can't reproduce, as i had to get the varnish running. Maybe i
> >> have to explain... :-)
> >>
> >> We once had a server with nginx (frontend), varnish and some other
> >> stuff, and as RAM became a tight resource, we got another server
> >> (server2) running, separately for varnish. That server now cached all
> >> the images and all other stuff (like css, js etc.) from the tomcat
> >> backends. So the vcl file contained the images backends and all the
> >> tomcat backends.
> >>
> >> We then moved the cache for "all the other stuff" to server3, and
> >> server2 only cached images from then on. But the vcl file stayed
> >> untouched, still containing all the backends&probes that actually
> >> weren't necessary for images - and now 2 of these backends (due to
> >> load) repeatedly answered 500/502 and have to be rebooted regularly
> >> (nothing can be done here at the moment).
> >>
> >> To get the varnish on server2 (images) running i simply removed all
> >> the unnecessary tomcat backends and restarted varnish, and now it's
> >> running really good. I still have the old vcl file on server3 running,
> >> there i see that the 2 tomcat backends are changing between sick and
> >> healthy. Don't know if it might work there as well - i tried it but
> >> the output of 'varnishlog -g request' is massive. Something special i
> >> should grep for?
> >>
> >> Alternatively i could provide the vcl file, but i'm afraid that your
> >> eyes might explode ;-)
> >>
> >> Hubert
> >>
> >> Am Di., 5. Feb. 2019 um 13:14 Uhr schrieb Guillaume Quintard
> >> <guillaume at varnish-software.com>:
> >> >
> >> > Hi,
> >> >
> >> > Can you try to set the backend health to sick using "varnishadm
> backend.set_health" and try to reproduce?
> >> >
> >> > If you can reproduce, please pastebin the corresponding "varnishlog
> -g request" block
> >> >
> >> > On Tue, Feb 5, 2019, 12:55 Hu Bert <revirii at googlemail.com wrote:
> >> >>
> >> >> Hi Guillaume,
> >> >>
> >> >> the backend config looks like this (just questioning a simple file
> >> >> from tomcat); maybe params are wrong? :
> >> >>
> >> >> backend tomcat_backend1 {
> >> >>    .host = "192.168.0.126";
> >> >>    .port = "8082";
> >> >>    .connect_timeout = 15s;
> >> >>    .first_byte_timeout = 60s;
> >> >>    .between_bytes_timeout = 15s;
> >> >>    .probe = {
> >> >>        .url = "/portal/info.txt";
> >> >>        .timeout = 10s;
> >> >>        .interval = 1m;
> >> >>        .window = 3;
> >> >>        .threshold = 1;
> >> >>    }
> >> >> }
> >> >>
> >> >> The backend is shown as 'sick', but the time until you get an answer
> >> >> from nginx/varnish differs, from below a second to 7 or more seconds
> -
> >> >> but the requested image is already in cache (hits >= 1).
> >> >>
> >> >> Imho the cache should work and deliver a cached file, independent
> from
> >> >> a (non) working backend. Maybe beresp.ttl messed up?
> >> >>
> >> >>     else if (beresp.status<300) {
> >> >> [lots of rules]
> >> >>     } else {
> >> >>        # Use very short caching time for error messages - giving the
> >> >> system the chance to recover
> >> >>        set beresp.ttl = 10s;
> >> >>        unset beresp.http.Cache-Control;
> >> >>        return(deliver);
> >> >>    }
> >> >>
> >> >> Thx
> >> >> Hubert
> >> >>
> >> >> Am Di., 5. Feb. 2019 um 12:33 Uhr schrieb Guillaume Quintard
> >> >> <guillaume at varnish-software.com>:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Do you have probes set up? If you do, the backend will be declared
> sick and varnish will reply instantly without even trying to contact it.
> >> >> >
> >> >> > It sounds like that at the moment, varnish just tries to get
> whatever it can, waiting for as long as authorized.
> >> >> >
> >> >> > Cheers,
> >> >> >
> >> >> > On Tue, Feb 5, 2019, 11:51 Hu Bert <revirii at googlemail.com wrote:
> >> >> >>
> >> >> >> Hey there,
> >> >> >>
> >> >> >> i hope i'm right here... i have the following setup to deliver
> images:
> >> >> >>
> >> >> >> nginx: https -> forward request to varnish 5.0
> >> >> >> if image is not in cache -> forward request to backend nginx
> >> >> >> backend nginx: delivers file to varnish if found on harddisk
> >> >> >> if backend nginx doesn't find: forward request to 2 backend
> tomcats to
> >> >> >> calculate the desired image
> >> >> >>
> >> >> >> The 2 backend tomcats do deliver another webapp (and are a varnish
> >> >> >> backend as well); at the moment they're quite busy and stop
> working
> >> >> >> due to heavy load (->restart), the result is that varnish
> sees/thinks
> >> >> >> that the backends are sick. Somehow then even the cached images
> are
> >> >> >> delivered after a quite long waiting period, e.g. a 5 KB image
> takes
> >> >> >> more than 7 seconds.
> >> >> >>
> >> >> >> Is this the normal behaviour that varnish does answer slowly if
> some
> >> >> >> backends are sick?
> >> >> >>
> >> >> >> If any other information is need i can provide the necessary
> stuff.
> >> >> >>
> >> >> >> Thx in advance
> >> >> >> Hubert
> >> >> >> _______________________________________________
> >> >> >> varnish-misc mailing list
> >> >> >> varnish-misc at varnish-cache.org
> >> >> >> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20190205/5154e31a/attachment-0001.html>

From revirii at googlemail.com  Mon Feb 18 09:58:24 2019
From: revirii at googlemail.com (Hu Bert)
Date: Mon, 18 Feb 2019 10:58:24 +0100
Subject: strange temporary varnish outage
Message-ID: <CAAV-98-JmvRFYqs2RyuNBo00hU9h-uh7Yu_URYXSNM_3DCst9Q@mail.gmail.com>

Hello,

we're using varnish v5 (debian stretch) for image caching; yesterday
there was a strange outage where i'm somehow unable to find the reason
as there are almost no log entries, besides one:

Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected
by pids controller in /system.slice/varnish.service

But the problems started a couple of minutes before that, so this
message simply could be a result of previous problems. Some munin
graphs:

Backend traffic: strange spike in backend connection retry/success,
decrease in recycle/reuse:
https://abload.de/img/varnish_backend_traffqwj74.png

Expunge: a similar spike in "Number of expired objects"
https://abload.de/img/varnish_expunge-day5kk0l.png

Threads: threads went up at that time; was lower before (restart was
done on Feb 14th), and suddenly went up.
day: https://abload.de/img/varnish_threads-dayzoken.png
week: https://abload.de/img/varnish_threads-week7qjoo.png
Backend graph: https://abload.de/img/nginx_status-day54jkd.png

/etc/systemd/system/varnish.service : https://pastebin.com/aAhMHn4p
Here's the (shortened) vcl file: https://pastebin.com/nVu5vVaa

Anyone has an idea how to dig into this? Something horribly wrong in
the vcl file?


Thx,
Hubert

From revirii at googlemail.com  Tue Feb 19 08:01:46 2019
From: revirii at googlemail.com (Hu Bert)
Date: Tue, 19 Feb 2019 09:01:46 +0100
Subject: strange temporary varnish outage
In-Reply-To: <CAAV-98-JmvRFYqs2RyuNBo00hU9h-uh7Yu_URYXSNM_3DCst9Q@mail.gmail.com>
References: <CAAV-98-JmvRFYqs2RyuNBo00hU9h-uh7Yu_URYXSNM_3DCst9Q@mail.gmail.com>
Message-ID: <CAAV-988XkyUg00aTXXwKXa05igCgV_m_bWr5JsDNdZPeup6o5A@mail.gmail.com>

Good morning,
i think we solved the problem: we ran into a systemd limit (4915 tasks):

https://github.com/varnishcache/varnish-cache/issues/2822
https://github.com/varnishcache/pkg-varnish-cache/blob/6c90eb775857573564dc1fe38424267143bb6b34/systemd/varnish.service#L19

It seems we hit that limit; i updated the (loooong outdated) v5 to v6
LTS and set TasksMax=infinity. systemctl status varnish.service now
shows: Tasks: 7136 - so, yeah, solved :-) Thx for reading ;-)

Hubert

Am Mo., 18. Feb. 2019 um 10:58 Uhr schrieb Hu Bert <revirii at googlemail.com>:
>
> Hello,
>
> we're using varnish v5 (debian stretch) for image caching; yesterday
> there was a strange outage where i'm somehow unable to find the reason
> as there are almost no log entries, besides one:
>
> Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected
> by pids controller in /system.slice/varnish.service
>
> But the problems started a couple of minutes before that, so this
> message simply could be a result of previous problems. Some munin
> graphs:
>
> Backend traffic: strange spike in backend connection retry/success,
> decrease in recycle/reuse:
> https://abload.de/img/varnish_backend_traffqwj74.png
>
> Expunge: a similar spike in "Number of expired objects"
> https://abload.de/img/varnish_expunge-day5kk0l.png
>
> Threads: threads went up at that time; was lower before (restart was
> done on Feb 14th), and suddenly went up.
> day: https://abload.de/img/varnish_threads-dayzoken.png
> week: https://abload.de/img/varnish_threads-week7qjoo.png
> Backend graph: https://abload.de/img/nginx_status-day54jkd.png
>
> /etc/systemd/system/varnish.service : https://pastebin.com/aAhMHn4p
> Here's the (shortened) vcl file: https://pastebin.com/nVu5vVaa
>
> Anyone has an idea how to dig into this? Something horribly wrong in
> the vcl file?
>
>
> Thx,
> Hubert

From dridi at varni.sh  Sat Feb 23 07:36:56 2019
From: dridi at varni.sh (Dridi Boukelmoune)
Date: Sat, 23 Feb 2019 08:36:56 +0100
Subject: strange temporary varnish outage
In-Reply-To: <CAAV-988XkyUg00aTXXwKXa05igCgV_m_bWr5JsDNdZPeup6o5A@mail.gmail.com>
References: <CAAV-98-JmvRFYqs2RyuNBo00hU9h-uh7Yu_URYXSNM_3DCst9Q@mail.gmail.com>
 <CAAV-988XkyUg00aTXXwKXa05igCgV_m_bWr5JsDNdZPeup6o5A@mail.gmail.com>
Message-ID: <CABoVN9DUFJcML15mCKsBRoEGfECT11UPwgNqAD_OQYd8tbCmWA@mail.gmail.com>

On Tue, Feb 19, 2019 at 9:03 AM Hu Bert <revirii at googlemail.com> wrote:
>
> Good morning,
> i think we solved the problem: we ran into a systemd limit (4915 tasks):
>
> https://github.com/varnishcache/varnish-cache/issues/2822
> https://github.com/varnishcache/pkg-varnish-cache/blob/6c90eb775857573564dc1fe38424267143bb6b34/systemd/varnish.service#L19
>
> It seems we hit that limit; i updated the (loooong outdated) v5 to v6
> LTS and set TasksMax=infinity. systemctl status varnish.service now
> shows: Tasks: 7136 - so, yeah, solved :-) Thx for reading ;-)

Happy to see that moving to 6.0 solved the problem!

> Hubert
>
> Am Mo., 18. Feb. 2019 um 10:58 Uhr schrieb Hu Bert <revirii at googlemail.com>:
> >
> > Hello,
> >
> > we're using varnish v5 (debian stretch) for image caching; yesterday
> > there was a strange outage where i'm somehow unable to find the reason
> > as there are almost no log entries, besides one:
> >
> > Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected
> > by pids controller in /system.slice/varnish.service
> >
> > But the problems started a couple of minutes before that, so this
> > message simply could be a result of previous problems. Some munin
> > graphs:
> >
> > Backend traffic: strange spike in backend connection retry/success,
> > decrease in recycle/reuse:
> > https://abload.de/img/varnish_backend_traffqwj74.png
> >
> > Expunge: a similar spike in "Number of expired objects"
> > https://abload.de/img/varnish_expunge-day5kk0l.png
> >
> > Threads: threads went up at that time; was lower before (restart was
> > done on Feb 14th), and suddenly went up.
> > day: https://abload.de/img/varnish_threads-dayzoken.png
> > week: https://abload.de/img/varnish_threads-week7qjoo.png
> > Backend graph: https://abload.de/img/nginx_status-day54jkd.png
> >
> > /etc/systemd/system/varnish.service : https://pastebin.com/aAhMHn4p
> > Here's the (shortened) vcl file: https://pastebin.com/nVu5vVaa
> >
> > Anyone has an idea how to dig into this? Something horribly wrong in
> > the vcl file?
> >
> >
> > Thx,
> > Hubert
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc