Child process recurrently being restarted

Mon Jun 26 17:43:54 CEST 2017

Indeed, they're not lurker-friendly at all. You'll need to avoid req.*
expressions. The easiest way is to stash the host, User-Agent and URL in
beresp.http.* headers and ban against those with obj.http.* expressions
(unset them in vcl_deliver so clients don't see them).
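Here is a minimal VCL 4.0 sketch of the approach described above, written with Varnish 5.1 in mind. Several details are hypothetical placeholders and do not come from the original setup: the header names x-host, x-url and x-user-agent, the backend address, and the use of a BAN request method to trigger the ban. The key point is that the ban expression references only obj.http.* fields. That lets the ban lurker evaluate it against cached objects in the background, without waiting for a client request to test it.

```vcl
vcl 4.0;

# Hypothetical backend address, for illustration only.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Hypothetical trigger: a request with method BAN adds a ban.
    # In a real setup, restrict who may send it (e.g. with an ACL).
    if (req.method == "BAN") {
        # Only obj.* fields appear in the expression, so the ban lurker
        # can process it. As in the original ban, req.url is used as a regex.
        ban("obj.http.x-host == " + req.http.host +
            " && obj.http.x-url ~ " + req.url +
            " && obj.http.x-user-agent !~ Googlebot");
        return (synth(200, "Ban added"));
    }
}

sub vcl_backend_response {
    # Stash request properties on the cached object so bans can match them.
    set beresp.http.x-host = bereq.http.host;
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-user-agent = bereq.http.User-Agent;
}

sub vcl_deliver {
    # Strip the helper headers before the response reaches the client.
    unset resp.http.x-host;
    unset resp.http.x-url;
    unset resp.http.x-user-agent;
}
```

Bans that were already added with req.* expressions still sit in the ban list. The lurker cannot evaluate them, so they only go away as requests test them or the objects expire.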

I don't think you need to expand the VSL at all.

-- 
Guillaume Quintard

On Jun 26, 2017 16:51, "Stefano Baldo" <stefanobaldo at gmail.com> wrote:

Hi Guillaume.

Thanks for answering.

I'm using an SSD. I've switched from ext4 to ext2 to improve performance,
but it's still restarting.
Also, I checked the disk's I/O performance and there is no sign of
overload.

I've moved /var/lib/varnish to a tmpfs and increased its 80m default size
by passing "-l 200m,20m" to varnishd and using
"nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a
problem here: after a couple of hours Varnish died and I received a "no
space left on device" message. Deleting /var/lib/varnish solved the problem
and Varnish was up again, but it's weird, because there was free memory on
the host for the tmpfs directory, so I don't know what could have happened.
I'll try again without increasing the /var/lib/varnish size.

Anyway, I am worried about the bans. You asked me if the bans are lurker
friendly. Well, I don't think so. My bans are created this way:

ban("req.http.host == " + req.http.host +
    " && req.url ~ " + req.url +
    " && req.http.User-Agent !~ Googlebot");

Are they lurker-friendly? I took a quick look at the documentation and it
looks like they're not.

Best,
Stefano

On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard <
guillaume at varnish-software.com> wrote:

> Hi Stefano,
>
> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck
> trying to push/pull data and can't make time to reply to the CLI. I'd
> recommend monitoring the disk activity (bandwidth and iops) to confirm.
>
> Over time, file storage performs terribly on a hard drive (SSDs take a
> bit longer to degrade) because of fragmentation. One way to help the
> disks cope is to overprovision them if they're SSDs, and you can try
> different advice values in the file storage definition on the command
> line (the last parameter, after granularity).
>
> Is your /var/lib/varnish mount on tmpfs? That could help too.
>
> 40K bans is a lot; are they ban-lurker friendly?
>
> --
> Guillaume Quintard
>
> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <stefanobaldo at gmail.com>
> wrote:
>
>> Hello.
>>
>> I have been having a critical problem with Varnish Cache in production
>> for over a month, and any help would be appreciated.
>> The problem is that the Varnish child process is repeatedly restarted
>> after 10-20 hours of use, with the following messages:
>>
>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not responding to CLI, killed it.
>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from ping: 400 CLI communication error
>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9
>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete
>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started
>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child starts
>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0 mmap'ed 483183820800 bytes of 483183820800
>>
>> The following link has the varnishstat output from just 1 minute before
>> a restart:
>>
>> https://pastebin.com/g0g5RVTs
>>
>> Environment:
>>
>> varnish-5.1.2 revision 6ece695
>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
>> Installed using pre-built package from official repo at packagecloud.io
>> CPU 2x2.9 GHz
>> Mem 3.69 GiB
>> Running inside a Docker container
>> NFILES=131072
>> MEMLOCK=82000
>>
>> Additional info:
>>
>> - I need to cache a large number of objects and the cache should last
>> for almost a week, so I have set up 450G of storage; I don't know if
>> this is a problem;
>> - I use bans a lot. There were about 40k bans in the system just before
>> the last crash. I really don't know if this is too much or whether it
>> has anything to do with the problem;
>> - No CPU spikes recorded (usage is almost always around 30%);
>> - No panic is reported; the only information I can retrieve is from
>> syslog;
>> - The whole time, even moments before the crashes, everything is okay
>> and requests are answered very quickly.
>>
>> Best,
>> Stefano Baldo
>>
>>
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>
>
>