Child process recurrently being restarted

Guillaume Quintard guillaume at varnish-software.com
Mon Jun 26 20:10:37 CEST 2017


Looking good!

-- 
Guillaume Quintard

On Mon, Jun 26, 2017 at 7:06 PM, Stefano Baldo <stefanobaldo at gmail.com>
wrote:

> Hi Guillaume,
>
> Can the following be considered "ban lurker friendly"?
>
> sub vcl_backend_response {
>   set beresp.http.x-url = bereq.http.host + bereq.url;
>   set beresp.http.x-user-agent = bereq.http.user-agent;
> }
>
> sub vcl_recv {
>   if (req.method == "PURGE") {
>     ban("obj.http.x-url == " + req.http.host + req.url + " &&
> obj.http.x-user-agent !~ Googlebot");
>     return(synth(750));
>   }
> }
>
> sub vcl_deliver {
>   unset resp.http.x-url;
>   unset resp.http.x-user-agent;
> }
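>
> One way to confirm the lurker is keeping up with bans written this way
> is to watch the ban counters over time (a minimal sketch, assuming the
> usual MAIN.* counter names reported by varnishstat):
>
>   varnishstat -1 | grep MAIN.bans
>
> If bans_completed and bans_lurker_obj_killed keep rising while the total
> bans count stays bounded, the lurker is doing its job.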
>
> Best,
> Stefano
>
>
> On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard <
> guillaume at varnish-software.com> wrote:
>
>> Not lurker friendly at all, indeed. You'll need to avoid req.* expressions.
>> The easiest way is to stash the host, user-agent and url in beresp.http.*
>> and ban against those (and unset them in vcl_deliver).
>>
>> I don't think you need to expand the VSL at all.
>>
>> --
>> Guillaume Quintard
>>
>> On Jun 26, 2017 16:51, "Stefano Baldo" <stefanobaldo at gmail.com> wrote:
>>
>> Hi Guillaume.
>>
>> Thanks for answering.
>>
>> I'm using an SSD disk. I've changed from ext4 to ext2 to increase
>> performance, but it still keeps restarting.
>> Also, I checked the I/O performance of the disk and there is no sign of
>> it being overloaded.
>>
>> I've changed /var/lib/varnish to a tmpfs and increased its 80m default
>> size by passing "-l 200m,20m" to varnishd and using
>> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a
>> problem here: after a couple of hours varnish died and I received a "no
>> space left on device" message - deleting /var/lib/varnish solved the
>> problem and varnish was up again, but it's weird because there was free
>> memory on the host available to the tmpfs, so I don't know what could
>> have happened. For now, I will stop increasing the /var/lib/varnish size.
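>>
>> For reference, the tmpfs mount above is equivalent to this fstab entry (a
>> sketch). Note that with "-l 200m,20m" the shared memory log, which lives
>> in this workdir, already accounts for about 220m of the 256M, and the
>> workdir also holds the compiled VCL shared objects, so the headroom is
>> quite small:
>>
>>   tmpfs /var/lib/varnish tmpfs nodev,nosuid,noatime,size=256M 0 0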
>>
>> Anyway, I am worried about the bans. You asked me if the bans are lurker
>> friendly. Well, I don't think so. My bans are created this way:
>>
>> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + "
>> && req.http.User-Agent !~ Googlebot");
>>
>> Are they lurker friendly? I was taking a quick look at the documentation
>> and it looks like they're not.
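>>
>> As a rough check (just a sketch), counting how many of the current bans
>> mention req.* - which the lurker cannot evaluate - gives an idea of the
>> backlog:
>>
>>   varnishadm ban.list | grep -c 'req\.'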
>>
>> Best,
>> Stefano
>>
>>
>> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard <
>> guillaume at varnish-software.com> wrote:
>>
>>> Hi Stefano,
>>>
>>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck
>>> trying to push/pull data and can't make time to reply to the CLI. I'd
>>> recommend monitoring the disk activity (bandwidth and iops) to confirm.
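>>>
>>> For example, something like this (assuming the sysstat package is
>>> installed) will show per-device utilization, bandwidth and iops while
>>> Varnish is under load:
>>>
>>>   iostat -x 1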
>>>
>>> After some time, the file storage performs terribly on a hard drive (SSDs
>>> take a bit longer to degrade) because of fragmentation. One solution to
>>> help the disks cope is to overprovision them if they're SSDs, and you can
>>> try a different advice setting in the file storage definition on the
>>> command line (the last parameter, after granularity).
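>>>
>>> For instance, a sketch only (the storage path and granularity here are
>>> made-up examples, and the accepted advice keywords depend on the Varnish
>>> version, so check varnishd(1) - something like "random" instead of the
>>> default):
>>>
>>>   varnishd ... -s file,/data/varnish_storage.bin,450G,4096,random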
>>>
>>> Is your /var/lib/varnish mounted on tmpfs? That could help too.
>>>
>>> 40K bans is a lot; are they ban-lurker friendly?
>>>
>>> --
>>> Guillaume Quintard
>>>
>>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <stefanobaldo at gmail.com>
>>> wrote:
>>>
>>>> Hello.
>>>>
>>>> I have been having a critical problem with Varnish Cache in production
>>>> for over a month, and any help will be appreciated.
>>>> The problem is that the Varnish child process is recurrently being
>>>> restarted after 10~20h of use, with the following messages:
>>>>
>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not
>>>> responding to CLI, killed it.
>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from
>>>> ping: 400 CLI communication error
>>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died
>>>> signal=9
>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete
>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started
>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child
>>>> starts
>>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0
>>>> mmap'ed 483183820800 bytes of 483183820800
>>>>
>>>> The following link is the varnishstat output just 1 minute before a
>>>> restart:
>>>>
>>>> https://pastebin.com/g0g5RVTs
>>>>
>>>> Environment:
>>>>
>>>> varnish-5.1.2 revision 6ece695
>>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
>>>> Installed using pre-built package from official repo at packagecloud.io
>>>> CPU 2x2.9 GHz
>>>> Mem 3.69 GiB
>>>> Running inside a Docker container
>>>> NFILES=131072
>>>> MEMLOCK=82000
>>>>
>>>> Additional info:
>>>>
>>>> - I need to cache a large number of objects and the cache should last
>>>> for almost a week, so I have set up 450G of storage space; I don't know
>>>> if this is a problem;
>>>> - I use bans a lot. There were about 40k bans in the system just before
>>>> the last crash. I really don't know if this is too much or may have
>>>> anything to do with it;
>>>> - No registered CPU spikes (almost always around 30%);
>>>> - No panic is reported; the only info I can retrieve is from syslog;
>>>> - All of the time, even moments before the crashes, everything is okay
>>>> and requests are being responded to very quickly.
>>>>
>>>> Best,
>>>> Stefano Baldo
>>>>
>>>>
>>>> _______________________________________________
>>>> varnish-misc mailing list
>>>> varnish-misc at varnish-cache.org
>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>>>
>>>
>>>
>>
>>
>