Child process recurrently being restarted

Stefano Baldo stefanobaldo at gmail.com
Mon Jun 26 19:06:05 CEST 2017


Hi Guillaume,

Can the following be considered "ban lurker friendly"?

sub vcl_backend_response {
  set beresp.http.x-url = bereq.http.host + bereq.url;
  set beresp.http.x-user-agent = bereq.http.user-agent;
}

sub vcl_recv {
  if (req.method == "PURGE") {
    ban("obj.http.x-url == " + req.http.host + req.url + " &&
obj.http.x-user-agent !~ Googlebot");
    return(synth(750));
  }
}

sub vcl_deliver {
  unset resp.http.x-url;
  unset resp.http.x-user-agent;
}
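
The synth(750) is meant to be caught in vcl_synth, for example with
something like this (the 200/"Purged" response is just an example of
what could be returned there):

sub vcl_synth {
  if (resp.status == 750) {
    # 200/"Purged" here is only an example response
    set resp.status = 200;
    set resp.reason = "OK";
    synthetic("Purged");
    return (deliver);
  }
}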

Best,
Stefano


On Mon, Jun 26, 2017 at 12:43 PM, Guillaume Quintard <
guillaume at varnish-software.com> wrote:

> Not lurker friendly at all indeed. You'll need to avoid req.* expressions.
> The easiest way is to stash the host, user-agent and url in beresp.http.*
> and ban against those (and unset them in vcl_deliver).
>
> I don't think you need to expand the VSL at all.
>
> --
> Guillaume Quintard
>
> On Jun 26, 2017 16:51, "Stefano Baldo" <stefanobaldo at gmail.com> wrote:
>
> Hi Guillaume.
>
> Thanks for answering.
>
> I'm using an SSD disk. I've changed from ext4 to ext2 to increase
> performance, but the child still keeps restarting.
> I also checked the I/O activity of the disk and there is no sign of it
> being overloaded.
>
> I've changed /var/lib/varnish to a tmpfs and increased its 80m default
> size by passing "-l 200m,20m" to varnishd and using
> "nodev,nosuid,noatime,size=256M 0 0" for the tmpfs mount. There was a
> problem here: after a couple of hours varnish died and I received a "no
> space left on device" message - deleting /var/lib/varnish solved the
> problem and varnish was up again, but it's weird because there was free
> memory on the host available to the tmpfs, so I don't know what could
> have happened. I will stop trying to increase the /var/lib/varnish size
> for now.
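>
> For reference, the tmpfs is mounted via an fstab entry roughly like this
> (mount point, options and size exactly as described above):
>
>   # /etc/fstab - mount point, options and size as above
>   tmpfs  /var/lib/varnish  tmpfs  nodev,nosuid,noatime,size=256M  0 0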
>
> Anyway, I am worried about the bans. You asked me if the bans are lurker
> friendly. Well, I don't think so. My bans are created this way:
>
> ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url + "
> && req.http.User-Agent !~ Googlebot");
>
> Are they lurker friendly? I took a quick look at the documentation and it
> looks like they're not.
>
> Best,
> Stefano
>
>
> On Fri, Jun 23, 2017 at 11:30 AM, Guillaume Quintard <
> guillaume at varnish-software.com> wrote:
>
>> Hi Stefano,
>>
>> Let's cover the usual suspects: I/Os. I think here Varnish gets stuck
>> trying to push/pull data and can't make time to reply to the CLI. I'd
>> recommend monitoring the disk activity (bandwidth and iops) to confirm.
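>>
>> Something like iostat (from the sysstat package) is enough for that, for
>> example:
>>
>>   # all devices, extended stats in MB/s, refreshed every second
>>   iostat -xm 1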
>>
>> After some time, the file storage performs terribly on a hard drive (SSDs
>> take a bit longer to degrade) because of fragmentation. One solution to
>> help the disks cope is to overprovision them if they're SSDs, and you can
>> try different advice values in the file storage definition on the command
>> line (last parameter, after granularity).
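>>
>> For example, something along these lines (path, size and granularity are
>> placeholders, "random" being one possible advice value):
>>
>>   # placeholders for path/size/granularity; "random" is one advice value
>>   -s file,/data/varnish_storage.bin,450G,4096,random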
>>
>> Is your /var/lib/varnish mount on tmpfs? That could help too.
>>
>> 40K bans is a lot, are they ban-lurker friendly?
>>
>> --
>> Guillaume Quintard
>>
>> On Fri, Jun 23, 2017 at 4:01 PM, Stefano Baldo <stefanobaldo at gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>> I have been having a critical problem with Varnish Cache in production
>>> for over a month, and any help will be appreciated.
>>> The problem is that the Varnish child process is recurrently being
>>> restarted after 10~20h of use, with the following messages:
>>>
>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) not
>>> responding to CLI, killed it.
>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Unexpected reply from
>>> ping: 400 CLI communication error
>>> Jun 23 09:15:13 b858e4a8bd72 varnishd[11816]: Child (11824) died signal=9
>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child cleanup complete
>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) Started
>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said Child
>>> starts
>>> Jun 23 09:15:14 b858e4a8bd72 varnishd[11816]: Child (24038) said SMF.s0
>>> mmap'ed 483183820800 bytes of 483183820800
>>>
>>> The following link is the varnishstat output just 1 minute before a
>>> restart:
>>>
>>> https://pastebin.com/g0g5RVTs
>>>
>>> Environment:
>>>
>>> varnish-5.1.2 revision 6ece695
>>> Debian 8.7 - Debian GNU/Linux 8 (3.16.0)
>>> Installed using pre-built package from official repo at packagecloud.io
>>> CPU 2x2.9 GHz
>>> Mem 3.69 GiB
>>> Running inside a Docker container
>>> NFILES=131072
>>> MEMLOCK=82000
>>>
>>> Additional info:
>>>
>>> - I need to cache a large number of objects and the cache should last for
>>> almost a week, so I have set up 450G of storage space; I don't know if
>>> this is a problem;
>>> - I use bans a lot. There were about 40k bans in the system just before
>>> the last crash (the count can be checked with the commands below). I
>>> really don't know if this is too much or whether it has anything to do
>>> with it;
>>> - No registered CPU spikes (almost always around 30%);
>>> - No panic is reported; the only info I can retrieve is from syslog;
>>> - All the time, even moments before the crashes, everything is okay and
>>> requests are being answered very fast.
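>>>
>>> For reference, the ban count mentioned above can be checked with either
>>> of these:
>>>
>>>   varnishstat -1 -f MAIN.bans      # MAIN.bans counter
>>>   varnishadm ban.list | wc -l      # roughly the number of ban entries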
>>>
>>> Best,
>>> Stefano Baldo
>>>
>>>
>>>
>>
>>
>
>