varnish child failing to be restarted

Jeremy Hinegardner jeremy at hinegardner.org
Wed Jan 6 03:05:47 CET 2010


Hi,

We have an issue with varnish 2.0.6 (also seen in 2.0.4) where starting
the child process fails.  When that happens, the child stays dead and the
parent does not automatically restart it.

In our system, the child process dies a dozen or more times a day and
the parent restarts the child.  On occasion, the child does not restart
correctly.

Here is a snippet of the log:

    Jan  3 08:27:05 fs6 varnishd[8548]: child (6114) Started
    Jan  3 08:27:05 fs6 varnishd[8548]: Child (6114) said Closed fds: 4 5 6 10 11 13 14
    Jan  3 08:27:05 fs6 varnishd[8548]: Child (6114) said Child starts
    Jan  3 08:27:05 fs6 varnishd[8548]: Child (6114) said managed to mmap 4294967296 bytes of 4294967296
    Jan  3 08:27:05 fs6 varnishd[8548]: Child (6114) said Ready
    Jan  3 16:51:32 fs6 varnishd[8548]: Child (6114) not responding to ping, killing it.
    Jan  3 16:51:35 fs6 varnishd[8548]: Child (6114) died signal=3
    Jan  3 16:51:35 fs6 varnishd[8548]: child (5494) Started
    Jan  3 16:51:35 fs6 varnishd[8548]: Child (5494) said Closed fds: 4 5 6 10 11 13 14
    Jan  3 16:51:35 fs6 varnishd[8548]: Child (5494) said Child starts
    Jan  3 16:51:35 fs6 varnishd[8548]: Child (5494) said managed to mmap 4294967296 bytes of 4294967296
    Jan  3 16:51:35 fs6 varnishd[8548]: Child (5494) said Ready
    Jan  3 18:50:44 fs6 varnishd[8548]: Child (5494) not responding to ping, killing it.
    Jan  3 18:50:59 fs6 varnishd[8548]: Child (5494) not responding to ping, killing it.
    Jan  3 18:50:59 fs6 varnishd[8548]: Child (5494) died signal=3
    Jan  3 18:51:07 fs6 varnishd[8548]: child (14404) Started
    Jan  3 18:51:10 fs6 varnishd[8548]: Pushing vcls failed: CLI communication error
    Jan  3 18:51:10 fs6 varnishd[8548]: Child (14404) said Closed fds: 4 5 6 10 11 13 14
    Jan  3 18:51:10 fs6 varnishd[8548]: Child (14404) said Child starts
    Jan  3 18:51:10 fs6 varnishd[8548]: Child (14404) said managed to mmap 4294967296 bytes of 4294967296
    Jan  3 18:51:10 fs6 varnishd[8548]: Child (14404) said Ready
    Jan  3 18:51:10 fs6 varnishd[8548]: Child (14404) ended
    Jan  3 18:55:26 fs6 varnishd[8548]: Manager got SIGINT 

The SIGINT at the end is from me: our nagios page went off, so I logged
in and restarted varnish.

I'm wondering if there are two issues here.  First, why does our child
process die many times a day?  Second, why does restarting the child
sometimes fail?

In our case, the failure to restart the child always coincides with the
"Pushing vcls failed" error appearing in the log.

I'll be happy to provide whatever other information may be required to
help figure this out.
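
In case it helps anyone check their own logs for the same pattern, here
is a rough Python sketch of how the syslog output could be scanned for
children that ended or died without a replacement being started, and
whether a "Pushing vcls failed" line was logged for that child.  It
assumes the syslog line format shown in the snippet above; the function
name find_failed_restarts and the record fields are made up for
illustration and are not part of varnish.

```python
import re

# Assumed syslog layout, e.g.:
#   Jan  3 18:51:07 fs6 varnishd[8548]: child (14404) Started
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+varnishd\[\d+\]:\s+(?P<msg>.*)$"
)
# The manager logs both "child (pid) ..." and "Child (pid) ...".
CHILD_RE = re.compile(r"^[Cc]hild \((?P<pid>\d+)\) (?P<event>.*)$")


def find_failed_restarts(lines):
    """Return one record per child that ended or died and was not
    followed by a new "Started" line before the manager was stopped
    (or before the log excerpt ran out)."""
    failures = []
    current = None   # most recently started child
    pending = None   # child that has gone away and not yet been replaced

    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        ts, msg = m.group("ts"), m.group("msg").strip()

        if msg.startswith("Pushing vcls failed"):
            # Attribute the failed VCL push to the child being started.
            if current is not None:
                current["vcl_push_failed"] = True
            continue

        if msg.startswith("Manager got SIGINT"):
            # Manual intervention: whatever is pending was never restarted.
            if pending is not None:
                failures.append(pending)
                pending = None
            continue

        c = CHILD_RE.match(msg)
        if not c:
            continue
        pid, event = c.group("pid"), c.group("event")

        if event == "Started":
            # A replacement came up, so the previous death was recovered.
            pending = None
            current = {"pid": pid, "started": ts, "ended": None,
                       "vcl_push_failed": False}
        elif event == "ended" or event.startswith("died"):
            if current is not None and current["pid"] == pid:
                current["ended"] = ts
                pending = current

    if pending is not None:
        failures.append(pending)
    return failures
```

Run against the snippet above (for example via
find_failed_restarts(open(path)) with path pointing at a saved copy of
the log), this should flag only child 14404, with vcl_push_failed set.
A child that is still pending at the end of an excerpt may simply not
have been restarted yet, so the last record should be read with that in
mind.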

enjoy,

-jeremy

-- 
========================================================================
 Jeremy Hinegardner                              jeremy at hinegardner.org 



