Varnish child killed

Geoff Simmons geoff at
Thu Apr 21 13:42:03 CEST 2011

On 4/21/11 10:51 AM, Jean-Francois Laurens wrote:
> We've been running Varnish 2.1.5 for some weeks now and we still do not
> understand some behavior regarding the shared memory activity.

There's not enough information here for anything better than guesses
about what's going on.

> We specified -s file,/var/lib/varnish/varnish_storage.bin,50G in the
> configuration, but it's impossible to get varnish to use more than 25G.
> In addition, I can see varnish doesn't seem to be able to handle more
> than 1 million objects:

It's not uncommon for Varnish to use significantly less memory than was
allocated. That isn't because Varnish can't "handle" more; it just works
out that way. Depending on a combination of factors such as usage
patterns, TTLs, your command line settings and your VCL, Varnish may
simply never need more than that.
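To make the TTL point concrete: in steady state, the number of objects resident in the cache is roughly the rate at which new cacheable objects are inserted multiplied by how long each one lives (Little's law). The sketch below is a back-of-the-envelope model under that assumption, not how Varnish itself sizes anything. The function name and the workload numbers are hypothetical, and the model ignores LRU eviction, grace, per-object overhead and storage fragmentation.

```python
"""Back-of-the-envelope estimate of how much cache storage would actually
be filled in steady state, using Little's law (L = lambda * W).

All workload numbers below are hypothetical placeholders, not measurements.
"""


def expected_cache_footprint(inserts_per_second: float,
                             avg_ttl_seconds: float,
                             avg_object_bytes: float) -> tuple[float, float]:
    """Return (expected live object count, expected bytes) in steady state.

    Each cached object stays resident for roughly its TTL, so the number
    of live objects is about insert rate * TTL. This deliberately ignores
    LRU eviction, grace, per-object overhead and fragmentation.
    """
    objects = inserts_per_second * avg_ttl_seconds
    return objects, objects * avg_object_bytes


if __name__ == "__main__":
    # Hypothetical workload: 100 new cacheable objects per second,
    # one-hour TTL, 20 KiB average object size.
    n, size = expected_cache_footprint(100, 3600, 20 * 1024)
    print(f"{n:.0f} objects, {size / 2**30:.1f} GiB")  # 360000 objects, 6.9 GiB
```

With those made-up numbers the working set stays far below a 50G storage file regardless of how large the file is; only a higher insert rate, longer TTLs or larger objects would fill more of it.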

What do your cache hit ratios say? Do the logs or varnishstat give any
indication that objects are not being cached when you think they should
be? Do you have objects that, semantically, could be cached, but aren't
because, for example, they are unnecessarily setting cookies? You might
be able to get more into the cache by tweaking your VCL, but as I said,
that's just a guess.
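For the first question, the hit ratio can be computed from the `cache_hit` and `cache_miss` counters that varnishstat reports. This is a minimal sketch that assumes the `varnishstat -1` layout of one counter per line, with the name first and the integer value second; the helper names are hypothetical and the sample numbers are invented. `cache_hitpass` is deliberately left out of the ratio here.

```python
def parse_varnishstat(text: str) -> dict[str, int]:
    """Parse `varnishstat -1`-style output into {counter_name: value}.

    Assumes each line starts with the counter name followed by its
    integer value; any further columns (rate, description) are ignored.
    """
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def hit_ratio(counters: dict[str, int]) -> float:
    """Fraction of cache lookups that were hits: hit / (hit + miss)."""
    hits = counters.get("cache_hit", 0)
    misses = counters.get("cache_miss", 0)
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


if __name__ == "__main__":
    # Invented sample in the assumed `varnishstat -1` layout.
    sample = (
        "client_req          1000         1.00 Client requests received\n"
        "cache_hit            750         0.75 Cache hits\n"
        "cache_hitpass         50         0.05 Cache hits for pass\n"
        "cache_miss           200         0.20 Cache misses\n"
    )
    print(f"hit ratio: {hit_ratio(parse_varnishstat(sample)):.1%}")  # hit ratio: 78.9%
```

On a live server you would feed it the text printed by `varnishstat -1` instead of the sample.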

> When the child process got killed, the system load was very high:
> Apr 20 21:46:44 server-01-39 varnishd[21087]: Child (5372) not
> responding to CLI, killing it.
> ....
> Apr 20 21:49:57 server-01-39 nrpe[18101]: Command completed with return
> code 2 and output: CRITICAL - load average: 159.00, 159.32,
> 77.02|load1=159.000;15.000;30.000;0; load5=159.320;10.000;25.000;0;
> load15=77.020;5.000;20.000;0;
> ....
> Apr 20 21:48:43 server-01-39 varnishd[21087]: Child (5372) not
> responding to CLI, killing it.

It looks like the message about high load came after the Varnish
processes died, and the load might have been high at least in part
because Varnish was restarted and was getting nothing but cache misses,
unless it was caused by something else entirely. Which processes were
showing the highest CPU usage?
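Establishing the order of events is easier if the relevant syslog lines are sorted by timestamp first; in the excerpt above, the nrpe line (21:49:57) is quoted before the second kill message (21:48:43). The sketch below assumes the classic syslog prefix `Mon DD HH:MM:SS` in the first 15 characters and, because that format carries no year, only orders lines within a single year. The function name is hypothetical.

```python
from datetime import datetime


def sort_syslog(lines: list[str]) -> list[str]:
    """Sort classic syslog lines ("Apr 20 21:46:44 host prog[pid]: msg")
    by timestamp.

    Classic syslog omits the year, so a fixed leap year is prepended
    before parsing; this only orders lines correctly within one year.
    """
    def timestamp(line: str) -> datetime:
        # The first 15 characters hold "Mon DD HH:MM:SS" (day space-padded).
        return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")

    return sorted(lines, key=timestamp)


if __name__ == "__main__":
    # Lines from the excerpt quoted above, in the order they were quoted.
    excerpt = [
        "Apr 20 21:46:44 server-01-39 varnishd[21087]: Child (5372) not responding to CLI, killing it.",
        "Apr 20 21:49:57 server-01-39 nrpe[18101]: Command completed with return code 2 and output: CRITICAL - load average: 159.00, 159.32, 77.02",
        "Apr 20 21:48:43 server-01-39 varnishd[21087]: Child (5372) not responding to CLI, killing it.",
    ]
    for line in sort_syslog(excerpt):
        print(line)
```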

The real question is why the Varnish child was no longer responding to
pings. Do you have any panic messages from Varnish in your syslog, or
anything else indicating the error? If the load was that high *before*
the processes died, your system might have been under so much stress
that the child processes just couldn't answer pings in time, in which
case your real problem might be something other than Varnish.
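The failure mode described here, a child too starved to answer a ping before the manager gives up, can be illustrated with a toy model. The sketch below uses threads and queues rather than real processes and the CLI pipe, and all names and delays are hypothetical; it only shows that a reply arriving after the timeout is treated exactly like no reply at all. In varnishd itself the corresponding timeout should be the `cli_timeout` parameter, and `param.show cli_timeout` displays its current value.

```python
import queue
import threading
import time


def start_child(response_delay: float) -> tuple[queue.Queue, queue.Queue]:
    """Start a simulated child that answers each ping after a delay.

    The delay stands in for a child starved of CPU or I/O on an
    overloaded host. This is a hypothetical model, not Varnish code.
    """
    requests: queue.Queue = queue.Queue()
    replies: queue.Queue = queue.Queue()

    def serve() -> None:
        while True:
            requests.get()               # wait for the next ping
            time.sleep(response_delay)   # simulated stall under load
            replies.put("pong")

    threading.Thread(target=serve, daemon=True).start()
    return requests, replies


def ping_ok(requests: queue.Queue, replies: queue.Queue, timeout: float) -> bool:
    """Send one ping; return True only if a reply arrives within timeout."""
    requests.put("ping")
    try:
        replies.get(timeout=timeout)
        return True
    except queue.Empty:
        # This is the point where a watchdog would declare the child
        # unresponsive and kill it.
        return False


if __name__ == "__main__":
    # A child that answers quickly vs. one stalled longer than the timeout.
    healthy = start_child(response_delay=0.01)
    stalled = start_child(response_delay=1.0)
    print("healthy child answered:", ping_ok(*healthy, timeout=0.5))
    print("stalled child answered:", ping_ok(*stalled, timeout=0.2))
```

The child in the stalled case is not broken; it is merely late, which is why a heavily loaded machine alone can be enough to get it killed.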

> All this makes me believe we have an issue with some kernel parameters
> that do not allow varnish to handle as many objects as we configured it.

It could be that, or it could be another process causing heavy load,
or your VCL or your command line settings. Too many open questions here.

-- 
UPLEX Systemoptimierung
Schwanenwik 24
22087 Hamburg
Mob: +49-176-63690917

More information about the varnish-misc mailing list