"Child dies" on strace to PID

Frank Gruellich frank.gruellich at mapsolute.com
Mon Jul 19 18:20:47 CEST 2010


Hi,

I noticed a constant load of 1 on both of our two varnish servers and
was going to look into this.  They are running CentOS release 5.4,
x86_64 and varnish-2.1.2-1 (recompiled from the spec file in the
released tar.gz) with the attached vcl and the following command line:

 # /usr/sbin/varnishd -P /var/run/varnish.pid -a :6081 -f /etc/varnish/frontend_maptpserver.maptp42.vcl -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -s file,/var/lib/varnish/varnish_storage.bin,196G

Then I got distracted when I strace'd to the varnish child and it
immediately exited with:

root at paris [~] $ strace -p 15023
Process 15023 attached - interrupt to quit
close(7)                                = 0
close(8)                                = 0
write(1, "Child dies", 10)              = 10
write(1, "\n", 1)                       = 1
brk(0)                                  = 0x19bfe000
brk(0x19c1f000)                         = 0x19c1f000
exit_group(1)                           = ?
Process 15023 detached
root at paris [~] $

D'oh, all cache gone!  It does that every times, so it seems unrelated to
the load I see on the system.  In /var/log/messages I found:

Jul 19 14:34:09 paris varnishd[5102]: child (15023) Started
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Child starts
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said managed to mmap 210453397504 bytes of 210453397504
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: apoll.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: gotthard.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: hybris.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: cure.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: hardy.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: goblin.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: leto.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: krantor.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: solons.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: white.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:39 paris varnishd[5102]: Child (15023) said Child dies
Jul 19 14:34:39 paris varnishd[5102]: Child (15023) died

I think 14:34:10 is all from the startup and at 14:34:39 I'm starting
strace.  That looks like the health check to the backend servers, but
other then that I have no idea what it means.  I run varnishlog to a
file and strace'd the child process again.  The file is 3MB, so I have
uploaded it FTP: ftp://ftp.lbsp.navteq.com/outgoing/2XyKTXBY/.  I cannot
read much from it, but I think the point where it restarts is somewhere
around:

[...]
  122 ReqEnd       c 887294864 1279552540.103346109 1279552540.504165888 0.000036001 0.000036955 0.400782824
  160 Debug        c "herding"
    0 CLI          - EOF on CLI connection, worker stops
    0 WorkThread   - 0x45559e60 start
    0 CLI          - Rd vcl.load "boot" ./vcl.1P9zoqAU.so
    0 Backend_health - apoll Still sick ------H 1 3 5 0.000000 0.000000
    0 Backend_health - apoll Still sick ------H 2 3 5 0.000000 0.000000
[...]

I don't know what "EOF on CLI" means.  Does CLI refer to the management
interface via the -T parameter?  That was definitely not in use at this
time.  Any ideas?  Should I be able to strace the PID?  (I think so.) If
not how can I debug the load issue I have?

(Oh, BTW: the load is gone from the system after the child died.  No, I
don't like that way of solving this problem.)

Thanks in advance.

Kind regards,
-- 
Navteq (DE) GmbH
Frank Gruellich
Map24 Systems and Networks

Duesseldorfer Strasse 40a
65760 Eschborn
Germany

Phone:      +49 6196 77756-414
Fax:        +49 6196 77756-100

HRB 46215, Local Court Frankfurt am Main
Managing Directors: Thomas Golob, Hans Pieter Gieszen, Martin Robert Stockman
USt-ID-No.: DE 197947163
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: frontend_maptpserver.maptp42.vcl
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20100719/90c48680/attachment-0003.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20100719/90c48680/attachment-0003.pgp>


More information about the varnish-misc mailing list