"Child dies" on strace to PID
Frank Gruellich
frank.gruellich at mapsolute.com
Mon Jul 19 18:20:47 CEST 2010
Hi,
I noticed a constant load of 1 on both of our two varnish servers and
was going to look into this. They are running CentOS release 5.4,
x86_64 and varnish-2.1.2-1 (recompiled from the spec file in the
released tar.gz) with the attached vcl and the following command line:
# /usr/sbin/varnishd -P /var/run/varnish.pid -a :6081 -f /etc/varnish/frontend_maptpserver.maptp42.vcl -T 127.0.0.1:6082 -t 120 -w 1,1000,120 -u varnish -g varnish -s file,/var/lib/varnish/varnish_storage.bin,196G
Then I got distracted when I strace'd to the varnish child and it
immediately exited with:
root at paris [~] $ strace -p 15023
Process 15023 attached - interrupt to quit
close(7) = 0
close(8) = 0
write(1, "Child dies", 10) = 10
write(1, "\n", 1) = 1
brk(0) = 0x19bfe000
brk(0x19c1f000) = 0x19c1f000
exit_group(1) = ?
Process 15023 detached
root at paris [~] $
D'oh, all cache gone! It does that every times, so it seems unrelated to
the load I see on the system. In /var/log/messages I found:
Jul 19 14:34:09 paris varnishd[5102]: child (15023) Started
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Child starts
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said managed to mmap 210453397504 bytes of 210453397504
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: apoll.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: gotthard.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: hybris.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: cure.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris last message repeated 2 times
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said Host: hardy.map24.local.
Jul 19 14:34:09 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: goblin.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: leto.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: krantor.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: solons.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Probe("GET /exodus.html HTTP/1.1
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Host: white.map24.local.
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said Connection: close
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said
Jul 19 14:34:10 paris last message repeated 2 times
Jul 19 14:34:10 paris varnishd[5102]: Child (15023) said ", 0.03, 1)
Jul 19 14:34:39 paris varnishd[5102]: Child (15023) said Child dies
Jul 19 14:34:39 paris varnishd[5102]: Child (15023) died
I think 14:34:10 is all from the startup and at 14:34:39 I'm starting
strace. That looks like the health check to the backend servers, but
other then that I have no idea what it means. I run varnishlog to a
file and strace'd the child process again. The file is 3MB, so I have
uploaded it FTP: ftp://ftp.lbsp.navteq.com/outgoing/2XyKTXBY/. I cannot
read much from it, but I think the point where it restarts is somewhere
around:
[...]
122 ReqEnd c 887294864 1279552540.103346109 1279552540.504165888 0.000036001 0.000036955 0.400782824
160 Debug c "herding"
0 CLI - EOF on CLI connection, worker stops
0 WorkThread - 0x45559e60 start
0 CLI - Rd vcl.load "boot" ./vcl.1P9zoqAU.so
0 Backend_health - apoll Still sick ------H 1 3 5 0.000000 0.000000
0 Backend_health - apoll Still sick ------H 2 3 5 0.000000 0.000000
[...]
I don't know what "EOF on CLI" means. Does CLI refer to the management
interface via the -T parameter? That was definitely not in use at this
time. Any ideas? Should I be able to strace the PID? (I think so.) If
not how can I debug the load issue I have?
(Oh, BTW: the load is gone from the system after the child died. No, I
don't like that way of solving this problem.)
Thanks in advance.
Kind regards,
--
Navteq (DE) GmbH
Frank Gruellich
Map24 Systems and Networks
Duesseldorfer Strasse 40a
65760 Eschborn
Germany
Phone: +49 6196 77756-414
Fax: +49 6196 77756-100
HRB 46215, Local Court Frankfurt am Main
Managing Directors: Thomas Golob, Hans Pieter Gieszen, Martin Robert Stockman
USt-ID-No.: DE 197947163
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: frontend_maptpserver.maptp42.vcl
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20100719/90c48680/attachment-0003.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 262 bytes
Desc: OpenPGP digital signature
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20100719/90c48680/attachment-0003.pgp>
More information about the varnish-misc
mailing list