[Varnish] #1054: Child not responding to CLI, killing it
Varnish
varnish-bugs at varnish-cache.org
Mon May 14 10:50:56 CEST 2012
#1054: Child not responding to CLI, killing it
-----------------------+----------------------------------------------------
Reporter: scorillo | Type: defect
Status: reopened | Priority: normal
Milestone: | Component: varnishd
Version: 3.0.2 | Severity: normal
Resolution: | Keywords:
-----------------------+----------------------------------------------------
Changes (by etherael):
* status: closed => reopened
* resolution: worksforme =>
Comment:
I am seeing the exact same issue on a varnish cache here, dies every 3-4
days. Details follow;
{{{
Messagelog content;
incident 1;
May 9 15:13:57 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:07 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:17 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:27 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:37 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:47 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:52 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:52 elk2 varnishd[25859]: Child (2551) not responding to CLI,
killing it.
May 9 15:14:52 elk2 varnishd[25859]: Child (2551) died signal=3 (core
dumped)
May 9 15:14:52 elk2 varnishd[25859]: child (15127) Started
May 9 15:14:52 elk2 varnishd[25859]: Child (15127) said Child starts
May 9 15:14:52 elk2 varnishd[25859]: Child (15127) said SMF.s0 mmap'ed
311385128960 bytes of 311385128960
incident 2;
May 14 00:53:33 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:53:43 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:53:53 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:54:03 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:54:13 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:54:20 elk2 abrt[30696]: file /usr/sbin/varnishd seems to be
deleted
May 14 00:54:21 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:54:21 elk2 varnishd[25859]: Child (15127) not responding to CLI,
killing it.
May 14 00:54:21 elk2 varnishd[25859]: Child (15127) died signal=3 (core
dumped)
May 14 00:54:21 elk2 varnishd[25859]: child (30697) Started
May 14 00:54:21 elk2 varnishd[25859]: Child (30697) said Child starts
May 14 00:54:21 elk2 varnishd[25859]: Child (30697) said SMF.s0 mmap'ed
311385128960 bytes of 311385128960
version info;
root at parrot:~$ rpm -qa |grep varnish
varnish-libs-3.0.2-1.el5.x86_64
varnish-3.0.2-1.el5.x86_64
varnish-release-3.0-1.noarch
root at parrot:~$ uname -a
Linux parrot 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012
x86_64 x86_64 x86_64 GNU/Linux
root at parrot:~$ cat /etc/redhat-release
CentOS release 6.2 (Final)
startup cmd;
varnish 30697 25859 10 00:54 ? 00:17:52 /usr/sbin/varnishd -P
/var/run/varnish.pid -a :80 -T localhost:6082 -f /etc/varnish/default.vcl
-u varnish
-g varnish -S /etc/varnish/secret -s
file,/var/lib/varnish/varnish_storage.bin,290G
varnishstat;
root at parrot:~$ varnishstat -1
client_conn 253164 25.32 Client connections accepted
client_drop 0 0.00 Connection dropped, no
sess/wrk
client_req 338951 33.91 Client requests received
cache_hit 156382 15.64 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 153794 15.38 Cache misses
backend_conn 69798 6.98 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 2 0.00 Backend conn. failures
backend_reuse 112847 11.29 Backend conn. reuses
backend_toolate 1116 0.11 Backend conn. was closed
backend_recycle 113971 11.40 Backend conn. recycles
backend_retry 38 0.00 Backend conn. retry
fetch_head 6 0.00 Fetch head
fetch_length 178831 17.89 Fetch with Length
fetch_chunked 3509 0.35 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 204 0.02 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 1 0.00 Fetch no body (304)
n_sess_mem 565 . N struct sess_mem
n_sess 145 . N struct sess
n_object 153780 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 153835 . N struct objectcore
n_objecthead 153835 . N struct objecthead
n_waitinglist 1139 . N struct waitinglist
n_vbc 58 . N struct vbc
n_wrk 90 . N worker threads
n_wrk_create 360 0.04 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_lqueue 0 0.00 work request queue length
n_wrk_queued 4119 0.41 N queued work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 3 . N backends
n_expired 0 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_moved 142671 . N LRU moved objects
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 329530 32.96 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 253107 25.32 Total Sessions
s_req 338951 33.91 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 28775 2.88 Total pass
s_fetch 182551 18.26 Total fetch
s_hdrbytes 138408349 13844.99 Total header bytes
s_bodybytes 6200495767 620235.65 Total body bytes
sess_closed 106492 10.65 Session Closed
sess_pipeline 289 0.03 Session Pipeline
sess_readahead 72 0.01 Session Read Ahead
sess_linger 239202 23.93 Session Linger
sess_herd 246213 24.63 Session herd
shm_records 23151895 2315.88 SHM records
shm_writes 1718951 171.95 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 6496 0.65 SHM MTX contention
shm_cycles 9 0.00 SHM cycles through buffer
sms_nreq 18 0.00 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 7524 . SMS bytes allocated
sms_bfree 7524 . SMS bytes freed
backend_req 182657 18.27 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_ban 1 . N total active bans
n_ban_add 1 0.00 N new bans added
n_ban_retire 0 0.00 N old bans deleted
n_ban_obj_test 0 0.00 N objects tested
n_ban_re_test 0 0.00 N regexps tested against
n_ban_dups 0 0.00 N duplicate bans removed
hcb_nolock 310164 31.03 HCB Lookups without lock
hcb_lock 153842 15.39 HCB Lookups with lock
hcb_insert 153842 15.39 HCB Inserts
esi_errors 0 0.00 ESI parse errors (unlock)
esi_warnings 0 0.00 ESI parse warnings (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 9997 1.00 Client uptime
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups
hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
vmods 0 . Loaded VMODs
n_gzip 0 0.00 Gzip operations
n_gunzip 235860 23.59 Gunzip operations
LCK.sms.creat 2 0.00 Created locks
LCK.sms.destroy 0 0.00 Destroyed locks
LCK.sms.locks 141948 14.20 Lock Operations
LCK.sms.colls 0 0.00 Collisions
LCK.smp.creat 0 0.00 Created locks
LCK.smp.destroy 0 0.00 Destroyed locks
LCK.smp.locks 0 0.00 Lock Operations
LCK.smp.colls 0 0.00 Collisions
LCK.sma.creat 2 0.00 Created locks
LCK.sma.destroy 0 0.00 Destroyed locks
LCK.sma.locks 4334608 433.59 Lock Operations
LCK.sma.colls 0 0.00 Collisions
LCK.smf.creat 2 0.00 Created locks
LCK.smf.destroy 0 0.00 Destroyed locks
LCK.smf.locks 20270014 2027.61 Lock Operations
LCK.smf.colls 0 0.00 Collisions
LCK.hsl.creat 0 0.00 Created locks
LCK.hsl.destroy 0 0.00 Destroyed locks
LCK.hsl.locks 0 0.00 Lock Operations
LCK.hsl.colls 0 0.00 Collisions
LCK.hcb.creat 2 0.00 Created locks
LCK.hcb.destroy 0 0.00 Destroyed locks
LCK.hcb.locks 6813196 681.52 Lock Operations
LCK.hcb.colls 0 0.00 Collisions
LCK.hcl.creat 0 0.00 Created locks
LCK.hcl.destroy 0 0.00 Destroyed locks
LCK.hcl.locks 0 0.00 Lock Operations
LCK.hcl.colls 0 0.00 Collisions
LCK.vcl.creat 2 0.00 Created locks
LCK.vcl.destroy 0 0.00 Destroyed locks
LCK.vcl.locks 27017 2.70 Lock Operations
LCK.vcl.colls 0 0.00 Collisions
LCK.stat.creat 2 0.00 Created locks
LCK.stat.destroy 0 0.00 Destroyed locks
LCK.stat.locks 3353 0.34 Lock Operations
LCK.stat.colls 0 0.00 Collisions
LCK.sessmem.creat 2 0.00 Created locks
LCK.sessmem.destroy 0 0.00 Destroyed locks
LCK.sessmem.locks 10434926 1043.81 Lock Operations
LCK.sessmem.colls 0 0.00 Collisions
LCK.wstat.creat 2 0.00 Created locks
LCK.wstat.destroy 0 0.00 Destroyed locks
LCK.wstat.locks 1062489 106.28 Lock Operations
LCK.wstat.colls 0 0.00 Collisions
LCK.herder.creat 2 0.00 Created locks
LCK.herder.destroy 0 0.00 Destroyed locks
LCK.herder.locks 28077 2.81 Lock Operations
LCK.herder.colls 0 0.00 Collisions
LCK.wq.creat 4 0.00 Created locks
LCK.wq.destroy 0 0.00 Destroyed locks
LCK.wq.locks 32085426 3209.51 Lock Operations
LCK.wq.colls 0 0.00 Collisions
LCK.objhdr.creat 6777786 677.98 Created locks
LCK.objhdr.destroy 38400 3.84 Destroyed locks
LCK.objhdr.locks 58265283 5828.28 Lock Operations
LCK.objhdr.colls 0 0.00 Collisions
LCK.exp.creat 2 0.00 Created locks
LCK.exp.destroy 0 0.00 Destroyed locks
LCK.exp.locks 7078276 708.04 Lock Operations
LCK.exp.colls 0 0.00 Collisions
LCK.lru.creat 4 0.00 Created locks
LCK.lru.destroy 0 0.00 Destroyed locks
LCK.lru.locks 6740732 674.28 Lock Operations
LCK.lru.colls 0 0.00 Collisions
LCK.cli.creat 2 0.00 Created locks
LCK.cli.destroy 0 0.00 Destroyed locks
LCK.cli.locks 112482 11.25 Lock Operations
LCK.cli.colls 0 0.00 Collisions
LCK.ban.creat 2 0.00 Created locks
LCK.ban.destroy 0 0.00 Destroyed locks
LCK.ban.locks 7078306 708.04 Lock Operations
LCK.ban.colls 0 0.00 Collisions
LCK.vbp.creat 2 0.00 Created locks
LCK.vbp.destroy 0 0.00 Destroyed locks
LCK.vbp.locks 0 0.00 Lock Operations
LCK.vbp.colls 0 0.00 Collisions
LCK.vbe.creat 2 0.00 Created locks
LCK.vbe.destroy 0 0.00 Destroyed locks
LCK.vbe.locks 7031988 703.41 Lock Operations
LCK.vbe.colls 0 0.00 Collisions
LCK.backend.creat 6 0.00 Created locks
LCK.backend.destroy 0 0.00 Destroyed locks
LCK.backend.locks 21975254 2198.18 Lock Operations
LCK.backend.colls 0 0.00 Collisions
SMF.s0.c_req 13531967 1353.60 Allocator requests
SMF.s0.c_fail 0 0.00 Allocator failures
SMF.s0.c_bytes 907854680064 90812711.82 Bytes allocated
SMF.s0.c_freed 795545841664 79578457.70 Bytes freed
SMF.s0.g_alloc 13484647 . Allocations outstanding
SMF.s0.g_bytes 112308838400 . Bytes outstanding
SMF.s0.g_space 510461419520 . Bytes available
SMF.s0.g_smf 14758490 . N struct smf
SMF.s0.g_smf_frag 1273251 . N small free smf
SMF.s0.g_smf_large 592 . N large free smf
SMA.Transient.c_req 57850 5.79 Allocator requests
SMA.Transient.c_fail 0 0.00 Allocator failures
SMA.Transient.c_bytes 3809803264 381094.65 Bytes allocated
SMA.Transient.c_freed 3809803264 381094.65 Bytes freed
SMA.Transient.g_alloc 0 . Allocations outstanding
SMA.Transient.g_bytes 0 . Bytes outstanding
SMA.Transient.g_space 0 . Bytes available
VBE.default(10.52.5.194,,8080).vcls 2 . VCL
references
VBE.default(10.52.5.194,,8080).happy 0 . Happy health
probes
VBE.thumbs(127.0.0.1,,8080).vcls 2 . VCL
references
VBE.thumbs(127.0.0.1,,8080).happy 0 . Happy health
probes
VBE.nm_elk(50.23.148.202,,80).vcls 2 . VCL
references
VBE.nm_elk(50.23.148.202,,80).happy 0 . Happy health
probes
}}}
Other strangeness; the directory that contains the actual varnish storage
bin behaves erratically, the file storage.bin is a sparse file of the size
requested on the startup (290gb) but sparse file listing shows that it
never climbs above the 205gb mark in terms of actual space used.
When varnish died on the 9th and auto restarted, this figure (actual size
used in sparse file) was registered as 153gb despite the cache being
completely empty.
client_req average hovers around 30 but peaks at 70, nothing at all huge
compared to what I've seen previously. The only thing that is out of the
ordinary I suppose is the size of the storage file (very long tail of
infrequently accessed objects on the backends with very high load on those
backend servers that we want to ease off on, hence the immense cache size)
The average io load is about 200kbytes reads per second and negligible
writes, load average rarely exceeds 3 (16 core Intel Xeon E5620 @ 2.40Ghz,
12gb memory.
That's all I can think of in terms of detail, I note the version of the
cache is up to date with upstream despite being a binary distributed from
the vendor, could it help to compile from source and see if the issue
persists?
--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1054#comment:5>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator
More information about the varnish-bugs
mailing list