[Varnish] #1331: Varnish coredump every day

Varnish varnish-bugs at varnish-cache.org
Mon Aug 5 12:02:31 CEST 2013


#1331: Varnish coredump every day
-------------------------+--------------------
 Reporter:  jinjian.1@…  |       Owner:
     Type:  defect       |      Status:  new
 Priority:  high         |   Milestone:
Component:  varnishd     |     Version:  3.0.3
 Severity:  critical     |  Resolution:
 Keywords:  coredump     |
-------------------------+--------------------
Description changed by tfheen:

Old description:

> we encountered varnish coredump issue everyday in this week. My version
> is 3.0.3
>
> From var/log/messages:
>
> Aug  2 07:50:26 ip-10-36-1-238 varnishd[28776]: Child (28777) not
> responding to CLI, killing it.
> Aug  2 07:50:36 ip-10-36-1-238 varnishd[28776]: Child (28777) not
> responding to CLI, killing it.
> Aug  2 07:50:47 ip-10-36-1-238 varnishd[28776]: Child (28777) not
> responding to CLI, killing it.
> Aug  2 07:50:53 ip-10-36-1-238 stud[10104]: {client} Connection closed
> (in data)
> Aug  2 07:50:53 ip-10-36-1-238 stud[10104]: ipaddress :10.36.1.238
> accept!
> Aug  2 07:50:57 ip-10-36-1-238 varnishd[28776]: Child (28777) not
> responding to CLI, killing it.
> Aug  2 07:51:02 ip-10-36-1-238 stud[10104]: {backend} Connection reset by
> peer
> Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not
> responding to CLI, killing it.
> Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not
> responding to CLI, killing it.
> Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) died
> signal=3 (core dumped)
> Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: child (20041) Started
> Aug  2 07:51:04 ip-10-36-1-238 varnishd[28776]: Child (20041) said Child
> starts
>

> from coredump:
>
> (gdb) bt
> #0  0x00007fdce4b41054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007fdce4b3c388 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x00007fdce4b3c257 in pthread_mutex_lock () from
> /lib64/libpthread.so.0
> #3  0x0000000000434350 in vsl_get ()
> #4  0x0000000000434508 in VSLR ()
> #5  0x00000000004346d2 in VSL ()
> #6  0x00007fdce66d2d95 in cls_vlu2 (priv=0x7fdce3d42780,
> av=0x7fd96e85b500) at cli_serve.c:292
> #7  0x00007fdce66d347b in cls_vlu (priv=0x7fdce3d42780, p=0x2 <Address
> 0x2 out of bounds>) at cli_serve.c:339
> #8  0x00007fdce66d6e09 in LineUpProcess (l=0x7fdce3d1d730) at vlu.c:154
> #9  0x00007fdce66d3e7d in VCLS_Poll (cs=0x7fdce3d03290, timeout=<value
> optimized out>) at cli_serve.c:528
> #10 0x000000000041aa41 in CLI_Run ()
> #11 0x000000000042ea01 in child_main ()
> #12 0x000000000044155c in start_child ()
> #13 0x0000000000441ee8 in MGT_Run ()
> #14 0x000000000045037f in main ()
>
> Our system is down for almost 1 minute during the recover process.
>
> The issue is very similar with https://www.varnish-cache.org/trac/ticket/516
> and https://www.varnish-cache.org/trac/ticket/1054. But i could not find any
> solution there. Do anybody could put some lights on it?

New description:

 We have hit a varnishd core dump every day this week. We are running
 Varnish 3.0.3.

 From /var/log/messages:

 {{{
 Aug  2 07:50:26 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
 Aug  2 07:50:36 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
 Aug  2 07:50:47 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
 Aug  2 07:50:53 ip-10-36-1-238 stud[10104]: {client} Connection closed (in data)
 Aug  2 07:50:53 ip-10-36-1-238 stud[10104]: ipaddress :10.36.1.238 accept!
 Aug  2 07:50:57 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
 Aug  2 07:51:02 ip-10-36-1-238 stud[10104]: {backend} Connection reset by peer
 Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
 Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
 Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) died signal=3 (core dumped)
 Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: child (20041) Started
 Aug  2 07:51:04 ip-10-36-1-238 varnishd[28776]: Child (20041) said Child starts
 }}}

 From the core dump:

 {{{
 (gdb) bt
 #0  0x00007fdce4b41054 in __lll_lock_wait () from /lib64/libpthread.so.0
 #1  0x00007fdce4b3c388 in _L_lock_854 () from /lib64/libpthread.so.0
 #2  0x00007fdce4b3c257 in pthread_mutex_lock () from /lib64/libpthread.so.0
 #3  0x0000000000434350 in vsl_get ()
 #4  0x0000000000434508 in VSLR ()
 #5  0x00000000004346d2 in VSL ()
 #6  0x00007fdce66d2d95 in cls_vlu2 (priv=0x7fdce3d42780, av=0x7fd96e85b500) at cli_serve.c:292
 #7  0x00007fdce66d347b in cls_vlu (priv=0x7fdce3d42780, p=0x2 <Address 0x2 out of bounds>) at cli_serve.c:339
 #8  0x00007fdce66d6e09 in LineUpProcess (l=0x7fdce3d1d730) at vlu.c:154
 #9  0x00007fdce66d3e7d in VCLS_Poll (cs=0x7fdce3d03290, timeout=<value optimized out>) at cli_serve.c:528
 #10 0x000000000041aa41 in CLI_Run ()
 #11 0x000000000042ea01 in child_main ()
 #12 0x000000000044155c in start_child ()
 #13 0x0000000000441ee8 in MGT_Run ()
 #14 0x000000000045037f in main ()
 }}}

 Our system is down for almost one minute during the recovery process.

 The issue looks very similar to https://www.varnish-cache.org/trac/ticket/516
 and https://www.varnish-cache.org/trac/ticket/1054, but I could not find a
 solution in either ticket. Could anybody shed some light on it?
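
 To put a number on the downtime, the syslog timestamps can be compared
 directly. The following is only a minimal sketch in Python, assuming the
 syslog layout shown in the excerpt above. The function name `outage_window`
 is hypothetical, and the year 2013 is taken from the ticket date because
 syslog lines do not carry one. It measures from the first "not responding to
 CLI" message to the "Child starts" message of the replacement child.

```python
import re
from datetime import datetime

# Sample lines copied from the log excerpt in this ticket (wrapped lines rejoined).
LOG = """\
Aug  2 07:50:26 ip-10-36-1-238 varnishd[28776]: Child (28777) not responding to CLI, killing it.
Aug  2 07:50:53 ip-10-36-1-238 stud[10104]: {client} Connection closed (in data)
Aug  2 07:51:02 ip-10-36-1-238 varnishd[28776]: Child (28777) died signal=3 (core dumped)
Aug  2 07:51:04 ip-10-36-1-238 varnishd[28776]: Child (20041) said Child starts
"""

# timestamp, host, "varnishd[pid]:", message; lines from other daemons do not match.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+varnishd\[\d+\]:\s+(.*)$"
)


def outage_window(log_text, year=2013):
    """Return the time from the first CLI timeout to the next child start.

    Returns None if either event is missing from log_text.
    The year is an assumption, since syslog timestamps omit it.
    """
    first_stall = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        # Collapse the padded day field ("Aug  2") before parsing.
        stamp_text = " ".join(match.group(1).split())
        stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
        message = match.group(2)
        if first_stall is None and "not responding to CLI" in message:
            first_stall = stamp
        elif first_stall is not None and "said Child starts" in message:
            return stamp - first_stall
    return None


print(outage_window(LOG))  # prints 0:00:38
```

 For the excerpt above this gives 38 seconds between the first CLI timeout
 and the new child reporting that it has started. It does not cover any time
 the new child then needs to start serving requests normally again.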

--

-- 
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1331#comment:1>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator



