Varnish child crashing on assert() starting with r4631
Justin Pasher
justinp at newmediagateway.com
Fri Mar 19 23:16:25 CET 2010
Hello,
I recently upgrade from trunk r4602 to trunk r4632 and now I'm starting
to see a lot of panics in the syslog for varnish. I looked through the
changelog and it appears in r4631, an assert() was added to
bin/varnishd/cache_hash.c. Now the code seems to be hitting that assert
and failing, thus crashing the child process. Varnish seems to keep
running without problems, although the cache gets completely cleared out
according to varnishstats. I'm not seeing a definitive pattern in when
the assertion fails, but it's roughly every five minutes. The overall
traffic to the site is not huge (probably only about 3000 page requests
during that time right before it crashes).
Some excerpts from syslog are below:
Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) died signal=6
Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) Panic message: Assert
error in HSH_DeleteObjHead(), cache_hash.c line 157:#012
Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread =
(hcb_cleaner)#012ident =
Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012
0x806e382: pan_ic+d2#012 0x80663b7: HSH_DeleteObjHead+97#012
0x807c907: hcb_cleaner+c7#012 0xb7ef34c0: _end+afe36c94#012
0xb7e336de: _end+afd76eb2#012
Mar 19 16:51:19 wc2 tvan[13699]: Child cleanup complete
Mar 19 16:51:19 wc2 tvan[13699]: child (12099) Started
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said Closed fds: 4 5 6 9
10 12 13
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said Child starts
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said managed to mmap
1073741824 bytes of 1073741824
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) died signal=6
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) Panic message: Assert
error in HSH_DeleteObjHead(), cache_hash.c line 157:#012
Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread =
(hcb_cleaner)#012ident =
Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012
0x806e382: pan_ic+d2#012 0x80663b7: HSH_DeleteObjHead+97#012
0x807c907: hcb_cleaner+c7#012 0xb7ef34c0: _end+afe36c94#012
0xb7e336de: _end+afd76eb2#012
Mar 19 16:57:30 wc2 tvan[13699]: Child cleanup complete
Mar 19 16:57:30 wc2 tvan[13699]: child (12372) Started
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said Closed fds: 4 5 6 9
10 12 13
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said Child starts
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said managed to mmap
1073741824 bytes of 1073741824
Mar 19 17:03:43 wc2 tvan[13699]: Child (12372) died signal=6
Mar 19 17:03:43 wc2 tvan[13699]: Child (12372) Panic message: Assert
error in HSH_DeleteObjHead(), cache_hash.c line 157:#012
Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread =
(hcb_cleaner)#012ident =
Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012
0x806e382: pan_ic+d2#012 0x80663b7: HSH_DeleteObjHead+97#012
0x807c907: hcb_cleaner+c7#012 0xb7ef34c0: _end+afe36c94#012
0xb7e336de: _end+afd76eb2#012
Mar 19 17:03:43 wc2 tvan[13699]: Child cleanup complete
Mar 19 17:03:43 wc2 tvan[13699]: child (12750) Started
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said Closed fds: 4 5 6 9
10 12 13
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said Child starts
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said managed to mmap
1073741824 bytes of 1073741824
I actually have two separate varnish instances running off of the same
binaries so that I can easily control them independently (and use
different IP addresses). Both of them are acting the same.
Any ideas on what I can do to troubleshoot this further?
Justin Pasher
More information about the varnish-misc
mailing list