Varnish child crashing on assert() starting with r4631

Justin Pasher justinp at newmediagateway.com
Fri Mar 19 23:16:25 CET 2010


Hello,

I recently upgrade from trunk r4602 to trunk r4632 and now I'm starting 
to see a lot of panics in the syslog for varnish. I looked through the 
changelog and it appears in r4631, an assert() was added to 
bin/varnishd/cache_hash.c. Now the code seems to be hitting that assert 
and failing, thus crashing the child process. Varnish seems to keep 
running without problems, although the cache gets completely cleared out 
according to varnishstats. I'm not seeing a definitive pattern in when 
the assertion fails, but it's roughly every five minutes. The overall 
traffic to the site is not huge (probably only about 3000 page requests 
during that time right before it crashes).

Some excerpts from syslog are below:

Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) died signal=6
Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) Panic message: Assert 
error in HSH_DeleteObjHead(), cache_hash.c line 157:#012  
Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = 
(hcb_cleaner)#012ident = 
Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012  
0x806e382: pan_ic+d2#012  0x80663b7: HSH_DeleteObjHead+97#012  
0x807c907: hcb_cleaner+c7#012  0xb7ef34c0: _end+afe36c94#012  
0xb7e336de: _end+afd76eb2#012
Mar 19 16:51:19 wc2 tvan[13699]: Child cleanup complete
Mar 19 16:51:19 wc2 tvan[13699]: child (12099) Started
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said Closed fds: 4 5 6 9 
10 12 13
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said Child starts
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said managed to mmap 
1073741824 bytes of 1073741824
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) died signal=6
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) Panic message: Assert 
error in HSH_DeleteObjHead(), cache_hash.c line 157:#012  
Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = 
(hcb_cleaner)#012ident = 
Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012  
0x806e382: pan_ic+d2#012  0x80663b7: HSH_DeleteObjHead+97#012  
0x807c907: hcb_cleaner+c7#012  0xb7ef34c0: _end+afe36c94#012  
0xb7e336de: _end+afd76eb2#012
Mar 19 16:57:30 wc2 tvan[13699]: Child cleanup complete
Mar 19 16:57:30 wc2 tvan[13699]: child (12372) Started
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said Closed fds: 4 5 6 9 
10 12 13
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said Child starts
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said managed to mmap 
1073741824 bytes of 1073741824
Mar 19 17:03:43 wc2 tvan[13699]: Child (12372) died signal=6
Mar 19 17:03:43 wc2 tvan[13699]: Child (12372) Panic message: Assert 
error in HSH_DeleteObjHead(), cache_hash.c line 157:#012  
Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = 
(hcb_cleaner)#012ident = 
Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012  
0x806e382: pan_ic+d2#012  0x80663b7: HSH_DeleteObjHead+97#012  
0x807c907: hcb_cleaner+c7#012  0xb7ef34c0: _end+afe36c94#012  
0xb7e336de: _end+afd76eb2#012
Mar 19 17:03:43 wc2 tvan[13699]: Child cleanup complete
Mar 19 17:03:43 wc2 tvan[13699]: child (12750) Started
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said Closed fds: 4 5 6 9 
10 12 13
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said Child starts
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said managed to mmap 
1073741824 bytes of 1073741824

I actually have two separate varnish instances running off of the same 
binaries so that I can easily control them independently (and use 
different IP addresses). Both of them are acting the same.

Any ideas on what I can do to troubleshoot this further?

Justin Pasher




More information about the varnish-misc mailing list