[Varnish] #162: Varnish trunk dies with assert error in SES_Delete()

Varnish varnish-bugs at projects.linpro.no
Tue Oct 9 11:17:13 CEST 2007


#162: Varnish trunk dies with assert error in SES_Delete()
---------------------------------------------------------+------------------
 Reporter:  anders                                       |        Owner:  phk  
     Type:  defect                                       |       Status:  new  
 Priority:  high                                         |    Milestone:       
Component:  varnishd                                     |      Version:  trunk
 Severity:  normal                                       |   Resolution:       
 Keywords:  varnishd core dump SES_Delete cache_session  |  
---------------------------------------------------------+------------------
Comment (by phk):

 = A sort of status report =

 As is apparent from the above, this bug is quite elusive, and I can't
 really claim to be any closer to resolving it now than I was earlier.

 The exact triggering condition in SES_Delete() is patently impossible from
 a reading of the source code, and given the sanity of the sp-> elements,
 this is not a random pointer tango.

 About the only explanation that makes sense is that a session gets started
 twice from the acceptor or possibly sent there twice.

 Other less plausible explanations are locking bugs in the pipe(2) code,
 hardware cache coherency bugs or memory barrier deficiencies.

 Based on rather light evidence so far, it seems that using the
 poll_acceptor either eliminates the problem or at least reduces its
 frequency drastically, pointing somewhat in the direction of
 kqueue_acceptor.

 Despite the poll_acceptor soaking up considerably more CPU time, I think
 we should continue to run with it for some days, to see if it totally
 eliminates this problem.

 If, after some days, we find that to be the case, the next step is probably
 to update the machine from 6.2-R to RELENG_6 head, try the kqueue acceptor
 again and see if the trouble still exists.

 If it does, a further update to either FreeBSD-current or RELENG_7 might
 be a good idea.

 In the meantime I will try to see if I can spot anything in
 kqueue_acceptor that doesn't work as expected, and to see if I can
 reproduce the problem in my lab.  I may also add some flags and asserts to
 try to catch sessions which get started more than once.
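
 A minimal sketch of what such a flag-and-check could look like is given
 below. It is illustrative only: struct sess_sketch, its fields, the magic
 value and the function names are all hypothetical and not taken from the
 Varnish source. The idea is to record which side currently owns a session
 and to refuse a handoff that does not start from the expected owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real session structure; every name in
 * this sketch is an assumption made for illustration. */
#define SESS_SKETCH_MAGIC 0x5e55c0deU

enum sess_owner {
	OWNER_ACCEPTOR = 0,	/* parked in the acceptor, waiting for data */
	OWNER_WORKER		/* handed to a worker thread */
};

struct sess_sketch {
	unsigned	magic;	/* guards against wild pointers */
	enum sess_owner	owner;	/* who holds the session right now */
	unsigned	starts;	/* number of successful handoffs to a worker */
};

void
sess_init(struct sess_sketch *sp)
{
	assert(sp != NULL);
	sp->magic = SESS_SKETCH_MAGIC;
	sp->owner = OWNER_ACCEPTOR;
	sp->starts = 0;
}

/* Returns 0 on a legal handoff, -1 if the session was already started
 * (that is, it is owned by a worker and is being started a second time). */
int
sess_handoff_to_worker(struct sess_sketch *sp)
{
	assert(sp != NULL);
	assert(sp->magic == SESS_SKETCH_MAGIC);
	if (sp->owner != OWNER_ACCEPTOR)
		return (-1);
	sp->owner = OWNER_WORKER;
	sp->starts++;
	return (0);
}

/* Returns 0 on a legal return, -1 if the session is being sent back to
 * the acceptor while the acceptor already holds it. */
int
sess_return_to_acceptor(struct sess_sketch *sp)
{
	assert(sp != NULL);
	assert(sp->magic == SESS_SKETCH_MAGIC);
	if (sp->owner != OWNER_WORKER)
		return (-1);
	sp->owner = OWNER_ACCEPTOR;
	return (0);
}
```

 In a debugging build the -1 paths would more likely be asserts, so that
 the process dumps core at the first double start rather than later in
 SES_Delete(). The sketch is not thread-safe as written; the ownership
 check and update would have to happen under whatever lock already
 protects the handoff, or use an atomic compare-and-swap.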

 Poul-Henning

-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/162#comment:14>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator

