[Varnish] #649: Varnish LINGER crash on Solaris

Varnish varnish-bugs at varnish-cache.org
Tue May 18 00:57:21 CEST 2010


#649: Varnish LINGER crash on Solaris
---------------------+------------------------------------------------------
 Reporter:  victori  |        Type:  defect
   Status:  new      |    Priority:  normal
Milestone:           |   Component:  build 
  Version:  trunk    |    Severity:  normal
 Keywords:           |  
---------------------+------------------------------------------------------

Comment(by jdzst):

 Hello,

 I am testing Varnish (r4576) on Solaris 10 (5.10 Generic_120011-14 sun4v
 sparc SUNW,Sun-Fire-T2000). [[BR]]
 We are planning to use a cache such as Varnish or Squid, and I have
 followed the instructions at
 http://letsgetdugg.com/2009/12/04/varnish-on-solaris/

 I see the same LINGER crash as in #660, which has the same root cause as
 #649:

 {{{
 child (4033) Started
 Child (4033) said Closed fds: 3 5 6 7 13 14 16 17
 Child (4033) said Child starts
 Child (4033) said managed to mmap 4583923712 bytes of 4583923712
 Child (4033) died signal=6
 Child (4033) Panic message: Assert error in TCP_linger(), tcp.c line 271:
   Condition(TCP_Check(i)) not true.
 errno = 22 (Invalid argument)
 ident = -sfile,-hcritbit,ports


 Child cleanup complete
 child (12179) Started
 Child (12179) said Closed fds: 3 5 6 7 13 14 16 17
 Child (12179) said Child starts
 Child (12179) said managed to mmap 4583923712 bytes of 4583923712
 Child (12179) died signal=6
 Child (12179) Panic message: Assert error in TCP_linger(), tcp.c line 271:
   Condition(TCP_Check(i)) not true.
 errno = 22 (Invalid argument)
 ident = -sfile,-hcritbit,ports


 Child cleanup complete
 }}}


 While trying to fix the bug, I found that '''the problem is that Solaris
 setsockopt sometimes returns EINVAL''' even when none of the parameters
 is invalid. The same problem was found in the Java JVM on Solaris:[[BR]]
 * http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378870  [[BR]]
 * http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=7141b1811572e415779f4a711a96?bug_id=6850464
 [[BR]]

 {{{
 2. The Sockets API in Java is not truly portable because it still closely
 mirrors the behavior of the OS's internal socket implementation. The root
 of the problem is that Solaris is unique in that calls to setsockopt can
 result in an EINVAL if the underlying connection has closed. This behavior
 was actually not documented on Solaris 8, they did finally document it in
 Solaris 9.

 [...]

 1. Most platforms do not return an error on calls to setsockopt
 2. Solaris does do this, but it was not documented at the time the JVM and
 tomcat were developed.
 3. The tomcat error was difficult to reproduce, because it only occurs
 when a client quickly closes its connection between the initial call to
 accept() and the first call to setsockopt(). (This information was of
 course not known when the problem was reported in the past, because no one
 has been able to gather the data that shows how it occurs until now)
 4. EINVAL is usually used to indicate a bad argument was passed to the
 call (in fact this is what the Solaris 8 documentation says). This gives
 one the impression of something wrong in the JVM, because it is the JVM's
 responsibility to pass correct data structures to OS system calls.
 }}}
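
 The race described in the quoted report (the client closes the connection
 between accept() and the first setsockopt()) can be tolerated at the call
 site. The following is only a minimal sketch, not Varnish code: the helper
 names peer_gone() and set_linger() are hypothetical, and treating EINVAL
 as "peer already gone" is an assumption based on the Sun bug reports
 above.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if the given errno value can be
 * explained by the peer having already closed the connection.
 * Including EINVAL here is the assumption taken from the Sun reports.
 */
static int
peer_gone(int err)
{
	return (err == ECONNRESET || err == ENOTCONN || err == EINVAL);
}

/*
 * Hypothetical helper: set SO_LINGER on fd.
 * Returns 0 on success or if the peer is already gone, -1 otherwise
 * (errno is left as set by setsockopt in that case).
 */
static int
set_linger(int fd, int onoff, int seconds)
{
	struct linger lin;

	assert(seconds >= 0);
	lin.l_onoff = onoff;
	lin.l_linger = seconds;
	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof lin) == 0)
		return (0);
	return (peer_gone(errno) ? 0 : -1);
}
```

 One trade-off of this approach: a genuinely invalid argument would also
 be reported as EINVAL and silently accepted, so it hides that class of
 programming error.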

 After reading all this information, I changed the definition of
 "TCP_Check" in '''libvarnish.h'''
 {{{
 /* OLD: #define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || errno == ENOTCONN) */
 #define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || \
     errno == ENOTCONN || errno == EINVAL)
 }}}

 I have tested the change (in a test environment, not production), and it
 seems to work correctly.

 One possibility is to change the definition only for Solaris with an
 #ifdef. I am new to Varnish: what is the best way to make this
 modification in the trunk code?
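
 A Solaris-only variant could look roughly like the sketch below. It
 assumes that testing the __sun preprocessor macro (predefined by the Sun
 compilers and by GCC on Solaris) is an acceptable way to detect the
 platform; Varnish's configure script may offer a more suitable define.
 The function check_sockopt_result() is hypothetical and only mirrors the
 assertion that fails in TCP_linger().

```c
#include <assert.h>
#include <errno.h>

/*
 * Sketch only: accept EINVAL from socket calls on Solaris, where it can
 * mean "the connection is already closed"; keep the stricter check
 * everywhere else.
 */
#if defined(__sun)
#define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || \
    errno == ENOTCONN || errno == EINVAL)
#else
#define TCP_Check(a) ((a) == 0 || errno == ECONNRESET || errno == ENOTCONN)
#endif

/* Hypothetical caller: i is the return value of a setsockopt() call. */
static void
check_sockopt_result(int i)
{
	assert(TCP_Check(i));
}
```

 With this form, platforms other than Solaris keep the original behaviour,
 so an unexpected EINVAL there would still trigger the assert.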

 Thank you

-- 
Ticket URL: <http://varnish-cache.org/ticket/649#comment:3>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator



