[Varnish] #38: Fault tree for performance diagnosis

Varnish varnish-bugs at projects.linpro.no
Sun Aug 20 10:13:03 CEST 2006


#38: Fault tree for performance diagnosis
----------------------+-----------------------------------------------------
 Reporter:  phk       |        Owner:  phk
     Type:  defect    |       Status:  new
 Priority:  normal    |    Milestone:     
Component:  varnishd  |      Version:     
 Severity:  normal    |   Resolution:     
 Keywords:            |  
----------------------+-----------------------------------------------------
Old description:

> This is a fault tree we can work through to eliminate theories:
>
> {{{
> Lower than expected traffic handling
>         Alteon not allocating traffic
>         -Packet loss
>         Packet delay
>         TCP/session setup failures or rejections
>         Bad Varnish response time
>
> Alteon not allocating traffic
>         Bad health-check response time
>         Health-check failures
>         TCP/session setup failures or rejections
>         TCP/session count confusion
>         Bad Varnish response time
>
> Bad health-check response time
>         -Packet loss
>         Packet delay
>         TCP/session setup failures or rejections
>         Bad Varnish response time
>
> Health-check failures
>         -Packet loss
>         Packet delay
>         TCP/session setup failures or rejections
>
> Packet loss
>         Alteon interface
>         GigE switch
>         bge1 interface
>         FreeBSD Network stack bugs
>         FreeBSD resource starvation
>
> Packet delay
>         Alteon bugs
>         Alteon interface
>         GigE switch
>         bge1 interface
>         FreeBSD Network stack bugs
>         FreeBSD rate limiting
>         FreeBSD resource starvation
>
> TCP/session setup failures or rejections
>         FreeBSD Network stack bugs
>         FreeBSD rate limiting
>         FreeBSD firewalling
>         FreeBSD routing
>         FreeBSD resource starvation
>         Varnish acceptor bugs
>         Varnish acceptor resource starvation
>
> Bad Varnish response time
>         varnish acceptor bugs
>         varnish response bugs
>         varnish lock contention
>         varnish resource starvation
>         thread library schedule bugs
>         FreeBSD rate limiting
>         FreeBSD resource starvation (sendfile ?)
>
> FreeBSD resource starvation
>         is sysctl kern.ipc.somaxconn: 128 enough ?
>
> Packet loss
>         An aggressive ping test does not show any losses.
>         This is not conclusive, but at least we can defer further
>         investigation until later.
>
> Packet delay
>         This one is highly suspect.
>
>         The 200 ms delays we see against the Alteon are present not
>         only in the Alteon's health check but also in a plain ping:
>         c21# ping -i .001 -c 1000 -q 10.0.2.1
>         round-trip min/avg/max/stddev = 0.171/0.354/1.038/0.109 ms
>         c21# ping -i .001 -c 1000 -q 10.0.0.2
>         round-trip min/avg/max/stddev = 0.193/18.655/220.481/46.626 ms
>         The Squid hosts also see the 200 ms delay in a ping test:
>         c1# ping -i .001 -c 1000 -q 10.0.0.2
>         round-trip min/avg/max/stddev = 0.219/22.747/222.899/50.803 ms
>         So this is not unique to us.
>

> }}}

New description:

 This is a fault tree we can work through to eliminate theories:

 {{{
 Lower than expected traffic handling
         Alteon not allocating traffic
         -Packet loss
         Packet delay
         TCP/session setup failures or rejections
         Bad Varnish response time

 Alteon not allocating traffic
         Bad health-check response time
         Health-check failures
         TCP/session setup failures or rejections
         TCP/session count confusion
         Bad Varnish response time

 Bad health-check response time
         -Packet loss
         Packet delay
         TCP/session setup failures or rejections
         Bad Varnish response time

 Health-check failures
         -Packet loss
         Packet delay
         TCP/session setup failures or rejections

 Packet loss
         Alteon interface
         GigE switch
         bge1 interface
         FreeBSD Network stack bugs
         FreeBSD resource starvation

         An aggressive ping test does not show any losses.
         This is not conclusive, but at least we can defer further
         investigation until later.


 Packet delay
         Alteon bugs
         Alteon interface
         GigE switch
         bge1 interface
         FreeBSD Network stack bugs
         FreeBSD rate limiting
         FreeBSD resource starvation

         This one is highly suspect.

         The 200 ms delays we see against the Alteon are present not
         only in the Alteon's health check but also in a plain ping:
         c21# ping -i .001 -c 1000 -q 10.0.2.1
         round-trip min/avg/max/stddev = 0.171/0.354/1.038/0.109 ms
         c21# ping -i .001 -c 1000 -q 10.0.0.2
         round-trip min/avg/max/stddev = 0.193/18.655/220.481/46.626 ms
         The Squid hosts also see the 200 ms delay in a ping test:
         c1# ping -i .001 -c 1000 -q 10.0.0.2
         round-trip min/avg/max/stddev = 0.219/22.747/222.899/50.803 ms
         So this is not unique to us.



 TCP/session setup failures or rejections
         FreeBSD Network stack bugs
         FreeBSD rate limiting
         FreeBSD firewalling
         FreeBSD routing
         FreeBSD resource starvation
         Varnish acceptor bugs
         Varnish acceptor resource starvation

 Bad Varnish response time
         varnish acceptor bugs
         varnish response bugs
         varnish lock contention
         varnish resource starvation
         thread library schedule bugs
         FreeBSD rate limiting
         FreeBSD resource starvation (sendfile ?)

 FreeBSD resource starvation
         is sysctl kern.ipc.somaxconn: 128 enough ?

 }}}
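
As a rough illustration of how the tree above could be worked through mechanically, here is a minimal Python sketch. It is not part of the ticket: the names `FAULT_TREE`, `ROOT` and `open_leaf_causes` are hypothetical, node names are lightly normalized (spelling and capitalization), and the "is somaxconn 128 enough ?" question is treated as a note on "FreeBSD resource starvation" rather than as a node of its own.

```python
# Fault tree transcribed from the ticket: symptom -> candidate causes.
# Assumptions: the "-" prefix on "Packet loss" is dropped, "(sendfile ?)" is
# folded into "FreeBSD resource starvation", and "varnish acceptor bugs" is
# unified with "Varnish acceptor bugs".
ROOT = "Lower than expected traffic handling"

FAULT_TREE = {
    ROOT: [
        "Alteon not allocating traffic",
        "Packet loss",
        "Packet delay",
        "TCP/session setup failures or rejections",
        "Bad Varnish response time",
    ],
    "Alteon not allocating traffic": [
        "Bad health-check response time",
        "Health-check failures",
        "TCP/session setup failures or rejections",
        "TCP/session count confusion",
        "Bad Varnish response time",
    ],
    "Bad health-check response time": [
        "Packet loss",
        "Packet delay",
        "TCP/session setup failures or rejections",
        "Bad Varnish response time",
    ],
    "Health-check failures": [
        "Packet loss",
        "Packet delay",
        "TCP/session setup failures or rejections",
    ],
    "Packet loss": [
        "Alteon interface",
        "GigE switch",
        "bge1 interface",
        "FreeBSD Network stack bugs",
        "FreeBSD resource starvation",
    ],
    "Packet delay": [
        "Alteon bugs",
        "Alteon interface",
        "GigE switch",
        "bge1 interface",
        "FreeBSD Network stack bugs",
        "FreeBSD rate limiting",
        "FreeBSD resource starvation",
    ],
    "TCP/session setup failures or rejections": [
        "FreeBSD Network stack bugs",
        "FreeBSD rate limiting",
        "FreeBSD firewalling",
        "FreeBSD routing",
        "FreeBSD resource starvation",
        "Varnish acceptor bugs",
        "Varnish acceptor resource starvation",
    ],
    "Bad Varnish response time": [
        "Varnish acceptor bugs",
        "varnish response bugs",
        "varnish lock contention",
        "varnish resource starvation",
        "thread library schedule bugs",
        "FreeBSD rate limiting",
        "FreeBSD resource starvation",
    ],
}


def open_leaf_causes(tree, root, ruled_out=frozenset()):
    """Return the sorted leaf causes still reachable from root.

    Any node listed in ruled_out is skipped together with its subtree,
    although its causes may still be reached through another branch.
    """
    found = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen or node in ruled_out:
            continue
        seen.add(node)
        children = tree.get(node, [])
        if children:
            stack.extend(children)
        else:
            found.append(node)  # no children recorded: treat as a leaf cause
    return sorted(found)
```

Calling `open_leaf_causes(FAULT_TREE, ROOT, {"Packet loss"})` returns the same leaves as calling it with nothing ruled out, because every cause listed under "Packet loss" also appears under "Packet delay". Deferring packet loss therefore does not shrink the list of components to inspect.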

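The ping comparison in the "Packet delay" notes can be repeated in a script. The sketch below parses the `round-trip min/avg/max/stddev` summary lines quoted in the ticket and flags the source/target pairs whose worst round trip exceeds a threshold. The helper names and the 100 ms threshold are assumptions chosen for illustration. The sample lines are copied from the ticket.

```python
import re

# Matches the summary line printed by ping, as quoted in the ticket.
PING_SUMMARY = re.compile(
    r"round-trip min/avg/max/stddev = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(line):
    """Return min/avg/max/stddev (in ms) from a ping summary line."""
    match = PING_SUMMARY.search(line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    lo, avg, hi, stddev = (float(value) for value in match.groups())
    return {"min": lo, "avg": avg, "max": hi, "stddev": stddev}


# (source host, target address) -> summary line, copied from the ticket.
SAMPLES = {
    ("c21", "10.0.2.1"):
        "round-trip min/avg/max/stddev = 0.171/0.354/1.038/0.109 ms",
    ("c21", "10.0.0.2"):
        "round-trip min/avg/max/stddev = 0.193/18.655/220.481/46.626 ms",
    ("c1", "10.0.0.2"):
        "round-trip min/avg/max/stddev = 0.219/22.747/222.899/50.803 ms",
}


def slow_pairs(samples, max_threshold_ms=100.0):
    """Return the (source, target) pairs whose maximum RTT exceeds the
    threshold. The 100 ms default is an arbitrary illustrative cut-off."""
    return sorted(
        pair
        for pair, line in samples.items()
        if parse_ping_summary(line)["max"] > max_threshold_ms
    )
```

With the ticket's numbers, `slow_pairs(SAMPLES)` lists both pairs that target 10.0.0.2 and neither that targets 10.0.2.1. This matches the ticket's observation that the delay shows up from the Squid host as well as from the Varnish host.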
-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/38>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator

