[Varnish] #38: Fault tree for performance diagnosis
Varnish
varnish-bugs at projects.linpro.no
Sun Aug 20 10:16:43 CEST 2006
#38: Fault tree for performance diagnosis
----------------------+-----------------------------------------------------
 Reporter:  phk       |       Owner:  phk
     Type:  defect    |      Status:  new
 Priority:  normal    |   Milestone:
Component:  varnishd  |     Version:
 Severity:  normal    |  Resolution:
 Keywords:            |
----------------------+-----------------------------------------------------
New description:
This is a fault tree we can work through to eliminate theories:
{{{
Lower than expected traffic handling
        Alteon not allocating traffic
        -Packet loss
        Packet delay
        TCP/session setup failures or rejections
        Bad Varnish response time

Alteon not allocating traffic
        Bad health-check response time
        Health-check failures
        TCP/session setup failures or rejections
        TCP/session count confusion
        Bad Varnish response time

When using the "leastconn" metric, the traffic allocation is very
sensitive to the Varnish sess_timeout setting: the lower I set it,
the more traffic I get.  From a small sample of data, it looks like
there is little benefit from setting it above five seconds anyway.

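# A minimal sketch for experimenting with a lower sess_timeout,
# assuming it is a run-time parameter that can be set with -p at
# startup or via the management port; the backend name and the
# management address below are placeholders:
varnishd -a :80 -b backend.example.com:80 -p sess_timeout=5
echo "param.set sess_timeout 5" | nc 127.0.0.1 9999
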
Bad health-check response time
        -Packet loss
        Packet delay
        TCP/session setup failures or rejections
        Bad Varnish response time

Health-check failures
        -Packet loss
        Packet delay
        TCP/session setup failures or rejections

Packet loss
        Alteon interface
        GigE switch
        bge1 interface
        FreeBSD Network stack bugs
        FreeBSD resource starvation

An aggressive ping test does not show any losses.  This is not
conclusive, but at least we can defer further investigation until
later.

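# A sketch of the kind of aggressive ping test meant above; the count
# is illustrative and the target is the same address as in the delay
# tests below.  The summary line should report 0.0% packet loss:
ping -i .001 -c 10000 -q 10.0.0.2
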
Packet delay
        Alteon bugs
        Alteon interface
        GigE switch
        bge1 interface
        FreeBSD Network stack bugs
        FreeBSD rate limiting
        FreeBSD resource starvation

This one is mighty suspect.

The 200 msec delays we see against the Alteon are present not only
with the Alteon's health-checks but also with a plain ping:

c21# ping -i .001 -c 1000 -q 10.0.2.1
round-trip min/avg/max/stddev = 0.171/0.354/1.038/0.109 ms
c21# ping -i .001 -c 1000 -q 10.0.0.2
round-trip min/avg/max/stddev = 0.193/18.655/220.481/46.626 ms

But the Squid boxes see the same 200 msec delay in a ping test:

c1# ping -i .001 -c 1000 -q 10.0.0.2
round-trip min/avg/max/stddev = 0.219/22.747/222.899/50.803 ms

So this is not unique to us.

TCP/session setup failures or rejections
        FreeBSD Network stack bugs
        FreeBSD rate limiting
        FreeBSD firewalling
        FreeBSD routing
        FreeBSD resource starvation
        Varnish acceptor bugs
        Varnish acceptor resource starvation

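# A sketch for narrowing this branch down; the grep patterns are
# heuristic.  FreeBSD logs "Limiting ..." messages when its ICMP/RST
# rate limiting (net.inet.icmp.icmplim) kicks in, and netstat keeps
# per-protocol counters for dropped connection attempts:
netstat -s -p tcp | egrep -i "drop|overflow"
dmesg | grep Limiting
sysctl net.inet.icmp.icmplim
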
Bad Varnish response time
        Varnish acceptor bugs
        Varnish response bugs
        Varnish lock contention
        Varnish resource starvation
        thread library scheduling bugs
        FreeBSD rate limiting
        FreeBSD resource starvation (sendfile?)

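# A sketch for separating Varnish response time from the network path:
# time a request over the loopback interface, bypassing the Alteon and
# the switch entirely.  The address and object name are placeholders:
time fetch -qo /dev/null http://127.0.0.1:80/some-cached-object
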
FreeBSD resource starvation
        Is the default sysctl kern.ipc.somaxconn value of 128 enough?

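# A sketch for checking that question, with illustrative values; note
# that kern.ipc.somaxconn only caps the backlog Varnish requests in
# listen(2), so the listen depth on the Varnish side matters as well:
sysctl kern.ipc.somaxconn                    # current limit (128)
netstat -s -p tcp | grep -i "listen queue"   # overflows seen so far
sysctl kern.ipc.somaxconn=1024               # try a larger backlog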
}}}
--
Ticket URL: <http://varnish.projects.linpro.no/ticket/38>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator