Backend Polling + connection timeouts = weird issue
Javier Frias
jfrias at gmail.com
Thu Apr 22 08:38:52 CEST 2010
So I'm having a weird sporadic issue, and I'm wondering if varnish can
mark a backend down on timed out connections that are not the backend
probe. I ask since I'm not getting failed backend messages in varnish
log, yet i'm getting burst of 503's.
My backend probe looks like this.
backend backend_XXX {
.host = "10.100.118.238";
.port = "80";
.probe = {
.url = "/robots.txt?__site=XXXXX";
.timeout = 5000ms;
.interval = 5s;
.window = 10;
.threshold = 4;
}
}
After one of these timed out connections...
201 ReqStart c 10.100.114.196 60540 1416455775
201 RxRequest c GET
201 RxURL c /XXXXsomeXXXXurl
201 RxProtocol c HTTP/1.0
201 RxHeader c User-Agent: Wget/1.10.2 (Red Hat modified)
201 RxHeader c Accept: */*
201 RxHeader c Host: xxxx.xxxxx.com
201 RxHeader c Connection: Keep-Alive
201 VCL_call c recv
201 VCL_return c lookup
201 VCL_call c hash
201 VCL_return c hash
201 VCL_call c miss
201 VCL_return c fetch
----> 201 FetchError c no backend connection <---------
201 VCL_call c error
201 VCL_return c deliver
201 Length c 839
201 VCL_call c deliver
201 VCL_return c deliver
201 TxProtocol c HTTP/1.1
201 TxStatus c 503
201 TxResponse c Service Unavailable
201 TxHeader c Server: Varnish
201 TxHeader c Retry-After: 0
201 TxHeader c Content-Length: 839
201 TxHeader c Date: Thu, 22 Apr 2010 02:03:46 GMT
201 TxHeader c X-Varnish: 1416455775
201 TxHeader c Age: 0
201 TxHeader c Via: 1.1 varnish
201 TxHeader c Connection: close
201 ReqEnd c 1416455775 1271901825.787379980
1271901826.187721968 0.000025034 0.400313139 0.000028849
201 SessionClose c error
201 StatSess c 10.100.114.196 60540 0 1 1 0 0 0 195 839
a typical timed out connection, which i'm verifying in my backend as
having taken close to 80 seconds looks like this..
825637170 WorkThread - 924 1271901801.804220915 0.000025988
1.712481976 0.000046015
64 SessionOpen c 10.100.114.196 60537 0.0.0.0:80
64 ReqStart c 10.100.114.196 60537 1416454378
64 RxRequest c GET
64 RxURL c /XXXsomeXXXsimilarXXXurl
64 RxProtocol c HTTP/1.0
64 RxHeader c User-Agent: Wget/1.10.2 (Red Hat modified)
64 RxHeader c Accept: */*
64 RxHeader c Host: sec.floridatoday.com
64 RxHeader c Connection: Keep-Alive
64 VCL_call c recv
64 VCL_return c lookup
64 VCL_call c hash
64 VCL_return c hash
64 VCL_call c miss
64 VCL_return c fetch
211 BackendOpen b backend_XXX 10.100.118.149 12967 10.100.118.238 80
64 Backend c 211 backend_XXX backend_XXX
211 TxRequest b GET
211 TxURL b /XXXsomeXXXsimilarXXXurl
211 TxProtocol b HTTP/1.1
211 TxHeader b User-Agent: Wget/1.10.2 (Red Hat modified)
211 TxHeader b Accept: */*
211 TxHeader b Host: xxxx.xxxxxx.com
211 TxHeader b X-Forwarded-For: 10.100.114.196
211 TxHeader b X-Varnish: 1416454378
117 Debug c "herding"
202 Debug c "herding"
.... never comes back...
( again, going back to the backend, shows this request as having taken
80 seconds, so i'm assuming varnish gave up after X seconds)
Has anyone seen something like this? It does recover... but it seems
like a time out connection is causing the whole backend to be marked
as failed.. which sucks big time for me since my backends are behind a
load balancer.
Should I try connection restarts? Would they even help in this case?
-Javier
PS: Unrelated, but i tried registering on trac, so submit my
varnishncsa extended logging patch, and it seems the registration is
not working on varnish-cache.org
More information about the varnish-dev
mailing list