[Varnish] #951: varnish stalls connections on high traffic to non-cacheable urls
Varnish
varnish-bugs at varnish-cache.org
Fri Jul 1 16:32:29 CEST 2011
#951: varnish stalls connections on high traffic to non-cacheable urls
---------------------------------+------------------------------------------
Reporter: tttt | Type: defect
Status: new | Priority: normal
Milestone: Varnish 2.1 release | Component: varnishd
Version: 2.1.5 | Severity: major
Keywords: |
---------------------------------+------------------------------------------
scenario:
- overall request rate to varnish close to 1000 req/s or more
- tens of thousands of active clients browsing user sites
- the most requested UNCACHEABLE URLs get stuck: varnish never returns a
response (for at least a few minutes), and the web servers do NOT get hit
for that URL once it is stuck
- several requests per second go to these URLs, so there is a high
probability they get hit almost simultaneously by several users; I suspect
some form of race condition or deadlock
- slow-responding web servers: each request may take up to several
seconds to process
- hard to catch the moment it gets stuck, as the log volume is too high to
log everything
- n_sess grows with every stuck request, eating system RAM
- OS: CentOS 5.6 x64, varnish-2.1.5-1 rpms
- request path: haproxy->varnish->haproxy->apache->php
- no swapping in normal operation, entire cache in RAM
cmdline:
{{{
/usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -T localhost:6082 -f
/etc/varnish/default.vcl -u varnish -g varnish -S /etc/varnish/secret -w
1300,4000,60 -p thread_pool_add_delay 3 -s malloc,8G -p session_max 400000
-p cli_timeout 20 -p listen_depth 2048 -a
192.168.1.202:6080,127.0.0.1:6080
}}}
example log AFTER it is stuck:
{{{
88148 SessionOpen c 192.168.1.217 40177 192.168.1.202:6080
88148 ReqStart c 192.168.1.217 40177 741240382
88148 RxRequest c GET
88148 RxURL c /
88148 RxProtocol c HTTP/1.1
88148 RxHeader c Host: xxxxxxxx.xxx.xx
88148 RxHeader c Accept: application/vnd.wap.xhtml+xml,
application/xhtml+xml, text/html, application/vnd.wap.wmlc,
image/vnd.wap.wbmp, image/png, image/jpeg, image/gif, image/bmp,
text/vnd.wap.wml, text/vnd.wap.wmlscript, application/vnd.oma.dd+xml,
text/vnd.sun.j2me.app
88148 RxHeader c Accept-Language: vi
88148 RxHeader c Accept-Charset:
utf-8;q=1.0,utf-16;q=1.0,iso-8859-1;q=0.6,*;q=0.1
88148 RxHeader c x-wap-profile:
"http://wap.samsungmobile.com/uaprof/GT-C3510.xml"
88148 RxHeader c User-Agent: SAMSUNG-GT-C3510/1.0 NetFront/3.5
Profile/MIDP-2.0 Configuration/CLDC-1.1
88148 RxHeader c Accept-Encoding: deflate, gzip, x-gzip, identity,
*;q=0
88148 RxHeader c X-Forwarded-For: yyy.yyy.yyy.yyy
88148 RxHeader c Connection: close
88148 VCL_call c recv
88148 VCL_return c lookup
88148 VCL_call c hash
88148 VCL_return c hash
}}}
That's it; no more log entries appear for 88148, at least not in the near future.
- question: is there a way to check the state of the stuck threads?
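Since the log volume is too high to inspect by eye, one possible way to catch stalled requests is to scan captured client-side log output for sessions that logged ReqStart but never a matching ReqEnd. The following is only a minimal sketch, not a Varnish tool: it assumes the same line layout as the excerpt above (fd, tag, "c", data) and that a completed request ends with a ReqEnd record. The function name find_stalled and the fd 12 lines in the sample are made up for illustration.

```python
def find_stalled(lines):
    """Return {fd: info} for client requests that saw ReqStart but no ReqEnd.

    Assumes lines shaped like '88148 RxURL c /' (fd, tag, side, data).
    """
    open_reqs = {}
    for line in lines:
        parts = line.split(None, 3)
        # Skip anything that is not a client-side record, including
        # wrapped header continuation lines as seen in the mail archive.
        if len(parts) < 3 or not parts[0].isdigit() or parts[2] != "c":
            continue
        fd, tag = parts[0], parts[1]
        data = parts[3].strip() if len(parts) > 3 else ""
        if tag == "ReqStart":
            # In the excerpt the last ReqStart field is the transaction id.
            xid = data.split()[-1] if data else ""
            open_reqs[fd] = {"xid": xid, "url": None, "last_tag": tag}
        elif fd in open_reqs:
            if tag == "ReqEnd":
                del open_reqs[fd]  # request completed normally
            else:
                if tag == "RxURL":
                    open_reqs[fd]["url"] = data
                open_reqs[fd]["last_tag"] = tag
    return open_reqs


# Hypothetical sample: fd 12 is invented and completes, fd 88148 follows
# the stuck session from the report and never reaches ReqEnd.
SAMPLE = """\
   12 ReqStart     c 192.168.1.217 40100 741240380
   12 RxURL        c /cached.css
   12 ReqEnd       c 741240380 1309530749.1 1309530749.2 0.0 0.1 0.0
88148 ReqStart     c 192.168.1.217 40177 741240382
88148 RxURL        c /
88148 VCL_call     c recv
88148 VCL_return   c lookup
88148 VCL_call     c hash
88148 VCL_return   c hash
"""

stalled = find_stalled(SAMPLE.splitlines())
for fd, info in stalled.items():
    print(fd, info["xid"], info["url"], "last tag:", info["last_tag"])
```

Run against a saved log window, anything still listed at the end of the window is either in flight or stuck; comparing two consecutive windows would help tell the two apart.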
--
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator