[Varnish] #951: varnish stalls connections on high traffic to non-cacheable urls

Varnish varnish-bugs at varnish-cache.org
Fri Jul 1 16:32:29 CEST 2011


#951: varnish stalls connections on high traffic to non-cacheable urls
---------------------------------+------------------------------------------
 Reporter:  tttt                 |        Type:  defect  
   Status:  new                  |    Priority:  normal  
Milestone:  Varnish 2.1 release  |   Component:  varnishd
  Version:  2.1.5                |    Severity:  major   
 Keywords:                       |  
---------------------------------+------------------------------------------
 scenario:

 - overall request rate to varnish is around 1000 req/s or more

 - tens of thousands of active clients browsing user sites

 - the most frequently requested UNCACHEABLE URLs get stuck: varnish never
 returns a response (or at least not for a few minutes), and the web
 servers are NOT hit for that URL once it is stuck

 - these URLs receive several requests per second, so there is a high
 probability that several users hit them almost simultaneously; I suspect
 some form of race condition or deadlock

 - slow-responding web servers: each request may take up to several
 seconds to process

 - it is hard to catch the moment a URL gets stuck, because the log volume
 is too high to log everything

 - n_sess grows with every stuck request, eating system RAM

 - OS: CentOS 5.6 x64, varnish-2.1.5-1 RPMs

 - request path: haproxy->varnish->haproxy->apache->php

 - no swapping in normal operation; the entire cache is in RAM
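
 The n_sess growth mentioned above can be tracked from periodic counter
 samples. This is a minimal sketch, assuming the one-shot `varnishstat -1`
 output puts the counter name in the first column and its value in the
 second; `parse_counter` and `grows_steadily` are hypothetical helper names
 for illustration, not part of varnish.

```python
def parse_counter(varnishstat_output, name="n_sess"):
    """Return the integer value of one counter, or None if it is absent.

    Assumes each line looks like "<name> <value> <rate> <description>".
    """
    for line in varnishstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return int(fields[1])
    return None


def grows_steadily(samples, min_step=1):
    """True if every sample exceeds the previous one by at least min_step."""
    if len(samples) < 2:
        return False
    return all(b - a >= min_step for a, b in zip(samples, samples[1:]))
```

 Feeding it one sample per minute would separate a steady leak from normal
 fluctuation: n_sess that rises on every sample while the request rate is
 flat is consistent with sessions that are never released.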

 cmdline:
 {{{
 /usr/sbin/varnishd -P /var/run/varnish.pid -a :80 -T localhost:6082 \
     -f /etc/varnish/default.vcl -u varnish -g varnish \
     -S /etc/varnish/secret -w 1300,4000,60 \
     -p thread_pool_add_delay 3 -s malloc,8G -p session_max 400000 \
     -p cli_timeout 20 -p listen_depth 2048 \
     -a 192.168.1.202:6080,127.0.0.1:6080
 }}}

 example log AFTER it is stuck:


 {{{
 88148 SessionOpen  c 192.168.1.217 40177 192.168.1.202:6080
 88148 ReqStart     c 192.168.1.217 40177 741240382
 88148 RxRequest    c GET
 88148 RxURL        c /
 88148 RxProtocol   c HTTP/1.1
 88148 RxHeader     c Host: xxxxxxxx.xxx.xx
 88148 RxHeader     c Accept: application/vnd.wap.xhtml+xml, application/xhtml+xml, text/html, application/vnd.wap.wmlc, image/vnd.wap.wbmp, image/png, image/jpeg, image/gif, image/bmp, text/vnd.wap.wml, text/vnd.wap.wmlscript, application/vnd.oma.dd+xml, text/vnd.sun.j2me.app
 88148 RxHeader     c Accept-Language: vi
 88148 RxHeader     c Accept-Charset: utf-8;q=1.0,utf-16;q=1.0,iso-8859-1;q=0.6,*;q=0.1
 88148 RxHeader     c x-wap-profile: "http://wap.samsungmobile.com/uaprof/GT-C3510.xml"
 88148 RxHeader     c User-Agent: SAMSUNG-GT-C3510/1.0 NetFront/3.5 Profile/MIDP-2.0 Configuration/CLDC-1.1
 88148 RxHeader     c Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
 88148 RxHeader     c X-Forwarded-For: yyy.yyy.yyy.yyy
 88148 RxHeader     c Connection: close
 88148 VCL_call     c recv
 88148 VCL_return   c lookup
 88148 VCL_call     c hash
 88148 VCL_return   c hash
 }}}


 that's it: no more log entries for 88148, at least not in the near future
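
 The pattern in the log above (a client session whose last record is
 `VCL_return hash`) can be searched for in a saved varnishlog capture. This
 is a minimal sketch, assuming records have the layout shown in the sample
 (`<fd> <tag> <c|b|-> <data>`); `find_stalled` is a hypothetical helper name,
 not part of varnish. A match is only a candidate: a request that was still
 in flight when the capture ended looks the same.

```python
import re

# One record as in the sample log: "<fd> <Tag> <c|b|-> <data>".
RECORD = re.compile(r"^\s*(\d+)\s+(\S+)\s+([cb-])(?:\s+(.*))?$")


def find_stalled(lines):
    """Map fd -> URL for client sessions whose last record is 'VCL_return hash'."""
    last = {}  # fd -> (tag, data) of the most recent client record
    urls = {}  # fd -> most recent RxURL value
    for line in lines:
        m = RECORD.match(line)
        if m is None:
            continue  # wrapped header continuation or unrelated text
        fd, tag, side = int(m.group(1)), m.group(2), m.group(3)
        data = (m.group(4) or "").strip()
        if side != "c":
            continue  # only client-side records matter here
        if tag == "RxURL":
            urls[fd] = data
        last[fd] = (tag, data)
    return {
        fd: urls.get(fd)
        for fd, rec in last.items()
        if rec == ("VCL_return", "hash")
    }
```

 Run over a capture of a few minutes, the returned URLs would show whether
 the stalls cluster on the same few uncacheable pages.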

 - question: is there a way to check the state of the stuck threads?

-- 
Ticket URL: <http://www.varnish-cache.org/trac/ticket/951>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator



