Under Load: Server Unavailable/Connection Dropped/Delayed Response

Tejaswi Nadahalli nadahalli at gmail.com
Fri Mar 4 22:01:42 CET 2011

According to http://www.spinics.net/lists/linux-net/msg17545.html - it might
be due to "Overflowing the listen() command's incoming connection backlog."

I simulated my load again, and here are the listen-queue counters before and
during the test.

Before the test:

    3689345 times the listen queue of a socket overflowed
    3689345 SYNs to LISTEN sockets dropped

During the test:

    3690354 times the listen queue of a socket overflowed
    3690354 SYNs to LISTEN sockets dropped
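
To make the two snapshots easier to compare, here is a minimal sketch that pulls the counters out of text in this format and reports how much each one grew. The helper names (`parse_counters`, `counter_delta`) are hypothetical, and the two snapshots are simply the numbers pasted above.

```python
import re

# Matches lines such as "3689345 SYNs to LISTEN sockets dropped".
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_counters(text):
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_delta(before, after):
    """Return how much each counter present in both snapshots grew."""
    return {name: after[name] - before[name] for name in before if name in after}


# Snapshots copied from the message above.
BEFORE = """
    3689345 times the listen queue of a socket overflowed
    3689345 SYNs to LISTEN sockets dropped
"""
DURING = """
    3690354 times the listen queue of a socket overflowed
    3690354 SYNs to LISTEN sockets dropped
"""

if __name__ == "__main__":
    delta = counter_delta(parse_counters(BEFORE), parse_counters(DURING))
    for name, grew in delta.items():
        print(f"{grew:>8} more: {name}")
```

For the numbers above, both counters grew by 1009, so roughly a thousand SYNs were dropped between the two snapshots.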

My net.core.somaxconn is 262144, which is already pretty high, so I cannot see
what else I can do to lengthen the backlog.
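
One thing worth checking here: on Linux, net.core.somaxconn is only an upper bound. The backlog an application passes to listen() is silently capped at somaxconn, so a large somaxconn does not help if the application itself asks for a small queue. The sketch below just illustrates that rule. The `requested` value of 1024 is a hypothetical placeholder, not something taken from this setup; as far as I know varnishd exposes this as the `listen_depth` parameter, but treat that as an assumption to verify against your version.

```python
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")  # Linux only


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's current somaxconn, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Linux caps the backlog passed to listen() at net.core.somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    somaxconn = read_somaxconn()
    if somaxconn is None:
        print("could not read somaxconn (not Linux?)")
    else:
        requested = 1024  # hypothetical value the application asks for
        print(
            f"somaxconn={somaxconn}, requested={requested}, "
            f"effective={effective_backlog(requested, somaxconn)}"
        )
```

With somaxconn at 262144 and an application asking for 1024, the effective queue would still be 1024, so the application-side setting may be the one to raise.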

Is the only option to add more Varnish servers and load-balance them behind
Nginx or something similar?


On Fri, Mar 4, 2011 at 3:19 PM, Tejaswi Nadahalli <nadahalli at gmail.com> wrote:

> Under loaded conditions (3 machines doing httperf separately), I did a
> separate wget on the side, and am attaching the TCPDUMP of that request. As
> you can see, there is a delay in the middle where Varnish didn't respond
> immediately. I thought this delay should be minimal when thread and hit-rate
> conditions are optimal.
> Any help would be appreciated.
> -T
> On Fri, Mar 4, 2011 at 2:30 PM, Tejaswi Nadahalli <nadahalli at gmail.com> wrote:
>> On Fri, Mar 4, 2011 at 2:25 PM, Caunter, Stefan <scaunter at topscms.com> wrote:
>>> There’s no health check in the backend. Not sure what that does with a
>>> one-hour grace. I set a short grace when the backend is healthy and a
>>> long one otherwise:
>>>     if (req.backend.healthy) {
>>>         set req.grace = 60s;
>>>     } else {
>>>         set req.grace = 4h;
>>>     }
>> I have yet to add health checks, directors, etc., and will add them soon. But
>> those only make sense once the cache-primed performance is good. In my test,
>> I am requesting URLs that I know are already in the cache. Varnishstat
>> confirms this: there are no cache misses at all.
>>> You also don’t appear to select a backend in recv.
>> The default backend seems to be getting picked up automatically.
>> -T
>>> Stefan Caunter
>>> Operations
>>> Torstar Digital
>>> m: (416) 561-4871
>>> *From:* varnish-misc-bounces at varnish-cache.org [mailto:
>>> varnish-misc-bounces at varnish-cache.org] *On Behalf Of *Tejaswi Nadahalli
>>> *Sent:* March-04-11 1:23 PM
>>> *To:* varnish-misc at varnish-cache.org
>>> *Subject:* Re: Under Load: Server Unavailable/Connection Dropped/Delayed
>>> Response
>>> On Fri, Mar 4, 2011 at 9:43 AM, Caunter, Stefan <scaunter at topscms.com>
>>> wrote:
>>> What does something like firebug show when you request during the load
>>> test? The delay may be anything from DNS to the ec2 network.
>>> The DNS requests are getting resolved super quick. I am unable to see any
>>> other network issues with EC2. I have a similar machine in the same data
>>> center running nginx which is doing similar loads, but with no caching
>>> requirement, and it's running fine.
>>> In my first post, I forgot to attach my VCL, which is a bit too minimal.
>>> Am I missing something obvious?
>>> ------
>>> backend default0 {
>>>     .host = "";
>>>     .port = "8000";
>>> }
>>> sub vcl_recv {
>>>     unset req.http.Cookie;
>>>     set req.grace = 3600s;
>>>     set req.url = regsub(req.url, "&refurl=.*&t=.*&c=.*&r=.*", "");
>>> }
>>> sub vcl_deliver {
>>>   if (obj.hits > 0) {
>>>     set resp.http.X-Cache = "HIT";
>>>   } else {
>>>     set resp.http.X-Cache = "MISS";
>>>   }
>>> }
>>> -------------------------
>>> Could there be some kind of TCP packet pileup that I am missing?
>>> -T
>>> Stefan Caunter
>>> Operations
>>> Torstar Digital
>>> m: (416) 561-4871
>>> *From:* varnish-misc-bounces at varnish-cache.org [mailto:
>>> varnish-misc-bounces at varnish-cache.org] *On Behalf Of *Tejaswi Nadahalli
>>> *Sent:* March-04-11 1:09 AM
>>> *To:* varnish-misc at varnish-cache.org
>>> *Subject:* Under Load: Server Unavailable/Connection Dropped/Delayed
>>> Response
>>> Hi Everyone,
>>> I am seeing a situation similar to:
>>> http://www.varnish-cache.org/lists/pipermail/varnish-misc/2011-January/005351.html (Connections Dropped Under Load)
>>> http://www.varnish-cache.org/lists/pipermail/varnish-misc/2010-December/005258.html (Hanging Connections)
>>> I have httperf loading a Varnish cache with never-expire content. While
>>> the load is on, other browser/wget requests to the Varnish server are
>>> delayed by 10+ seconds. Any ideas what could be happening? ssh doesn't seem
>>> to be impacted, so is it some kind of thread problem?
>>> In production, I see a similar situation with around 1000 req/second
>>> load.
>>> I am running varnishd with the following command line options (as per
>>> http://kristianlyng.wordpress.com/2009/10/19/high-end-varnish-tuning/):
>>> sudo varnishd -f /etc/varnish/default.vcl -s malloc,5G -T
>>> -p thread_pools=8 -p thread_pool_min=100 -p thread_pool_max=5000
>>> -p thread_pool_add_delay=2 -p cli_timeout=25 -p session_linger=100
>>> -p lru_interval=20 -t 31536000
>>> I am on Ubuntu Lucid 64 bit Amazon EC2 C1.XLarge with 8 processing units.
>>> My network sysctl parameters are tuned according to:
>>> http://varnish-cache.org/trac/wiki/Performance
>>> fs.file-max = 360000
>>> net.ipv4.ip_local_port_range = 1024 65536
>>> net.core.rmem_max = 16777216
>>> net.core.wmem_max = 16777216
>>> net.ipv4.tcp_rmem = 4096 87380 16777216
>>> net.ipv4.tcp_wmem = 4096 65536 16777216
>>> net.ipv4.tcp_fin_timeout = 3
>>> net.core.netdev_max_backlog = 30000
>>> net.ipv4.tcp_no_metrics_save = 1
>>> net.core.somaxconn = 262144
>>> net.ipv4.tcp_syncookies = 0
>>> net.ipv4.tcp_max_orphans = 262144
>>> net.ipv4.tcp_max_syn_backlog = 262144
>>> net.ipv4.tcp_synack_retries = 2
>>> net.ipv4.tcp_syn_retries = 2
>>> Any help would be greatly appreciated
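
Since sysctl settings are easy to list but not always applied as intended, it may be worth confirming what the running kernel actually uses. The following is a rough sketch, assuming a Linux host where each dotted sysctl key maps to a file under /proc/sys. The helper names (`parse_sysctl`, `sysctl_path`, `read_live`) are hypothetical, and `EXPECTED` holds only a few of the settings quoted above.

```python
from pathlib import Path

# A few of the settings quoted above, pasted verbatim.
EXPECTED = """
net.core.somaxconn = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_rmem = 4096 87380 16777216
"""


def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, collapsing runs of whitespace."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            settings[key.strip()] = " ".join(value.split())
    return settings


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file, e.g. net.core.somaxconn."""
    return Path(root, *key.split("."))


def read_live(key, root="/proc/sys"):
    """Return the kernel's current value, or None if it cannot be read."""
    try:
        return " ".join(sysctl_path(key, root).read_text().split())
    except OSError:
        return None


if __name__ == "__main__":
    for key, wanted in parse_sysctl(EXPECTED).items():
        live = read_live(key)
        status = "ok" if live == wanted else "DIFFERS"
        print(f"{status:8} {key}: wanted {wanted!r}, live {live!r}")
```

Any line reported as DIFFERS (or with a live value of None) points at a setting that did not take effect or is not available on that kernel.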