Under Load: Server Unavailable/Connection Dropped/Delayed Response

Tejaswi Nadahalli nadahalli at gmail.com
Fri Mar 4 22:01:42 CET 2011


According to http://www.spinics.net/lists/linux-net/msg17545.html, it might
be due to "Overflowing the listen() command's incoming connection backlog."

I ran the load simulation again. Here are the listen-queue counters (from
netstat -s) before and during the test.

Before:
    3689345 times the listen queue of a socket overflowed
    3689345 SYNs to LISTEN sockets dropped

During:
    3690354 times the listen queue of a socket overflowed
    3690354 SYNs to LISTEN sockets dropped
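
Diffing the two snapshots makes the growth explicit: both counters rose by 1009 during the run, which is consistent with SYNs being dropped because a listening socket's accept queue was full. These counters are system-wide totals since boot, not per-socket figures, so only the difference between snapshots is meaningful. A minimal Python sketch, assuming the netstat -s wording quoted above; the function names are hypothetical.

```python
import re
from typing import Dict

# Counter wording as printed by `netstat -s` on Linux (as quoted in this post).
PATTERNS = {
    "listen_overflows": re.compile(r"(\d+) times the listen queue of a socket overflowed"),
    "listen_drops": re.compile(r"(\d+) SYNs to LISTEN sockets dropped"),
}


def parse_listen_counters(netstat_output: str) -> Dict[str, int]:
    """Pull the listen-queue counters out of `netstat -s` text."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(netstat_output)
        if match:
            counters[name] = int(match.group(1))
    return counters


def counter_deltas(before: str, during: str) -> Dict[str, int]:
    """How much each counter grew between two snapshots."""
    b = parse_listen_counters(before)
    d = parse_listen_counters(during)
    return {name: d[name] - b[name] for name in PATTERNS if name in b and name in d}


BEFORE = """
    3689345 times the listen queue of a socket overflowed
    3689345 SYNs to LISTEN sockets dropped
"""
DURING = """
    3690354 times the listen queue of a socket overflowed
    3690354 SYNs to LISTEN sockets dropped
"""

print(counter_deltas(BEFORE, DURING))  # {'listen_overflows': 1009, 'listen_drops': 1009}
```

In practice the two snapshots could be captured with subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout just before and during the load.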

My net.core.somaxconn is 262144, which is already very high, so I can't see
what else I can do to increase the backlog length.
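
One possible reason raising somaxconn did not help: on Linux, the backlog a process passes to listen() is silently capped at net.core.somaxconn, so the effective accept-queue length is the smaller of the two. somaxconn is only a ceiling; if varnishd itself asks for a smaller backlog, that smaller value is what applies, so its own run-time parameters are worth checking. The sketch below illustrates the capping rule; the requested value of 1024 is hypothetical, not taken from this thread.

```python
from pathlib import Path
from typing import Optional

SOMAXCONN = Path("/proc/sys/net/core/somaxconn")


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Linux caps the backlog passed to listen() at net.core.somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path: Path = SOMAXCONN) -> Optional[int]:
    """Current somaxconn, or None when the file is missing (e.g. not Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


# A large ceiling does not help if the application itself asks for less.
print(effective_backlog(requested=1024, somaxconn=262144))  # prints 1024
# Conversely, a large request is clipped to a small ceiling.
print(effective_backlog(requested=262144, somaxconn=128))   # prints 128
print(read_somaxconn())
```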

Is the only option to add more Varnish servers and load-balance them behind
Nginx or something similar?

-T

On Fri, Mar 4, 2011 at 3:19 PM, Tejaswi Nadahalli <nadahalli at gmail.com> wrote:

> Under load (3 machines each running httperf), I ran a separate wget on the
> side and am attaching the tcpdump of that request. As you can see, there is
> a delay in the middle where Varnish didn't respond immediately. I thought
> that if thread counts and hit rate were optimal, this delay should be
> minimal.
>
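To put numbers on the side request rather than reading a single capture, one could time repeated fetches of a known-cached URL while httperf runs. A minimal Python sketch using only the standard library; the URL, sample count, interval and 1-second "slow" threshold are hypothetical. It also records the X-Cache header set by the vcl_deliver quoted further down this thread, to confirm each probe was a hit, and counts failed requests (timeouts, resets) separately.

```python
import http.client
import statistics
import time
import urllib.request
from typing import List, Optional, Tuple


def timed_get(url: str, timeout: float = 30.0) -> Tuple[int, float, Optional[str]]:
    """Fetch url once; return (HTTP status, elapsed seconds, X-Cache header or None)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include the body transfer, as wget would
        status = resp.status
        x_cache = resp.headers.get("X-Cache")
    return status, time.perf_counter() - start, x_cache


def summarize(latencies: List[float], errors: int = 0, slow_threshold: float = 1.0) -> dict:
    """Median and worst-case latency, count of slow samples, and failed requests."""
    if not latencies:
        return {"median": None, "max": None, "slow": 0, "errors": errors}
    return {
        "median": statistics.median(latencies),
        "max": max(latencies),
        "slow": sum(1 for t in latencies if t > slow_threshold),
        "errors": errors,
    }


def probe(url: str, samples: int = 20, interval: float = 0.5) -> dict:
    """Fetch url repeatedly, printing each result, then return the summary."""
    latencies: List[float] = []
    errors = 0
    for _ in range(samples):
        try:
            status, elapsed, x_cache = timed_get(url)
        except (OSError, http.client.HTTPException) as exc:
            # URLError, timeouts and connection resets are OSError subclasses.
            print(f"error: {exc}")
            errors += 1
        else:
            print(f"{status} {elapsed:.3f}s X-Cache={x_cache}")
            latencies.append(elapsed)
        time.sleep(interval)
    return summarize(latencies, errors)
```

For example, probe("http://varnish.example.com/cached/object") prints one line per request and returns a summary; a large max alongside a low median would match the intermittent stalls described in this thread.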
> Any help would be appreciated.
>
> -T
>
>
> On Fri, Mar 4, 2011 at 2:30 PM, Tejaswi Nadahalli <nadahalli at gmail.com> wrote:
>
>> On Fri, Mar 4, 2011 at 2:25 PM, Caunter, Stefan <scaunter at topscms.com> wrote:
>>
>>> There’s no health check in the backend. Not sure what that does with a
>>> one-hour grace. I set a short grace with
>>>
>>>   if (req.backend.healthy) {
>>>     set req.grace = 60s;
>>>   } else {
>>>     set req.grace = 4h;
>>>   }
>>>
>>
>> I have yet to add health checks, directors, etc., and will add them soon.
>> But those only make sense once the cache-primed performance is good. In my
>> test, I am requesting URLs that I know are already in the cache, and
>> varnishstat confirms this: there are no cache misses at all.
>>
>>
>>>
>>>
>>> You also don’t appear to select a backend in recv.
>>>
>>
>> The default backend seems to be getting picked up automatically.
>>
>> -T
>>
>>
>>>
>>> Stefan Caunter
>>> Operations
>>> Torstar Digital
>>> m: (416) 561-4871
>>>
>>> From: varnish-misc-bounces at varnish-cache.org [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Tejaswi Nadahalli
>>> Sent: March-04-11 1:23 PM
>>> To: varnish-misc at varnish-cache.org
>>> Subject: Re: Under Load: Server Unavailable/Connection Dropped/Delayed Response
>>>
>>>
>>>
>>> On Fri, Mar 4, 2011 at 9:43 AM, Caunter, Stefan <scaunter at topscms.com>
>>> wrote:
>>>
>>>
>>>
>>> What does something like Firebug show when you request during the load
>>> test? The delay may be anything from DNS to the EC2 network.
>>>
>>>
>>> DNS requests are resolving very quickly, and I can't see any other network
>>> issues with EC2. I have a similar machine in the same data center running
>>> nginx under similar load (but with no caching requirement), and it's
>>> running fine.
>>>
>>> In my first post, I forgot to attach my VCL, which is a bit too minimal.
>>> Am I missing something obvious?
>>>
>>> ------
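>>> backend default0 {
>>>     .host = "10.202.30.39";
>>>     .port = "8000";
>>> }
>>>
>>> sub vcl_recv {
>>>     # Drop cookies so requests are not passed to the backend because of them
>>>     unset req.http.Cookie;
>>>     # Allow serving objects up to one hour past their TTL
>>>     set req.grace = 3600s;
>>>     # Strip the &refurl/&t/&c/&r query parameters (and anything after them)
>>>     set req.url = regsub(req.url, "&refurl=.*&t=.*&c=.*&r=.*", "");
>>> }
>>>
>>> sub vcl_deliver {
>>>     # Tag responses so clients can tell cache hits from misses
>>>     if (obj.hits > 0) {
>>>         set resp.http.X-Cache = "HIT";
>>>     } else {
>>>         set resp.http.X-Cache = "MISS";
>>>     }
>>> }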
>>> -------------------------
>>>
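To check what the regsub() in vcl_recv actually strips, the same pattern can be tried offline. A minimal Python sketch: VCL's regsub() replaces only the first match, which re.sub(..., count=1) mirrors. Python's re module is not Varnish's regex engine, but for this simple pattern the two should agree. The example URLs are hypothetical.

```python
import re

# The pattern from the regsub() call in vcl_recv.
REFURL_PATTERN = re.compile(r"&refurl=.*&t=.*&c=.*&r=.*")


def strip_params(url: str) -> str:
    """Mimic VCL regsub(): replace only the first match with an empty string."""
    return REFURL_PATTERN.sub("", url, count=1)


examples = [
    "/img?id=42&refurl=http://a.example/&t=1&c=2&r=3",  # parameters in the expected order
    "/img?id=42&t=1&refurl=x&c=2&r=3",                  # different order: no match
    "/img?id=1&refurl=x&t=1&c=2&r=3&page=2",            # trailing &page=2 is stripped too
]
for url in examples:
    print(f"{url} -> {strip_params(url)}")
```

Two things stand out: the pattern only matches when the parameters appear in exactly this order, and the greedy trailing .* also removes any parameters that follow r=, so URLs differing only in those trailing parameters collapse onto one cache object.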
>>> Could there be some kind of TCP packet pileup that I am missing?
>>>
>>> -T
>>>
>>> From: varnish-misc-bounces at varnish-cache.org [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Tejaswi Nadahalli
>>> Sent: March-04-11 1:09 AM
>>> To: varnish-misc at varnish-cache.org
>>> Subject: Under Load: Server Unavailable/Connection Dropped/Delayed Response
>>>
>>>
>>>
>>> Hi Everyone,
>>>
>>> I am seeing a situation similar to:
>>>
>>> http://www.varnish-cache.org/lists/pipermail/varnish-misc/2011-January/005351.html (Connections Dropped Under Load)
>>>
>>> http://www.varnish-cache.org/lists/pipermail/varnish-misc/2010-December/005258.html (Hanging Connections)
>>>
>>> I have httperf loading a Varnish cache with never-expiring content. While
>>> the load is running, other browser/wget requests to the Varnish server are
>>> delayed by 10+ seconds. Any ideas what could be happening? ssh doesn't seem
>>> to be affected, so is it some kind of thread problem?
>>>
>>> In production, I see the same behavior at around 1000 requests/second.
>>>
>>> I am running varnishd with the following command line options (as per
>>> http://kristianlyng.wordpress.com/2009/10/19/high-end-varnish-tuning/):
>>>
>>> sudo varnishd -f /etc/varnish/default.vcl -s malloc,5G -T 127.0.0.1:2000 \
>>>     -a 0.0.0.0:80 -p thread_pools=8 -p thread_pool_min=100 \
>>>     -p thread_pool_max=5000 -p thread_pool_add_delay=2 -p cli_timeout=25 \
>>>     -p session_linger=100 -p lru_interval=20 -t 31536000
>>>
>>> I am on Ubuntu Lucid 64-bit, on an Amazon EC2 c1.xlarge instance with 8
>>> processing units.
>>>
>>> My network sysctl parameters are tuned according to:
>>> http://varnish-cache.org/trac/wiki/Performance
>>> fs.file-max = 360000
>>> net.ipv4.ip_local_port_range = 1024 65536
>>> net.core.rmem_max = 16777216
>>> net.core.wmem_max = 16777216
>>> net.ipv4.tcp_rmem = 4096 87380 16777216
>>> net.ipv4.tcp_wmem = 4096 65536 16777216
>>> net.ipv4.tcp_fin_timeout = 3
>>> net.core.netdev_max_backlog = 30000
>>> net.ipv4.tcp_no_metrics_save = 1
>>> net.core.somaxconn = 262144
>>> net.ipv4.tcp_syncookies = 0
>>> net.ipv4.tcp_max_orphans = 262144
>>> net.ipv4.tcp_max_syn_backlog = 262144
>>> net.ipv4.tcp_synack_retries = 2
>>> net.ipv4.tcp_syn_retries = 2
>>>
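Values in /etc/sysctl.conf only take effect once loaded (for example with sysctl -p, or at boot), and the kernel refuses values it considers out of range. A quick way to confirm what is actually live is to read each setting back from /proc/sys. A minimal Python sketch; the DESIRED dictionary is a hypothetical subset of the list above, and the function names are illustrative.

```python
from pathlib import Path
from typing import Dict, Optional

PROC_SYS = Path("/proc/sys")

# Hypothetical subset of the settings listed above.
DESIRED: Dict[str, str] = {
    "net.core.somaxconn": "262144",
    "net.ipv4.tcp_max_syn_backlog": "262144",
    "net.core.netdev_max_backlog": "30000",
    "net.ipv4.tcp_syncookies": "0",
    # Ports are 16-bit, so an upper bound of 65536 was probably rejected by the kernel.
    "net.ipv4.ip_local_port_range": "1024 65536",
}


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name to its /proc/sys file, e.g. net.core.somaxconn."""
    return root.joinpath(*name.split("."))


def read_sysctl(name: str, root: Path = PROC_SYS) -> Optional[str]:
    """Return the live value with whitespace normalized, or None if unreadable."""
    try:
        return " ".join(sysctl_path(name, root).read_text().split())
    except OSError:
        return None


def mismatches(desired: Dict[str, str], root: Path = PROC_SYS) -> Dict[str, Optional[str]]:
    """Map each setting whose live value differs from the desired one to its live value."""
    result: Dict[str, Optional[str]] = {}
    for name, want in desired.items():
        actual = read_sysctl(name, root)
        if actual != " ".join(want.split()):
            result[name] = actual
    return result


for name, actual in mismatches(DESIRED).items():
    print(f"{name}: wanted {DESIRED[name]!r}, kernel has {actual!r}")
```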
>>>
>>> Any help would be greatly appreciated.
>>>
>>>
>>>
>>
>>
>