Varnish and TCP Incast Throughput Collapse

Guillaume Quintard guillaume at varnish-software.com
Fri Jul 7 09:10:19 CEST 2017


I'm having trouble understanding the concept of readahead in an HTTP
context.

You are using the malloc cache storage, right?

-- 
Guillaume Quintard

On Thu, Jul 6, 2017 at 7:15 PM, John Salmon <John.Salmon at deshawresearch.com>
wrote:

> Thanks for your suggestions.
>
> One more detail I didn't mention: roughly speaking, the client is doing
> "read ahead", but it only reads ahead by a limited amount (about 4 blocks,
> each of 128 KiB).  The surprising behavior is that when four readahead
> threads are allowed to run concurrently, their aggregate throughput is much
> lower than when all the readaheads are serialized through a single thread.
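>
> For concreteness, here is a minimal sketch of the access pattern.  The
> real client is not written in Python, and the URL and block layout here
> are made up, but the shape is the same: a bounded pool of worker threads,
> each fetching one 128 KiB block with an HTTP Range request.
>
>     import concurrent.futures
>     import urllib.request
>
>     BASE_URL = "http://varnish.example.com/object"  # hypothetical URL
>     BLOCK = 128 * 1024                               # 128 KiB blocks
>     WINDOW = 4                                       # readahead depth
>
>     def fetch_block(i):
>         # Fetch exactly one block via an HTTP Range request.  Note that
>         # urllib opens a new connection for each request.
>         start, end = i * BLOCK, (i + 1) * BLOCK - 1
>         req = urllib.request.Request(
>             BASE_URL, headers={"Range": "bytes=%d-%d" % (start, end)})
>         with urllib.request.urlopen(req) as resp:
>             return resp.read()
>
>     # Serialized: one outstanding request at a time -- the fast case.
>     serial = [fetch_block(i) for i in range(16)]
>
>     # Readahead: up to WINDOW requests in flight -- this is where the
>     # aggregate throughput collapses.
>     with concurrent.futures.ThreadPoolExecutor(max_workers=WINDOW) as pool:
>         parallel = list(pool.map(fetch_block, range(16)))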
>
> Traces (with strace and/or tcpdump) show frequent stalls of roughly 200 ms
> during which nothing seems to move across the channel and all client-side
> system calls are waiting.  200 ms is suspiciously close to the Linux
> 'rto_min' parameter, which was the first thing that led me to suspect TCP
> incast collapse.  We get some improvement by reducing rto_min on the
> server, and we also get some improvement by reducing SO_RCVBUF in the
> client.  But as I said, both have trade-offs, so I'm interested in whether
> anyone else has encountered or overcome this particular problem.
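>
> For reference, here is a sketch of the client-side knob (values are
> illustrative, not a recommendation).  SO_RCVBUF has to be set before
> connect() so that it caps the window this end advertises; on the server
> side, rto_min can be lowered per route, with something like
> "ip route change <route> ... rto_min 5ms".
>
>     import socket
>
>     RCVBUF = 64 * 1024   # illustrative; smaller values limit burst size
>
>     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>     # Setting SO_RCVBUF before connect() disables receive autotuning and
>     # caps the receive window; the kernel roughly doubles the requested
>     # value to account for bookkeeping overhead.
>     s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, RCVBUF)
>     s.connect(("varnish.example.com", 80))   # hypothetical host
>     print("effective SO_RCVBUF:",
>           s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))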
>
> I do not see the dropoff from single-thread to multi-thread when I run the
> client and server on the same host.  I.e., I get around 500 MB/s with one
> client and roughly the same total bandwidth with multiple clients.  I'm
> sure that with some tuning, the 500 MB/s could be improved, but that's not
> the issue here.
>
> Here are the ethtool reports:
>
> On the client:
> drdws0134$ ethtool eth0
> Settings for eth0:
>     Supported ports: [ TP ]
>     Supported link modes:   10baseT/Half 10baseT/Full
>                             100baseT/Half 100baseT/Full
>                             1000baseT/Full
>     Supported pause frame use: No
>     Supports auto-negotiation: Yes
>     Advertised link modes:  10baseT/Half 10baseT/Full
>                             100baseT/Half 100baseT/Full
>                             1000baseT/Full
>     Advertised pause frame use: No
>     Advertised auto-negotiation: Yes
>     Speed: 1000Mb/s
>     Duplex: Full
>     Port: Twisted Pair
>     PHYAD: 1
>     Transceiver: internal
>     Auto-negotiation: on
>     MDI-X: on (auto)
> Cannot get wake-on-lan settings: Operation not permitted
>     Current message level: 0x00000007 (7)
>                    drv probe link
>     Link detected: yes
> drdws0134$
>
> On the server:
>
> $ ethtool eth0
> Settings for eth0:
>     Supported ports: [ TP ]
>     Supported link modes:   1000baseT/Full
>                             10000baseT/Full
>     Supported pause frame use: No
>     Supports auto-negotiation: No
>     Advertised link modes:  Not reported
>     Advertised pause frame use: No
>     Advertised auto-negotiation: No
>     Speed: 10000Mb/s
>     Duplex: Full
>     Port: Twisted Pair
>     PHYAD: 0
>     Transceiver: internal
>     Auto-negotiation: off
>     MDI-X: Unknown
> Cannot get wake-on-lan settings: Operation not permitted
> Cannot get link status: Operation not permitted
> $
>
>
> On 07/06/2017 03:08 AM, Guillaume Quintard wrote:
>
> Two things: do you get the same results when the client is directly on the
> Varnish server (i.e., not going through the switch)?  And is each new
> request opening a new connection?
>
> --
> Guillaume Quintard
>
> On Thu, Jul 6, 2017 at 6:45 AM, Andrei <lagged at gmail.com> wrote:
>
>> Out of curiosity, what does ethtool show for the related NICs on both
>> servers?  I also have Varnish on a 10G server and can reach around
>> 7.7 Gbit/s serving anywhere between 6-28k requests/second; however, it
>> did take some sysctl tuning and the Westwood TCP congestion control
>> algorithm.
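>>
>> For what it's worth, the congestion control algorithm is normally
>> selected system-wide via the net.ipv4.tcp_congestion_control sysctl;
>> a process can also pick it per socket with the TCP_CONGESTION socket
>> option.  A rough Python illustration (Linux only; the setting matters
>> on the sending side, and Varnish itself simply inherits the system
>> default):
>>
>>     import socket
>>
>>     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>     # TCP_CONGESTION selects the congestion control module for this
>>     # socket only; "westwood" must be available in the kernel (see
>>     # /proc/sys/net/ipv4/tcp_available_congestion_control).
>>     s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"westwood")
>>     print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))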
>>
>> On Wed, Jul 5, 2017 at 3:09 PM, John Salmon <
>> John.Salmon at deshawresearch.com> wrote:
>>
>>> I've been using Varnish in an "intranet" application.  The picture is
>>> roughly:
>>>
>>>   origin <-> Varnish <-- 10G channel --> switch <-- 1G channel --> client
>>>
>>> The machine running Varnish is a high-performance server.  It can
>>> easily saturate a 10Gbit channel.  The machine running the client is a
>>> more modest desktop workstation, but it's fully capable of saturating
>>> a 1Gbit channel.
>>>
>>> The client makes HTTP requests for objects of size 128kB.
>>>
>>> When the client makes those requests serially, "useful" data is
>>> transferred at about 80% of the channel bandwidth of the Gigabit
>>> link, which seems perfectly reasonable.
>>>
>>> But when the client makes the requests in parallel (typically
>>> 4-at-a-time, but it can vary), *total* throughput drops to about 25%
>>> of the channel bandwidth, i.e., about 30Mbyte/sec.
>>>
>>> After looking at traces and doing a fair amount of experimentation, we
>>> have reached the tentative conclusion that we're seeing "TCP Incast
>>> Throughput Collapse" (see references below).
>>>
>>> The literature on "TCP Incast Throughput Collapse" typically describes
>>> scenarios where a large number of servers overwhelm a single inbound
>>> port.  I haven't found any discussion of incast collapse with only one
>>> server, but it seems like a natural consequence of a 10-Gigabit-capable
>>> server feeding a 1-Gigabit downlink.
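>>>
>>> A back-of-envelope calculation (my own numbers, just to illustrate the
>>> rate mismatch, ignoring protocol overhead): the server can put a burst
>>> of four 128 KiB responses onto the 10G link roughly ten times faster
>>> than the 1G port can drain it, so most of the burst has to sit in the
>>> switch port's buffer; if that buffer is shallower than ~460 KiB,
>>> packets are dropped and the lost tail of a response is only recovered
>>> after an rto_min (~200 ms) timeout.
>>>
>>>     # Back-of-envelope numbers (illustrative) for the 10G -> 1G hop.
>>>     BLOCK_BITS = 128 * 1024 * 8      # one 128 KiB response, in bits
>>>     FAST, SLOW = 10e9, 1e9           # link rates, bits/s
>>>     IN_FLIGHT = 4                    # concurrent readahead responses
>>>
>>>     arrive = IN_FLIGHT * BLOCK_BITS / FAST   # burst arrival time, s
>>>     drain = IN_FLIGHT * BLOCK_BITS / SLOW    # time to forward it, s
>>>     # While the burst arrives, the 1G port drains only a tenth of it;
>>>     # the rest queues in the switch port's buffer.
>>>     queued = IN_FLIGHT * BLOCK_BITS * (1 - SLOW / FAST) / 8  # bytes
>>>
>>>     print("burst arrives in %.0f us, drains in %.0f us"
>>>           % (arrive * 1e6, drain * 1e6))
>>>     print("peak queue if sent back-to-back: ~%.0f KiB" % (queued / 1024))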
>>>
>>> Has anybody else seen anything similar, with Varnish or other single
>>> servers on 10 Gbit to 1 Gbit links?
>>>
>>> The literature offers a variety of mitigation strategies, but there are
>>> non-trivial tradeoffs and none appears to be a silver bullet.
>>>
>>> If anyone has seen TCP Incast Collapse with Varnish, were you able to
>>> work around it, and if so, how?
>>>
>>> Thanks,
>>> John Salmon
>>>
>>> References:
>>>
>>> http://www.pdl.cmu.edu/Incast/
>>>
>>> Annotated Bibliography in:
>>>    https://lists.freebsd.org/pipermail/freebsd-net/2015-November/043926.html
>>>
>>> --
>>> *.*
>>>
>
>
> --
> *.*
>