Compared performance of Varnish Cache on x86_64 and aarch64

Tue Aug 4 11:31:40 UTC 2020

Hi,

I've updated the data in the article:
https://medium.com/@martin.grigorov/compare-varnish-cache-performance-on-x86-64-and-aarch64-cpu-architectures-cef5ad5fee5f
Now x86_64 and aarch64 perform almost the same!
Varnish gives around 20% less throughput than the Golang HTTP server, but I
guess this is because the Golang server is much simpler than Varnish.

A 3-minute run produces around 3GB of Vegeta reports (130MB gzipped). If
anyone wants me to extract some extra data, just let me know!
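Extracting data from reports this large works best as a stream. Below is a minimal sketch, assuming the binary Vegeta results were converted to one JSON object per line with `vegeta encode`, and that each object carries a `code` (HTTP status) and a `latency` (nanoseconds) field. The helper names and the file name in the usage note are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator, TextIO


def read_vegeta_json(lines: Iterable[str]) -> Iterator[tuple[int, float]]:
    """Yield (status_code, latency_us) for each JSON result line.

    Assumes the "code" and "latency" (nanoseconds) fields described above.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        yield rec["code"], rec["latency"] / 1_000.0  # ns -> µs


def open_report(path: str) -> TextIO:
    """Open a plain or gzipped report as text so it can be streamed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Usage: `with open_report("results.json.gz") as f:` followed by `for code, latency_us in read_vegeta_json(f): ...`. Streaming avoids loading the whole 3GB into memory.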

Regards,
Martin

On Mon, Aug 3, 2020 at 6:14 PM Martin Grigorov <martin.grigorov at gmail.com> wrote:

> Hi,
>
> Thank you all for the feedback!
> After some debugging, it turned out to be a bug in wrk: most of the
> requests' latencies were 0 in the raw reports.
>
> I've looked for a better-maintained HTTP load testing tool and I liked
> https://github.com/tsenart/vegeta. It provides (correct-looking)
> statistics, can measure latencies at a constant request rate, and, last
> but not least, can produce plot charts!
> I will update my article and let you know once I'm done!
>
> Regards,
> Martin
>
> On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <hermunn at varnish-software.com> wrote:
>
>> I am sorry for being so late to the game, but here it goes:
>>
>> On Wed, Jul 29, 2020 at 14:12, Poul-Henning Kamp <phk at phk.freebsd.dk> wrote:
>> > Your measurement says that there is a 2/3 chance that the latency
>> > is between:
>> >
>> >         655.40µs - 798.70µs     = -143.30µs
>> >
>> > and
>> >         655.40µs + 798.70µs     = 1454.10µs
>>
>> No, it does not. Nowhere is it claimed that the numbers follow a
>> normal distribution or an approximation of one. Of course, your
>> calculations demonstrate that the data is far from normally
>> distributed (as expected).
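The exchange above can be checked with a few lines of arithmetic. This is a minimal sketch using only the mean and standard deviation quoted from wrk; the function names are hypothetical. The roughly 2/3 coverage of the band holds only for normally distributed data, and a negative lower bound is impossible for a latency, which is what shows the normal model does not fit.

```python
def one_sigma_band(mean_us: float, stdev_us: float) -> tuple[float, float]:
    """Return (mean - stdev, mean + stdev).

    For normally distributed data this band holds about 68% (roughly 2/3)
    of the samples; for other distributions there is no such guarantee.
    """
    return mean_us - stdev_us, mean_us + stdev_us


def band_is_plausible_for_latency(low_us: float) -> bool:
    """Latencies cannot be negative, so a negative lower bound means the
    normal model does not describe the data."""
    return low_us >= 0.0


low, high = one_sigma_band(655.40, 798.70)  # the wrk numbers quoted above
print(f"{low:.2f}µs .. {high:.2f}µs, plausible: {band_is_plausible_for_latency(low)}")
# prints: -143.30µs .. 1454.10µs, plausible: False
```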
>>
>> > You cannot conclude _anything_ from those numbers.
>>
>> There are two numbers, the average and the standard deviation. They
>> are calculated from the data, but the truth is hidden deeper in the
>> data. Looking at these particular numbers, I agree completely that it
>> is wrong to conclude that one is better than the other. I am not
>> saying that the statements in the article are false, just that you do
>> not have the data to draw those conclusions.
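Looking "deeper in the data" starts with summaries that do not assume a normal distribution. Here is a minimal sketch using Python's `statistics` module on the raw per-request latencies; the function name and the choice of percentiles are assumptions for illustration.

```python
import statistics


def latency_summary(latencies_us: list[float]) -> dict[str, float]:
    """Summarize raw per-request latencies without assuming normality.

    Percentiles describe a skewed latency distribution far better than
    the mean and standard deviation alone.
    """
    # n=100 yields the 99 cut points P1..P99. "inclusive" suits data that
    # already contains the extreme values, such as a log of every request.
    cuts = statistics.quantiles(latencies_us, n=100, method="inclusive")
    return {
        "count": float(len(latencies_us)),
        "mean": statistics.fmean(latencies_us),
        "stdev": statistics.stdev(latencies_us),
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
        "max": max(latencies_us),
    }
```

When the mean sits well above the median (p50), a few slow requests are dragging it up, and the mean alone misrepresents the typical request.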
>>
>> Furthermore, Geoff got things right (see below). As a mathematician,
>> I have to say that statistics is hard, and trusting the output of wrk
>> to draw conclusions is outright the wrong thing to do.
>>
>> In this case we have a luxury which you typically do not have: Data is
>> essentially free. You can run many tests and you can run short or long
>> tests with different parameters. A 30-second test is simply not enough
>> for anything.
>>
>> As Geoff indicated, for each transaction you can extract many relevant
>> values from varnishlog, with the status, hit/miss, time to first byte
>> and time to last byte being the most obvious ones. They can be
>> extracted and saved to a CSV file by using varnishncsa with a custom
>> format string, and you can use R (I used it myself as a tool in my
>> previous job; not a fan) to do statistical analysis on the data. The
>> Student's t-test suggestion from Geoff is a good idea, but just looking
>> at one set of numbers without considering other factors is
>> mathematically problematic.
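As one way to read such a CSV back for analysis, here is a minimal sketch. It assumes varnishncsa was run with a custom format along the lines of `varnishncsa -F '%s,%{Varnish:hitmiss}x,%{Varnish:time_firstbyte}x,%D,%b'` (status, hit/miss, time to first byte in seconds, time to serve the request in microseconds, body bytes). That field order is our choice, and treating `%D` as "time to last byte" is an approximation. The `Txn` type and `parse_ncsa_csv` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Txn:
    status: int       # %s: HTTP status sent to the client
    hitmiss: str      # %{Varnish:hitmiss}x: "hit" or "miss"
    ttfb_us: float    # %{Varnish:time_firstbyte}x, converted from seconds
    ttlb_us: float    # %D: time to serve the request, in microseconds
    resp_bytes: int   # %b: body bytes; CLF style uses "-" for zero


def parse_ncsa_csv(lines: Iterable[str]) -> list[Txn]:
    """Parse lines written with the assumed 5-field format above."""
    txns = []
    for row in csv.reader(lines):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        status, hitmiss, ttfb_s, d_us, b = row
        txns.append(Txn(
            status=int(status),
            hitmiss=hitmiss,
            ttfb_us=float(ttfb_s) * 1e6,
            ttlb_us=float(d_us),
            resp_bytes=0 if b == "-" else int(b),
        ))
    return txns
```

With the transactions in memory, splitting by `hitmiss` or grouping by `resp_bytes` answers questions such as how time to first and last byte varies with object size.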
>>
>> Anyway, some obvious questions then arise. For example:
>> - How do the numbers from wrk and varnishlog/varnishncsa compare?
>> Did wrk report a different total number of transactions than Varnish?
>> If there is a discrepancy, then the errors might be caused by some
>> resource constraint (number of sockets, or dropped SYN packets?).
>> - How do the average and maximum compare between Varnish and wrk?
>> - What is the CPU usage of the kernel, the benchmarking tool and the
>> Varnish processes in the tests?
>> - What is the difference between the time to first byte and the time
>> to last byte in Varnish for different object sizes?
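The first two questions can be answered by cross-checking the load generator's per-request latencies against what Varnish logged. This is a minimal sketch; `cross_check` is a hypothetical helper, and both inputs are assumed to be lists of latencies in microseconds from the same run.

```python
import statistics


def cross_check(tool_latencies_us: list[float],
                varnish_latencies_us: list[float]) -> dict:
    """Compare what the load generator measured with what Varnish logged.

    A nonzero count difference points at requests that never reached
    Varnish or were never logged (e.g. a resource constraint). The mean and
    max differences hint at time spent outside Varnish's own measurement,
    although, as noted below, either side may report the faster times.
    """
    return {
        "count_diff": len(tool_latencies_us) - len(varnish_latencies_us),
        "mean_diff_us": statistics.fmean(tool_latencies_us)
        - statistics.fmean(varnish_latencies_us),
        "max_diff_us": max(tool_latencies_us) - max(varnish_latencies_us),
    }
```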
>>
>> When Varnish writes to a socket, it hands bytes over to the kernel,
>> and when the write call returns, we do not know how far the bytes have
>> traveled, or how long it will take before they reach their final
>> destination. The bytes may be in a kernel buffer, they might be on the
>> network card, they might already have been received by the client's
>> kernel, and they might have made it all the way into wrk (which may or
>> may not have timestamped the response). Typically, depending on many
>> things, Varnish will report faster times than wrk does, but since
>> returning from the write call means that the calling thread must be
>> rescheduled, it is even possible that wrk will see some requests as
>> faster than Varnish reports them. Running wrk2 at different request
>> rates in a series of tests seems natural to me, so that you can observe
>> when (and how) the system starts running into bottlenecks. Note that
>> the bottleneck can just as well be in wrk2 itself or in the combined
>> CPU usage of kernel + Varnish + wrk2.
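A rate sweep like the one suggested above could be scripted as follows. This is a minimal sketch under stated assumptions: wrk2 installs its binary as `wrk` and takes `-t`, `-c`, `-d`, `-R` (target rate) and `--latency`. The thread/connection defaults, the p99 limit, and both function names are hypothetical choices, not recommendations.

```python
from typing import Optional


def wrk2_commands(url: str, rates: list[int], duration_s: int = 60,
                  threads: int = 4, connections: int = 100) -> list[list[str]]:
    """Build one wrk2 invocation per target rate (requests/second)."""
    return [
        ["wrk", f"-t{threads}", f"-c{connections}", f"-d{duration_s}s",
         f"-R{rate}", "--latency", url]
        for rate in rates
    ]


def first_saturated_rate(p99_by_rate: dict[int, float],
                         limit_us: float) -> Optional[int]:
    """Return the lowest rate whose p99 latency exceeds limit_us, or None.

    That is roughly where something (Varnish, the kernel, or wrk2 itself)
    starts running into a bottleneck.
    """
    for rate in sorted(p99_by_rate):
        if p99_by_rate[rate] > limit_us:
            return rate
    return None
```

Each command can be run with `subprocess.run(cmd, capture_output=True, text=True)`. Recording CPU usage of the kernel, Varnish and wrk2 alongside each run helps tell which of them saturated first.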
>>
>> To complicate things even further: on your ARM vs. x64 tests, my guess
>> is that both the kernel parameters and the network parameters differ,
>> and the distributions probably have good reasons to choose different
>> values. It is very likely that these differences affect the
>> performance of the systems in many ways, and that different tests will
>> have different "optimal" tunings of kernel and network parameters.
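One way to see which settings actually differ between the two machines is to snapshot them on each host and diff the results. Below is a minimal Linux-only sketch that reads `/proc/sys` directly. The list of sysctls is an illustrative assumption, not a complete set of relevant tunables, and the function names are hypothetical.

```python
from pathlib import Path

# Illustrative selection of settings that may matter for an HTTP benchmark.
SYSCTLS = [
    "net.core.somaxconn",
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.ip_local_port_range",
]


def read_sysctls(names: list[str], root: str = "/proc/sys") -> dict[str, str]:
    """Read sysctl values from /proc/sys, normalizing whitespace."""
    values = {}
    for name in names:
        path = Path(root, *name.split("."))
        try:
            values[name] = " ".join(path.read_text().split())
        except OSError:
            values[name] = "<unavailable>"
    return values


def diff_sysctls(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {name: (value_on_a, value_on_b)} for every setting that differs."""
    return {
        k: (a.get(k, "<missing>"), b.get(k, "<missing>"))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }
```

Run `read_sysctls(SYSCTLS)` on each machine, save the dicts (e.g. as JSON), and diff them on one host.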
>>
>> Sorry for rambling, but getting the statistics wrong is so easy. The
>> question is very interesting, but if you want to draw conclusions, you
>> should do the analysis, and (ideally) give access to the raw data in
>> case anyone wants to have a look.
>>
>> Best,
>> Pål
>>
>> On Fri, Jul 31, 2020 at 08:45, Geoff Simmons <geoff at uplex.de> wrote:
>> >
>> > On 7/28/20 13:52, Martin Grigorov wrote:
>> > >
>> > > I've just posted an article [1] comparing the performance of Varnish
>> > > Cache on two similar machines; the main difference is the CPU
>> > > architecture, x86_64 vs aarch64.
>> > > It uses a specific use case: the backend service just returns static
>> > > content. The idea is to compare Varnish on the different
>> > > architectures, but also to compare Varnish against the backend HTTP
>> > > server.
>> > > What is interesting is that Varnish gives the same throughput as the
>> > > backend server on x86_64, but on aarch64 it is around 30% slower than
>> > > the backend.
>> >
>> > Does your test keep track of whether there were any errors in
>> > backend fetches? I don't know if that explains anything, but with a
>> > connect timeout of 10s and a first byte timeout of 5m, any error would
>> > have a considerable effect on the results of a 30-second test.
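A little arithmetic shows why. Every request whose backend fetch waits out a timeout pulls the mean up by far more than its share of the request count. The sketch below uses hypothetical numbers, not measurements. Note also that a 5-minute first byte timeout cannot even expire inside a 30-second run, so such a stuck fetch would show up as a missing or client-timed-out request instead.

```python
def mean_with_timeouts(n_requests: int, n_timeouts: int,
                       normal_latency_s: float, timeout_s: float) -> float:
    """Mean latency when n_timeouts of n_requests each wait the full timeout
    and all other requests take normal_latency_s."""
    ok = n_requests - n_timeouts
    return (ok * normal_latency_s + n_timeouts * timeout_s) / n_requests
```

For example, `mean_with_timeouts(100_000, 1, 0.001, 10.0)` is about 0.0011 s: a single 10-second connect timeout among 100,000 one-millisecond requests raises the mean by roughly 10%.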
>> >
>> > The test tool output doesn't say anything I can see about error rates --
>> > whether all responses had status 200, and if not, how many had which
>> > other status. Ideally it should be all 200, otherwise the results may
>> > not be valid.
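Checking error rates needs only a per-status tally, whether the codes come from the load tool's raw results or from varnishncsa's `%s` field. This is a minimal sketch with a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def status_report(codes: Iterable[int]) -> tuple[Counter, float]:
    """Count responses per HTTP status and return the share that were 200."""
    counts = Counter(codes)
    total = sum(counts.values())
    share_200 = counts[200] / total if total else 0.0
    return counts, share_200
```

If `share_200` is below 1.0, the non-200 entries in `counts` show which errors need explaining before any latency comparison is trusted.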
>> >
>> > I agree with phk that a statistical analysis is needed for a robust
>> > statement about differences between the two platforms. For that, you'd
>> > need more than the summary stats shown in your blog post -- you need to
>> > collect all of the response times. What I usually do is query Varnish
>> > client request logs for Timestamp:Resp and save the number in the last
>> > column.
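A minimal sketch of collecting those numbers from varnishlog's text output follows. It assumes output from something like `varnishlog -c -i Timestamp` (client side, Timestamp records only), where a Resp record ends in three numbers (assumed to be absolute time, time since start, and time since the last timestamp). As described above, it keeps the last column. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def resp_timestamps(lines: Iterable[str]) -> Iterator[float]:
    """Yield the last column of every `Timestamp ... Resp:` record.

    Works for the default grouped layout ("-   Timestamp   Resp: ...") and
    for raw lines that carry an extra client/backend marker before "Resp:".
    """
    for line in lines:
        fields = line.split()
        if "Timestamp" in fields and "Resp:" in fields:
            yield float(fields[-1])
```

Feeding `sys.stdin` to it while piping varnishlog output in gives one number per client response, ready to be written to a file for R or Python.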
>> >
>> > t.test() in R runs Student's t-test (me R fanboi).
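For readers without R, the same comparison can be sketched in plain Python. Note that R's `t.test()` computes Welch's variant (unequal variances) unless `var.equal = TRUE`, so this sketch does the same. To avoid a SciPy dependency, the p-value uses the normal approximation to the t distribution, which is only reasonable for large samples such as hundreds of thousands of requests per run. The function name is hypothetical.

```python
import math
import statistics


def welch_t_test(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Welch's two-sample t-test.

    Returns (t, degrees_of_freedom, approx_two_sided_p).
    """
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    # Squared standard errors of each mean (sample variance / n).
    sa = statistics.variance(a) / len(a)
    sb = statistics.variance(b) / len(b)
    t = (ma - mb) / math.sqrt(sa + sb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    # Two-sided p-value under a standard normal: P(|Z| > |t|).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p
```

Pål's caveat still applies: a small p-value from one pair of runs says little if kernel and network settings also differ between the machines, so repeated runs matter more than the test itself.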
>> >
>> >
>>
>