Compared performance of Varnish Cache on x86_64 and aarch64

Martin Grigorov martin.grigorov at
Tue Aug 4 11:31:40 UTC 2020


I've updated the data in the article -
Now x86_64 and aarch64 are almost the same!
Varnish gives around 20% less throughput than the Golang HTTP server, but I
guess this is because the Golang server is much simpler than Varnish.

A 3-minute run produces around 3 GB of Vegeta reports (130 MB gzipped). If
anyone wants me to extract some extra data, just let me know!


On Mon, Aug 3, 2020 at 6:14 PM Martin Grigorov <martin.grigorov at> wrote:

> Hi,
> Thank you all for the feedback!
> After some debugging, it turned out to be a bug in wrk: most of the
> requests' latencies were 0 in the raw reports.
> I've looked for a better-maintained HTTP load testing tool, and I liked
> that it provides (correct-looking) statistics, can measure latencies
> while using a constant rate, and, last but not least, can produce plot
> charts!
> I will update my article and let you know once I'm done!
> Regards,
> Martin
> On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <hermunn at> wrote:
>> I am sorry for being so late to the game, but here it goes:
>> On Wed, Jul 29, 2020 at 14:12, Poul-Henning Kamp <phk at> wrote:
>> > Your measurement says that there is a 2/3 chance that the latency
>> > is between:
>> >
>> >         655.40µs - 798.70µs     = -143.30µs
>> >
>> > and
>> >         655.40µs + 798.70µs     = 1454.10µs
>> No, it does not. There is no claim anywhere that the numbers
>> follow a normal distribution or an approximation of one. Of course,
>> your calculations demonstrate that the data is far from normally
>> distributed (as expected).
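A small worked example of this point, using deliberately skewed, hypothetical latencies (not data from the article): the mean minus one standard deviation can fall below zero, and the share of samples inside mean ± one standard deviation need not be anywhere near the roughly 2/3 that a normal distribution would give. Percentiles describe such data more honestly. The nearest-rank percentile used here is one of several common definitions, and the helper names are illustrative.

```python
import math
import statistics

def summarize(samples):
    """Return mean, population std dev, and the fraction of samples within mean ± 1 sd."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    lo, hi = mean - sd, mean + sd
    within = sum(lo <= x <= hi for x in samples) / len(samples)
    return mean, sd, within

def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in µs: 90 fast responses and a slow tail of 10.
latencies_us = [400] * 90 + [5000] * 10

mean, sd, within = summarize(latencies_us)
print(f"mean={mean:.1f}µs sd={sd:.1f}µs")   # mean=860.0µs sd=1380.0µs
print(f"mean-sd={mean - sd:.1f}µs")          # mean-sd=-520.0µs, an impossible latency
print(f"within mean±sd: {within:.0%}")       # within mean±sd: 90%, not ~68%
print("p50/p90/p99:", [percentile(latencies_us, p) for p in (50, 90, 99)])  # [400, 400, 5000]
```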
>> > You cannot conclude _anything_ from those numbers.
>> There are two numbers, the average and the standard deviation, and
>> they are calculated from the data, but the truth is hidden deeper in
>> the data. Looking at these particular numbers, I agree completely
>> that it is wrong to conclude that one platform is better than the
>> other. I am not saying that the statements in the article are false,
>> just that you do not have the data to draw those conclusions.
>> Furthermore, Geoff got things right (see below). As a mathematician,
>> I have to say that statistics is hard, and trusting the output of wrk
>> to draw conclusions is outright the wrong thing to do.
>> In this case we have a luxury that you typically do not have: data is
>> essentially free. You can run many tests, and you can run short or long
>> tests with different parameters. A 30-second test is simply not enough
>> for anything.
>> As Geoff indicated, for each transaction you can extract many relevant
>> values from varnishlog, with the status, hit/miss, time to first byte
>> and time to last byte being the most obvious ones. They can be
>> extracted and saved to a CSV file by using varnishncsa with a custom
>> format string, and you can use R (I used it myself as a tool in my
>> previous job; not a fan) to do statistical analysis on the data. The
>> Student's t-test suggestion from Geoff is a good idea, but just looking
>> at one set of numbers without considering other factors is
>> mathematically problematic.
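A minimal sketch of the varnishncsa-to-CSV idea, in Python rather than R. It assumes varnishncsa was run with a custom format such as `varnishncsa -F '%s,%{Varnish:hitmiss}x,%{Varnish:time_firstbyte}x,%D'`, giving status, hit/miss, time to first byte in seconds and total time in microseconds; check these specifiers against the varnishncsa manual for your Varnish version. The function name and the sample rows are hypothetical.

```python
import csv
import io
import statistics
from collections import defaultdict

def summarize_by_hitmiss(lines):
    """Group rows of 'status,hitmiss,ttfb_s,total_us' by hit/miss and summarize.

    Only status-200 rows are timed; every other status is counted as an error.
    """
    ttfb_us = defaultdict(list)   # hit/miss -> time to first byte, converted to µs
    total_us = defaultdict(list)  # hit/miss -> time to last byte in µs
    errors = 0
    for status, hitmiss, ttfb_s, total in csv.reader(lines):
        if status != "200":
            errors += 1
            continue
        ttfb_us[hitmiss].append(float(ttfb_s) * 1e6)
        total_us[hitmiss].append(float(total))
    summary = {
        key: {
            "n": len(total_us[key]),
            "median_ttfb_us": statistics.median(ttfb_us[key]),
            "median_total_us": statistics.median(total_us[key]),
        }
        for key in total_us
    }
    return summary, errors

# Hypothetical rows in the assumed format.
sample = io.StringIO(
    "200,hit,0.000120,180\n"
    "200,hit,0.000100,150\n"
    "200,miss,0.000900,1300\n"
    "503,miss,0.000050,60\n"
)
summary, errors = summarize_by_hitmiss(sample)
print(summary, "errors:", errors)
```

Keeping hits and misses apart matters because they measure different paths through Varnish; mixing them into one number hides exactly the structure Pål is warning about.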
>> Anyway, some obvious questions then arise. For example:
>> - How do the numbers between wrk and varnishlog/varnishncsa compare?
>> Did wrk report a different total number of transactions than Varnish?
>> If there is a discrepancy, then the errors might be caused by some
>> resource constraint (number of sockets or dropped SYN packets?).
>> - How do the average and maximum compare between Varnish and wrk?
>> - What is the CPU usage of the kernel, the benchmarking tool and the
>> Varnish processes in the tests?
>> - What is the difference between the time to first byte and the time
>> to last byte in Varnish for different object sizes?
>> When Varnish writes to a socket, it hands bytes over to the kernel,
>> and when the write call returns, we do not know how far the bytes have
>> come or how long it will take before they reach their final
>> destination. The bytes may be in a kernel buffer, they might be on the
>> network card, they might already have been received by the client's
>> kernel, or they might have made it all the way into wrk (which may or
>> may not have timestamped the response). Typically, depending on many
>> things, Varnish will report faster times than wrk does, but since
>> returning from the write call means that the calling thread must be
>> rescheduled, it is even possible that wrk will see some requests
>> complete faster than what Varnish reports. Running wrk2 at different
>> rates in a series of tests seems natural to me, so that you can observe
>> when (and how) the system starts running into bottlenecks. Note that
>> the bottleneck can just as well be in wrk2 itself or in the combined
>> CPU usage of kernel + Varnish + wrk2.
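One way to act on the rate-sweep suggestion: record a latency percentile for each constant-rate run and look for the lowest rate at which it departs from the low-rate baseline. This is a hypothetical sketch; the 2× threshold, the function name and the numbers are illustrative assumptions, not measurements from the article.

```python
def first_degraded_rate(runs, factor=2.0):
    """Return the lowest offered rate whose p99 exceeds `factor` x the baseline p99.

    `runs` is a list of (requests_per_second, p99_latency_us) pairs, one per
    separate constant-rate run. The baseline is the p99 at the lowest rate.
    Returns None if no run crosses the threshold.
    """
    ordered = sorted(runs)
    baseline = ordered[0][1]
    for rate, p99 in ordered:
        if p99 > factor * baseline:
            return rate
    return None

# Hypothetical sweep: latency stays flat until something saturates.
sweep = [(5_000, 800), (10_000, 850), (20_000, 900), (30_000, 2_500), (40_000, 40_000)]
print(first_degraded_rate(sweep))  # 30000
```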
>> To complicate things even further: on your ARM vs. x64 tests, my guess
>> is that both the kernel parameters and the network parameters are
>> different, and the distributions probably have good reason to choose
>> different values. It is very likely that these differences affect the
>> performance of the systems in many ways, and that different tests will
>> have different "optimal" tunings of kernel and network parameters.
>> Sorry for rambling, but getting the statistics wrong is so easy. The
>> question is very interesting, but if you want to draw conclusions, you
>> should do the analysis, and (ideally) give access to the raw data in
>> case anyone wants to have a look.
>> Best,
>> Pål
>> On Fri, Jul 31, 2020 at 08:45, Geoff Simmons <geoff at> wrote:
>> >
>> > On 7/28/20 13:52, Martin Grigorov wrote:
>> > >
>> > > I've just posted an article [1] about comparing the performance of
>> > > Varnish Cache on two similar machines; the main difference is the
>> > > CPU architecture: x86_64 vs aarch64.
>> > > It uses a specific use case: the backend service just returns
>> > > static content. The idea is to compare Varnish on the different
>> > > architectures, but also to compare Varnish against the backend HTTP
>> > > server.
>> > > What is interesting is that Varnish gives the same throughput as the
>> > > backend server on x86_64, but on aarch64 it is around 30% slower than
>> > > the backend.
>> >
>> > Does your test account for whether there were any errors in
>> > backend fetches? I don't know if that explains anything, but with a
>> > connect timeout of 10s and a first byte timeout of 5m, any error would
>> > have a considerable effect on the results of a 30-second test.
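A back-of-the-envelope sketch of why a single error matters in a short test. The request count and per-request latency are hypothetical; the only number taken from the thread is the 10 s connect timeout. It models the mean only and ignores any effect on throughput.

```python
def mean_with_outliers(n_fast, fast_us, n_slow, slow_us):
    """Mean latency (µs) when n_slow requests take slow_us and the rest take fast_us."""
    return (n_fast * fast_us + n_slow * slow_us) / (n_fast + n_slow)

# Hypothetical 30 s run: 40,000 responses at ~700 µs, then the same run
# with a single request stuck for the full 10 s connect timeout.
print(mean_with_outliers(40_000, 700, 0, 10_000_000))  # 700.0
print(mean_with_outliers(39_999, 700, 1, 10_000_000))  # 949.9825
```

One request out of 40,000 moves the mean by more than a third, which is larger than many of the differences one would hope to detect between two platforms.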
>> >
>> > The test tool output doesn't say anything I can see about error rates --
>> > whether all responses had status 200, and if not, how many had which
>> > other status. Ideally it should be all 200, otherwise the results may
>> > not be valid.
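A minimal check of the "all 200" condition, assuming one status code per response has already been extracted (for example from the load generator's report or from varnishncsa's `%s`). The function name and the sample data are hypothetical.

```python
from collections import Counter

def status_breakdown(statuses):
    """Count responses per HTTP status and return the share that were not 200."""
    counts = Counter(statuses)
    total = sum(counts.values())
    non_200 = total - counts.get(200, 0)
    return counts, (non_200 / total if total else 0.0)

# Hypothetical run: a handful of failed backend fetches surfaced as 503s.
statuses = [200] * 9_990 + [503] * 10
counts, error_share = status_breakdown(statuses)
print(dict(counts), f"non-200: {error_share:.2%}")  # {200: 9990, 503: 10} non-200: 0.10%
```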
>> >
>> > I agree with phk that a statistical analysis is needed for a robust
>> > statement about differences between the two platforms. For that, you'd
>> > need more than the summary stats shown in your blog post -- you need to
>> > collect all of the response times. What I usually do is query Varnish
>> > client request logs for Timestamp:Resp and save the number in the last
>> > column.
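A sketch of scripting this extraction, assuming varnishlog-style text in which each response timestamp appears as a line containing `Timestamp Resp: <absolute time> <time since start> <time since last timestamp>` (for example from `varnishlog -c -i Timestamp`; verify the exact layout on your Varnish version). The three numbers follow the field order documented in vsl(7). The sketch keeps the last column as described above; using `values[1]` instead would give the time since the start of the request. The function name and the sample lines are hypothetical.

```python
def resp_times(lines):
    """Yield the last numeric field (seconds) of every 'Timestamp ... Resp:' record."""
    for line in lines:
        fields = line.split()
        if "Timestamp" not in fields or "Resp:" not in fields:
            continue
        # Fields after the 'Resp:' label: absolute time, time since the start
        # of the work unit, time since the previous timestamp.
        values = fields[fields.index("Resp:") + 1:]
        if len(values) == 3:
            yield float(values[-1])

# Hypothetical log excerpt.
sample = [
    "-   Timestamp      Start: 1596540700.000000 0.000000 0.000000",
    "-   Timestamp      Resp: 1596540700.000789 0.000789 0.000012",
    "-   ReqURL         /",
    "-   Timestamp      Resp: 1596540701.001000 0.001000 0.000020",
]
print(list(resp_times(sample)))  # [1.2e-05, 2e-05]
```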
>> >
>> > t.test() in R runs Student's t-test (me R fanboi).
>> >
>> >
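For reference, R's `t.test()` runs Welch's variant by default (`var.equal = FALSE`), which does not assume the two platforms have equal variances. Below is a stdlib-only Python sketch of the same statistic and its Welch-Satterthwaite degrees of freedom; turning it into a p-value needs a t-distribution CDF, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available. The sample latencies are hypothetical, and since a t-test compares means, it is only one part of the analysis for skewed latency data.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-request latencies (µs) from the two platforms.
x86_64 = [700, 710, 720, 730, 740]
aarch64 = [710, 720, 730, 740, 750]
print(welch_t(x86_64, aarch64))  # (-1.0, 8.0)
```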

More information about the varnish-dev mailing list