Compared performance of Varnish Cache on x86_64 and aarch64

Guillaume Quintard guillaume at
Tue Aug 4 14:46:51 UTC 2020


> Varnish gives around 20% less throughput than the Golang HTTP server but
> I guess this is because the Golang server is much simpler than Varnish.

Since the backend and vegeta are written in Go, it's pretty safe to assume they
are going to use H/2 by default, and that's not the case for your Varnish
instance, so that possibly explains some of the differences you are seeing.


Guillaume Quintard

On Tue, Aug 4, 2020 at 4:33 AM Martin Grigorov <martin.grigorov at> wrote:

> Hi,
> I've updated the data in the article -
> Now x86_64 and aarch64 are almost the same!
> Varnish gives around 20% less throughput than the Golang HTTP server but I
> guess this is because the Golang server is much simpler than Varnish.
> 3 min run produces around 3GB of Vegeta reports (130MB gzipped). If anyone
> wants me to extract some extra data just let me know!
> Regards,
> Martin
> On Mon, Aug 3, 2020 at 6:14 PM Martin Grigorov <martin.grigorov at>
> wrote:
>> Hi,
>> Thank you all for the feedback!
>> After some debugging it appeared that it is a bug in wrk - most of the
>> requests' latencies were 0 in the raw reports.
>> I've looked for a better maintained HTTP load testing tool and I liked
>> it provides (correct-looking)
>> statistics, can measure latencies while using constant rate, and last but
>> not least can produce plot charts!
>> I will update my article and let you know once I'm done!
>> Regards,
>> Martin
>> On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <
>> hermunn at> wrote:
>>> I am sorry for being so late to the game, but here it goes:
>>> On Wed, Jul 29, 2020 at 14:12, Poul-Henning Kamp <phk at> wrote:
>>> > Your measurement says that there is 2/3 chance that the latency
>>> > is between:
>>> >
>>> >         655.40µs - 798.70µs     = -143.30µs
>>> >
>>> > and
>>> >         655.40µs + 798.70µs     = 1454.10µs
>>> No, it does not. There is no claim anywhere that the numbers are
>>> following a normal distribution or an approximation of it. Of course,
>>> the calculations you do demonstrate that the data is far from normally
>>> distributed (as expected).
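The arithmetic in this exchange can be checked with a short Python sketch. It uses the two figures quoted above (mean 655.40µs, standard deviation 798.70µs); the log-normal sample is synthetic and only an assumed stand-in for a skewed latency distribution, not data from the article.

```python
import random
import statistics

# Figures quoted in the thread (microseconds).
MEAN_US = 655.40
STDEV_US = 798.70

def one_sigma_interval(mean, stdev):
    """Return (mean - stdev, mean + stdev)."""
    return mean - stdev, mean + stdev

def fraction_within_one_sigma(samples):
    """Share of samples that fall inside mean +/- one standard deviation."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    low, high = one_sigma_interval(mean, stdev)
    return sum(low <= x <= high for x in samples) / len(samples)

low, high = one_sigma_interval(MEAN_US, STDEV_US)
print(f"one-sigma interval: {low:.2f}us .. {high:.2f}us")  # lower bound is negative

# Synthetic, right-skewed "latencies" (assumption: log-normal, arbitrary parameters).
rng = random.Random(42)
skewed = [rng.lognormvariate(6.0, 1.0) for _ in range(100_000)]
print(f"share within one sigma: {fraction_within_one_sigma(skewed):.3f}")
```

For normally distributed data the share would be about 0.68; for a skewed sample like this one it generally comes out noticeably higher, which is why the mean and standard deviation alone say little about the shape of the distribution.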
>>> > You cannot conclude _anything_ from those numbers.
>>> There are two numbers, the average and the standard deviation, and
>>> they are calculated from the data, but the truth is hidden deeper in
>>> the data. By looking at the particular numbers, I agree completely
>>> that it is wrong to conclude that one is better than the other. I am
>>> not saying that the statements in the article are false, just that you
>>> do not have data to draw the conclusions.
>>> Furthermore I have to say that Geoff got things right (see below). As
>>> a mathematician, I have to say that statistics is hard, and trusting
>>> the output of wrk to draw conclusions is outright the wrong thing to
>>> do.
>>> In this case we have a luxury which you typically do not have: Data is
>>> essentially free. You can run many tests and you can run short or long
>>> tests with different parameters. A 30 second test is simply not enough
>>> for anything.
>>> As Geoff indicated, for each transaction you can extract many relevant
>>> values from varnishlog, with the status, hit/miss, time to first byte
>>> and time to last byte being the most obvious ones. They can be
>>> extracted and saved to a csv file by using varnishncsa with a custom
>>> format string, and you can use R (used it myself as a tool in my
>>> previous job - not a fan) to do statistical analysis on the data. The
>>> Student T suggestion from Geoff is a good idea, but just looking at
>>> one set of numbers without considering other factors is mathematically
>>> problematic.
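A minimal sketch of the extraction step described above, in Python rather than R. It assumes the log was written by varnishncsa with a format string such as `%s,%{Varnish:hitmiss}x,%{Varnish:time_firstbyte}x,%D` (status, hit/miss, time to first byte in seconds, total time in microseconds); the column order, the function names and the sample rows are illustrative assumptions, not data from the tests discussed here.

```python
import csv
import io
import math
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(csv_text):
    """Group total response times (microseconds) by (status, hit/miss)."""
    groups = defaultdict(list)
    for status, hitmiss, _ttfb_s, total_us in csv.reader(io.StringIO(csv_text)):
        groups[(status, hitmiss)].append(float(total_us))
    return {
        key: {
            "count": len(v),
            "p50": percentile(v, 50),
            "p99": percentile(v, 99),
            "max": max(v),
        }
        for key, v in groups.items()
    }

# Made-up rows, only to show the assumed shape of the file.
SAMPLE = """200,hit,0.000081,95
200,hit,0.000079,90
200,miss,0.001204,1310
503,miss,5.000312,5000400
"""

for key, stats in summarize(SAMPLE).items():
    print(key, stats)
```

Splitting by status and hit/miss keeps errors and backend fetches from being averaged into the cache-hit latencies.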
>>> Anyway, some obvious questions then arise. For example:
>>> - How do the numbers between wrk and varnishlog/varnishncsa compare?
>>> Did wrk report a different total number of transactions than Varnish? If there
>>> is a discrepancy, then the errors might be because of some resource
>>> constraint (number of sockets or dropped SYN packets?).
>>> - How do the average and maximum compare between Varnish and wrk?
>>> - What is the CPU usage of the kernel, the benchmarking tool and the
>>> varnish processes in the tests?
>>> - What is the difference between the time to first byte and the time
>>> to last byte in Varnish for different object sizes?
>>> When Varnish writes to a socket, it hands bytes over to the kernel,
>>> and when the write call returns, we do not know how far the bytes have
>>> come, and how long it will take before they get to the final
>>> destination. The bytes may be in a kernel buffer, they might be on the
>>> network card, and they might be already received at the client's
>>> kernel, and they might have made it all into wrk (which may or may not
>>> have timestamped the response). Typically, depending on many things,
>>> Varnish will report faster times than wrk does, but since returning
>>> from the write call means that the calling thread must be rescheduled,
>>> it is even possible that wrk will see that some requests are faster
>>> than what Varnish reports. Running wrk2 with different speeds in a
>>> series of tests seems natural to me, so that you can observe when (and
>>> how) the system starts running into bottlenecks. Note that the
>>> bottleneck can just as well be in wrk2 itself or in the combined CPU
>>> usage of kernel + Varnish + wrk2.
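One way to read a series of constant-rate runs is to look for the first requested rate that the system no longer sustains. The sketch below assumes each run has been reduced to a requested rate, an achieved rate and a p99 latency; the `Run` record, the 5% tolerance and the sample numbers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Run:
    requested_rps: float   # rate the load generator was asked to produce
    achieved_rps: float    # rate it actually measured
    p99_us: float          # 99th percentile latency in microseconds

def first_saturated(runs: Sequence[Run], tolerance: float = 0.05) -> Optional[Run]:
    """Return the lowest-rate run whose achieved rate falls short of the
    requested rate by more than `tolerance`, or None if every run keeps up."""
    for run in sorted(runs, key=lambda r: r.requested_rps):
        if run.achieved_rps < run.requested_rps * (1 - tolerance):
            return run
    return None

# Hypothetical results, not measurements from the article.
runs = [
    Run(1_000, 1_000, 400),
    Run(5_000, 4_990, 450),
    Run(10_000, 9_100, 2_300),
    Run(20_000, 9_300, 48_000),
]
knee = first_saturated(runs)
print(knee)
```

This only shows where something saturates, not what. CPU usage of the kernel, Varnish and the load generator during the same run is still needed to tell which of them is the bottleneck.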
>>> To complicate things even further: On your ARM vs. x64 tests, my guess
>>> is that both kernel parameters and parameters for the network are
>>> different, and the distributions probably have good reason to choose
>>> different values. It is very likely that these differences affect the
>>> performance of the systems in many ways, and that different tests will
>>> have different "optimal" tunings of kernel and network parameters.
>>> Sorry for rambling, but getting the statistics wrong is so easy. The
>>> question is very interesting, but if you want to draw conclusions, you
>>> should do the analysis, and (ideally) give access to the raw data in
>>> case anyone wants to have a look.
>>> Best,
>>> Pål
>>> On Fri, Jul 31, 2020 at 08:45, Geoff Simmons <geoff at> wrote:
>>> >
>>> > On 7/28/20 13:52, Martin Grigorov wrote:
>>> > >
>>> > > I've just posted an article [1] about comparing the performance of
>>> > > Varnish Cache on two similar machines - the main difference is the CPU
>>> > > architecture - x86_64 vs aarch64. It uses a specific use case - the
>>> > > backend service just returns static content. The idea is to compare
>>> > > Varnish on the different architectures but also to compare Varnish
>>> > > against the backend HTTP server.
>>> > > What is interesting is that Varnish gives the same throughput as the
>>> > > backend server on x86_64 but on aarch64 it is around 30% slower than
>>> > > the backend.
>>> >
>>> > Does your test have an account of whether there were any errors in
>>> > backend fetches? Don't know if that explains anything, but with a
>>> > connect timeout of 10s and first byte timeout of 5m, any error would
>>> > have a considerable effect on the results of a 30 second test.
>>> >
>>> > The test tool output doesn't say anything I can see about error rates --
>>> > whether all responses had status 200, and if not, how many had which
>>> > other status. Ideally it should be all 200, otherwise the results may
>>> > not be valid.
>>> >
>>> > I agree with phk that a statistical analysis is needed for a robust
>>> > statement about differences between the two platforms. For that, you'd
>>> > need more than the summary stats shown in your blog post -- you need to
>>> > collect all of the response times. What I usually do is query Varnish
>>> > client request logs for Timestamp:Resp and save the number in the last
>>> > column.
>>> >
>>> > t.test() in R runs Student's t-test (me R fanboi).
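For readers who prefer Python to R, here is a rough stdlib-only sketch of the same workflow. It assumes varnishlog lines of the form `-   Timestamp      Resp: <absolute> <since start> <since last>` and computes the Welch variant of the t statistic (the unequal-variance form that R's t.test() uses by default) without a p-value; the function names and sample lines are illustrative.

```python
import math
import statistics

def resp_times(log_lines):
    """Collect the last column of every 'Timestamp Resp:' line, in seconds."""
    times = []
    for line in log_lines:
        fields = line.split()
        if "Timestamp" in fields and "Resp:" in fields:
            times.append(float(fields[-1]))
    return times

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Made-up log lines, only to show the parsing.
sample = [
    "-   Timestamp      Resp: 1596200000.000500 0.000500 0.000040",
    "-   Timestamp      Start: 1596200000.000000 0.000000 0.000000",
    "-   Timestamp      Resp: 1596200001.000700 0.000700 0.000060",
]
print(resp_times(sample))
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

A t statistic on its own is not a verdict: latency data is usually far from normal, so comparing the full distributions of the two platforms remains the safer check.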
>>> >
>>> >