Compared performance of Varnish Cache on x86_64 and aarch64

Guillaume Quintard guillaume at
Wed Aug 5 13:52:29 UTC 2020

I stand corrected, good to see it's not an issue.

On Wed, Aug 5, 2020, 02:17 Martin Grigorov <martin.grigorov at>

> Hi Guillaume,
> On Tue, Aug 4, 2020 at 5:47 PM Guillaume Quintard <
> guillaume at> wrote:
>> Hi,
>> > Varnish gives around 20% less throughput than the Golang HTTP server
>> > but I guess this is because the Golang server is much simpler than Varnish.
>> Since the backend and vegeta are written in Go, it's pretty safe to assume
>> they are going to use H/2 by default, and that's not the case for your
>> Varnish instance, so that possibly explains some of the differences you are
>> seeing.
> To use H/2 one has to use -http2 parameter (
> In addition I'd need to start the HTTP server with
> svr.ListenAndServeTLS(cert, key)
> I've added "log.Printf("Protocol: %s", r.Proto)" to the handle function
> and it prints "HTTP/1.1" no matter whether I use -http2 parameter for
> Vegeta or not
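The behaviour described above can be reproduced in isolation: Go's net/http server only negotiates HTTP/2 over TLS (via ALPN) unless cleartext HTTP/2 is configured explicitly, so a plain-TCP backend reports HTTP/1.1 whatever the client asks for. A minimal sketch, assuming throwaway servers from net/http/httptest instead of the real backend; `protoHandler` and `seenProto` are hypothetical names, not code from the article:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// protoHandler writes back the protocol the server negotiated for the request.
func protoHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, r.Proto)
}

// seenProto starts a throwaway test server (plain or TLS), sends one request
// and returns the protocol string the handler observed.
func seenProto(useTLS bool) (string, error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(protoHandler))
	if useTLS {
		srv.EnableHTTP2 = true // offer h2 during the TLS handshake
		srv.StartTLS()
	} else {
		srv.Start()
	}
	defer srv.Close()

	// srv.Client() trusts the test certificate and matches the server setup.
	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	for _, useTLS := range []bool{false, true} {
		proto, err := seenProto(useTLS)
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		fmt.Printf("TLS=%v -> %s\n", useTLS, proto)
	}
}
```

If the plain-TCP case reports HTTP/1.1 here as well, the protocol mismatch can be ruled out as an explanation for the throughput gap.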
>> Cheers,
>> --
>> Guillaume Quintard
>> On Tue, Aug 4, 2020 at 4:33 AM Martin Grigorov <martin.grigorov at>
>> wrote:
>>> Hi,
>>> I've updated the data in the article -
>>> Now x86_64 and aarch64 are almost the same!
>>> Varnish gives around 20% less throughput than the Golang HTTP server but
>>> I guess this is because the Golang server is much simpler than Varnish.
>>> 3 min run produces around 3GB of Vegeta reports (130MB gzipped). If
>>> anyone wants me to extract some extra data just let me know!
>>> Regards,
>>> Martin
>>> On Mon, Aug 3, 2020 at 6:14 PM Martin Grigorov <
>>> martin.grigorov at> wrote:
>>>> Hi,
>>>> Thank you all for the feedback!
>>>> After some debugging it appeared that it is a bug in wrk - most of the
>>>> requests' latencies were 0 in the raw reports.
>>>> I've looked for a better maintained HTTP load testing tool and I liked
>>>> it provides (correct-looking)
>>>> statistics, can measure latencies while using a constant rate, and last but
>>>> not least can produce plot charts!
>>>> I will update my article and let you know once I'm done!
>>>> Regards,
>>>> Martin
>>>> On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <
>>>> hermunn at> wrote:
>>>>> I am sorry for being so late to the game, but here it goes:
>>>>> On Wed, Jul 29, 2020 at 14:12, Poul-Henning Kamp <
>>>>> phk at> wrote:
>>>>> > Your measurement says that there is 2/3 chance that the latency
>>>>> > is between:
>>>>> >
>>>>> >         655.40µs - 798.70µs     = -143.30µs
>>>>> >
>>>>> > and
>>>>> >         655.40µs + 798.70µs     = 1454.10µs
>>>>> No, it does not. There is no claim anywhere that the numbers are
>>>>> following a normal distribution or an approximation of it. Of course,
>>>>> the calculations you do demonstrate that the data is far from normally
>>>>> distributed (as expected).
>>>>> > You cannot conclude _anything_ from those numbers.
>>>>> There are two numbers, the average and the standard deviation, and
>>>>> they are calculated from the data, but the truth is hidden deeper in
>>>>> the data. By looking at the particular numbers, I agree completely
>>>>> that it is wrong to conclude that one is better than the other. I am
>>>>> not saying that the statements in the article are false, just that you
>>>>> do not have data to draw the conclusions.
>>>>> Furthermore I have to say that Geoff got things right (see below). As
>>>>> a mathematician, I have to say that statistics is hard, and trusting
>>>>> the output of wrk to draw conclusions is outright the wrong thing to
>>>>> do.
>>>>> In this case we have a luxury which you typically do not have: Data is
>>>>> essentially free. You can run many tests and you can run short or long
>>>>> tests with different parameters. A 30 second test is simply not enough
>>>>> for anything.
>>>>> As Geoff indicated, for each transaction you can extract many relevant
>>>>> values from varnishlog, with the status, hit/miss, time to first byte
>>>>> and time to last byte being the most obvious ones. They can be
>>>>> extracted and saved to a csv file by using varnishncsa with a custom
>>>>> format string, and you can use R (used it myself as a tool in my
>>>>> previous job - not a fan) to do statistical analysis on the data. The
>>>>> Student T suggestion from Geoff is a good idea, but just looking at
>>>>> one set of numbers without considering other factors is mathematically
>>>>> problematic.
>>>>> Anyway, some obvious questions then arise. For example:
>>>>> - How do the numbers between wrk and varnishlog/varnishncsa compare?
>>>>> Did wrk report a different total number of transactions than Varnish? If
>>>>> there is a discrepancy, then the errors might be because of some resource
>>>>> constraint (number of sockets or dropped SYN packets?).
>>>>> - How do the average and maximum compare between Varnish and wrk?
>>>>> - What is the CPU usage of the kernel, the benchmarking tool and the
>>>>> varnish processes in the tests?
>>>>> - What is the difference between the time to first byte and the time
>>>>> to last byte in Varnish for different object sizes?
>>>>> When Varnish writes to a socket, it hands bytes over to the kernel,
>>>>> and when the write call returns, we do not know how far the bytes have
>>>>> come, and how long it will take before they get to the final
>>>>> destination. The bytes may be in a kernel buffer, they might be on the
>>>>> network card, and they might be already received at the client's
>>>>> kernel, and they might have made it all into wrk (which may or may not
>>>>> have timestamped the response). Typically, depending on many things,
>>>>> Varnish will report faster times than wrk does, but since returning
>>>>> from the write call means that the calling thread must be rescheduled,
>>>>> it is even possible that wrk will see that some requests are faster
>>>>> than what Varnish reports. Running wrk2 with different speeds in a
>>>>> series of tests seems natural to me, so that you can observe when (and
>>>>> how) the system starts running into bottlenecks. Note that the
>>>>> bottleneck can just as well be in wrk2 itself or on the combined CPU
>>>>> usage of kernel + Varnish + wrk2.
>>>>> To complicate things even further: On your ARM vs. x64 tests, my guess
>>>>> is that both kernel parameters and parameters for the network are
>>>>> different, and the distributions probably have good reason to choose
>>>>> different values. It is very likely that these differences affect the
>>>>> performance of the systems in many ways, and that different tests will
>>>>> have different "optimal" tunings of kernel and network parameters.
>>>>> Sorry for rambling, but getting the statistics wrong is so easy. The
>>>>> question is very interesting, but if you want to draw conclusions, you
>>>>> should do the analysis, and (ideally) give access to the raw data in
>>>>> case anyone wants to have a look.
>>>>> Best,
>>>>> Pål
>>>>> On Fri, Jul 31, 2020 at 08:45, Geoff Simmons <geoff at> wrote:
>>>>> >
>>>>> > On 7/28/20 13:52, Martin Grigorov wrote:
>>>>> > >
>>>>> > > I've just posted an article [1] about comparing the performance of
>>>>> > > Varnish Cache on two similar machines - the main difference is the
>>>>> > > CPU architecture - x86_64 vs aarch64.
>>>>> > > It uses a specific use case - the backend service just returns
>>>>> > > static content. The idea is to compare Varnish on the different
>>>>> > > architectures but also to compare Varnish against the backend HTTP
>>>>> > > server.
>>>>> > > What is interesting is that Varnish gives the same throughput as the
>>>>> > > backend server on x86_64 but on aarch64 it is around 30% slower than
>>>>> > > the backend.
>>>>> >
>>>>> > Does your test have an account of whether there were any errors in
>>>>> > backend fetches? Don't know if that explains anything, but with a
>>>>> > connect timeout of 10s and first byte timeout of 5m, any error would
>>>>> > have a considerable effect on the results of a 30 second test.
>>>>> >
>>>>> > The test tool output doesn't say anything I can see about error rates --
>>>>> > whether all responses had status 200, and if not, how many had which
>>>>> > other status. Ideally it should be all 200, otherwise the results may
>>>>> > not be valid.
>>>>> >
>>>>> > I agree with phk that a statistical analysis is needed for a robust
>>>>> > statement about differences between the two platforms. For that, you'd
>>>>> > need more than the summary stats shown in your blog post -- you need to
>>>>> > collect all of the response times. What I usually do is query Varnish
>>>>> > client request logs for Timestamp:Resp and save the number in the last
>>>>> > column.
>>>>> >
>>>>> > t.test() in R runs Student's t-test (me R fanboi).
>>>>> >
>>>>> >