Compared performance of Varnish Cache on x86_64 and aarch64

Martin Grigorov martin.grigorov at
Wed Jul 29 12:03:07 UTC 2020

Hi Poul-Henning,

Thank you for your answer!

On Tue, Jul 28, 2020 at 5:01 PM Poul-Henning Kamp <phk at> wrote:

> --------
> Martin Grigorov writes:
> > Any feedback and ideas how to tweak it (VCL or even patches) are very
> > welcome!
> First you need to tweak your benchmark setup.
>    aarch64
>           Thread Stats   Avg      Stdev     Max   +/- Stdev
>             Latency   655.40us  798.70us  28.43ms   90.52%
> Strictly speaking, you cannot rule out that the ARM machine
> sends responses before it receives the request, because your
> standard deviation is larger than your average.
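A quick check of the quoted aarch64 numbers in Python: avg - stddev comes out negative (about -143.3 us). Read as a symmetric "avg +/- stddev" band, that band would include negative latencies. Since latency can never be negative, a stddev larger than the mean also suggests a long-tailed distribution, which avg and stddev summarize poorly.

```python
# Latency figures (microseconds) from the aarch64 wrk output quoted above.
avg_us = 655.40
stdev_us = 798.70

# Lower edge of the "average minus one standard deviation" band.
lower_us = avg_us - stdev_us
print(f"avg - stdev = {lower_us:.2f} us")  # negative, i.e. below zero latency
```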

Could you explain in what case(s) the server would send responses before
receiving a request?
Do you think that there might be negative values for the latency of some
requests?

> In other words:  Those numbers tell us nothing.
> If you want to do this comparison, and I would love for you to do so,
> you really need to take the time it takes, and get your "noise" down.
> Here is how you should do it:
>         for machine in ARM, INTEL
>                 Reboot machine
>                 For i in (at least) 1-5:
>                         Run test for 5 minutes
> If the results from the first run on each machine are very different
> from the other four runs, you can disregard it, as a startup/bootup
> artifact.
> Report the numbers for all the runs for both machines.
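One way to make "very different" concrete is a minimal sketch like the one below. It assumes each run is summarized by its average latency and flags the first run when it lies more than k sample standard deviations from the mean of the remaining runs. The function name and the k=3.0 threshold are hypothetical choices, not part of the advice above.

```python
import statistics


def first_run_is_artifact(run_avgs, k=3.0):
    """Return True if run_avgs[0] lies more than k sample standard
    deviations away from the mean of the remaining runs.

    k=3.0 is an assumed threshold; the advice only says "very different".
    """
    if len(run_avgs) < 3:
        raise ValueError("need the first run plus at least two more runs")
    rest = run_avgs[1:]
    mean_rest = statistics.mean(rest)
    sd_rest = statistics.stdev(rest)
    return abs(run_avgs[0] - mean_rest) > k * sd_rest


# Hypothetical per-run averages (us) for one machine, first run after reboot.
print(first_run_is_artifact([910.0, 652.1, 660.3, 648.7, 655.0]))  # True
print(first_run_is_artifact([656.0, 652.1, 660.3, 648.7, 655.0]))  # False
```

Even when the first run is flagged, all runs should still be reported, as suggested above.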
> Make a plot of all those numbers, where you plot the reported
> average +/- stddev as a line, and the max value as a dot/cross/box.
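If a plotting library is not at hand, a text rendering can be enough to eyeball the runs. The stdlib-only sketch below draws each run's avg +/- stddev span as "-", marks the average with "|" and the max with "x". The function name, the 60-column width, and the tuple layout are assumptions made for illustration. A graphical tool would normally be used instead.

```python
def ascii_plot(runs, width=60):
    """Render runs as text lines on a shared scale.

    runs: list of (label, avg, stdev, max) tuples, all in the same unit.
    '-' spans avg-stdev..avg+stdev, '|' marks avg, 'x' marks max.
    """
    lo = min(avg - sd for _, avg, sd, _ in runs)
    hi = max(max(avg + sd, mx) for _, avg, sd, mx in runs)
    span = (hi - lo) or 1.0  # avoid division by zero if all values coincide

    def col(v):
        # Map a value onto a column index in [0, width - 1].
        return round((v - lo) / span * (width - 1))

    lines = []
    for label, avg, sd, mx in runs:
        row = [" "] * width
        for c in range(col(avg - sd), col(avg + sd) + 1):
            row[c] = "-"
        row[col(avg)] = "|"
        row[col(mx)] = "x"
        lines.append(f"{label:>10} {''.join(row)}")
    return lines


# First tuple uses the aarch64 numbers quoted above; the second is hypothetical.
for line in ascii_plot([("arm-1", 655.40, 798.70, 28430.0),
                        ("x86-1", 500.0, 100.0, 9000.0)]):
    print(line)
```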
> If you want to get fancy, you can do a Student's T test to tell
> you if there is any real difference.  There's a program called
> "ministat" which will do this for you.
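For reference, the core of a two-sample Student's t test fits in a few lines of Python. This sketch uses the pooled-variance (equal-variance) form and compares |t| against a two-sided 95% critical value from a standard t table. The value 2.306 applies only to 8 degrees of freedom, i.e. five runs per machine. The function and constant names are hypothetical, and ministat remains the more convenient tool.

```python
import math
import statistics


def student_t(a, b):
    """Pooled-variance two-sample Student's t statistic.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# Two-sided 95% critical value for df = 8 (t table); other df need other values.
T_CRIT_95_DF8 = 2.306

# Hypothetical per-run average latencies (us), five runs per machine.
arm = [655.4, 648.0, 661.2, 652.9, 657.5]
x86 = [610.3, 615.8, 607.1, 612.4, 609.9]
t, df = student_t(arm, x86)
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRIT_95_DF8}")
```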

ministat looks cool! Thanks!
I think I can save the raw latencies for all requests into a file and feed
ministat with it!
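ministat reads plain-text files with one number per line, so dumping the raw latencies is straightforward. A minimal sketch, assuming the per-request latencies are already available as a Python list (collecting them from the load generator is not shown, and the function name is hypothetical):

```python
from pathlib import Path


def write_for_ministat(latencies_us, path):
    """Write one latency value per line, the format ministat reads."""
    Path(path).write_text("".join(f"{v}\n" for v in latencies_us))
```

Writing one file per machine (e.g. arm.txt and x86.txt) then allows comparing them with `ministat arm.txt x86.txt`.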

Gil Tene also didn't like how wrk measures latency, so he forked it into wrk2.
wrk2 measures latency at a constant rate/throughput, while wrk aims for the
highest possible throughput and just reports the latency percentiles.
wrk2 also prints a detailed latency distribution (not as a plot chart, but
still useful).
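The difference between the two approaches can be shown with a toy, single-connection simulation. It uses hypothetical service times (ms) with one 50 ms stall. In the wrk-style closed loop, the next request is only sent after the previous one finishes, so the stall shows up as a single slow sample. In the constant-rate view, request i is scheduled at i * interval and its latency is measured from that scheduled time, so requests queued behind the stall are charged for their wait. This is only a sketch of the idea; the real wrk2 uses many connections and HdrHistogram, and all names here are made up.

```python
import math


def closed_loop_latencies(service_times):
    """wrk-style: send the next request only after the previous one
    completes, so measured latency equals the service time."""
    return list(service_times)


def constant_rate_latencies(service_times, interval):
    """Constant-rate style on one connection: request i is scheduled at
    i * interval; latency is measured from the scheduled time, so time
    spent waiting for a busy connection is included."""
    latencies = []
    free_at = 0.0  # time at which the connection becomes idle
    for i, service in enumerate(service_times):
        scheduled = i * interval
        start = max(scheduled, free_at)
        free_at = start + service
        latencies.append(free_at - scheduled)
    return latencies


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


service = [1.0, 1.0, 50.0] + [1.0] * 7  # hypothetical: one 50 ms stall
closed = closed_loop_latencies(service)
fixed = constant_rate_latencies(service, interval=2.0)
print("p50 closed-loop:", percentile(closed, 50))    # 1.0
print("p50 constant-rate:", percentile(fixed, 50))   # 45.0
```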

The only problem is that wrk2 is not well maintained, and it doesn't work on
modern aarch64 because of the old version of Lua it uses. I'll try to upgrade it.


> Also:  I can highly recommend this book:
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk at FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.

More information about the varnish-dev mailing list