VIP23 (VSL refactoring) design sketch

Geoff Simmons geoff at uplex.de
Fri Apr 12 14:00:11 UTC 2019


So I said we shouldn't get to optimizing too soon, and here I am doing
something like that. Because the discussion made me curious.

In the attachment you'll find:

- A first implementation of a function VSLwv() to write binary fields
into a log record buffer, using a printf-like format string and a va_list.

- Test calls for Timestamp, ReqAcct, ReqHeader and RespStatus (these are
#if-0'd out here).

- printf() calls to output the data from the binary log record (also
#if-0'd out). This confirms that we can get the data back out as it went
in, and experiments at how we could go about that.

- A benchmark with ca. one gazillion runs of VSLwv() and the
corresponding sprintf() calls, to compare the binary function with
library printf code.

- Results of running that through callgrind and perf record. To view the
results, you'll need a tool like kcachegrind, and perf report.

VSLwv() doesn't have bounds checking, to make sure that we don't write
past vsl_reclen, which of course a real version must have. That will
mean more conditional branching, but the branches will be predicted
almost always.

I grepped through current VSL*() calls, and these are more or less all
of the format characters we currently use: %s, %.*s, %.ns (for a length
n, as in %.20s), %d, %u, %x, %f, %j* and %z* (for (u)intmax_t and size_t
types). There may be one or two more or less, but that's close to all of
it. I added %hu for a response status as unsigned short (we currently
have %u for that). (There are some %p's for tags like ExpKill, mostly
only interesting for devs.)

Note that for the binary writes, we don't have to care if the data type
is unsigned, just copy the right number of bytes. We'll want the %u
formats for type checking, but VSLwv() can do the same thing for %d and
%u, and %zd and %zu, and so on. It's the output code that needs to get
signedness right.

This version always writes a length field as an unsigned short before a
string (maybe it should be uint16_t), and string fields are not
NUL-terminated.

So a final version would need the bounds checking, may handle one or two
more or fewer format chars, and may handle strings differently. Other
than that, I think this is about everything we'll need it to do.

What did the performance tools say?

Overall, VSLwv() did noticeably better than the library printf calls on
the various measures: fewer CPU cycles, better memory cache hit rates
(both instruction and data), and, generally, fewer branch mispredicts.
Not surprising, because library printf has quite a bit more to do.

As @Nils suspected, the weak spot is branch prediction. My gcc compiled
the switch statement as a jump table (jmpq *rax), and kcachegrind
reports 86% of indirect branch mispredicts there, far more than anything
else -- glibc vfprintf comes in next with 13%.

On the other hand, the library printf code has about twice as many
mispredicts for conditional branches, resulting in more mispredicts for
all types of branches.

So: this first shot at coding something up has a sore spot, but overall
appears to be measurably better than the printf stuff.

Some other lessons learned:

The binary records are not necessarily shorter than the ASCII-formatted
records. Particularly ReqAcct, which we have as 6 uintmax_t fields. That
comes to 48 bytes on my machine, about twice as long as a string like
"110 0 110 210 224 434". I don't think we can do much about that.
Request/responses can transfer up to gigabytes, which is why we need
uintmax_t. Unless we want to try something like variable-length integer
fields (which I doubt).

HTTP header records, with two variable-length strings, also won't be any
shorter than the ASCII format. If they have two 2-byte length fields,
they may come out as a couple of bytes longer (especially if they're
also NUL-terminated).

In the large, I think we'll be saving VSL space -- the binary Timestamp
records, for example, end up about half as long as the ASCII-formatted
version. But we won't be saving space in every record.

As you may notice, some of this code is ... not pretty. With idioms like
this:

	*((int *)p) = va_arg(ap, int);

	printf("%.*s: %f %f %f\n\n", *((unsigned short *)&buf[24]),
	       &buf[26], *((double *)&buf[0]), *((double *)&buf[8]),
	       *((double *)&buf[16]));

... I had pleasant thoughts like "phk is gonna kill me".

If we're going with something like this, we'll probably want some macros
and/or inline functions for better sanity, say:

#define CASTP(ptr, type) (*((type *)(ptr)))

And then:

	CASTP(p, int) = va_arg(ap, int);

For the output code, we could have the Python tool generate format
strings (perhaps more than one, such as "space-separated" and
"header-like"), field accessors, and varargs lists.

Say something like:

WS_FMT(SLT_Timestamp) ==> "%.*s: %.6f %.6f %.6f"
FLD_LEN(buf, SLT_Timestamp, 0) ==> CASTP(&buf[24], unsigned short)
FLD(buf, SLT_Timestamp, 0) ==> &buf[26]
FLD(buf, SLT_Timestamp, 1) ==> CASTP(&buf[0], double)
FLD(buf, SLT_Timestamp, 2) ==> CASTP(&buf[8], double)
FLD(buf, SLT_Timestamp, 3) ==> CASTP(&buf[16], double)

FMT_ARGS(buf, SLT_Timestamp) ==> \
FLD_LEN(buf, SLT_Timestamp, 0), FLD(buf, SLT_Timestamp, 0), \
	FLD(buf, SLT_Timestamp, 2), FLD(buf, SLT_Timestamp, 3)

So then it's:

	printf(WS_FMT(SLT_Timestamp), FMT_ARGS(buf, SLT_Timestamp));

... all generated so it's sure to be right. Something like the FLD()
accessors will of course also be needed for VSLQ, varnishncsa etc.


Best,
Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vsl_bin_bench.tar.gz
Type: application/gzip
Size: 390392 bytes
Desc: not available
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-dev/attachments/20190412/6602f3c3/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-dev/attachments/20190412/6602f3c3/attachment-0003.bin>


More information about the varnish-dev mailing list