VIP23 (VSL refactoring) design sketch

Poul-Henning Kamp phk at phk.freebsd.dk
Mon Apr 15 06:06:56 UTC 2019


--------
In message <d1ee0467-fe1d-b135-3825-b3f4c7a1e999 at uplex.de>, Geoff Simmons writes:

Now that I found your mock-up in my spam-folder (no idea why) and
have a semi-working laptop again:

>In the attachment you'll find:
>
>- A first implementation of a function VSLwv() to write binary fields
>into a log record buffer, using a printf-like format string and a va_list

For strict-alignment platforms I think you have to explicitly use
memcpy(3) in all cases, and leave it to the compiler to decide if
it really needs to call it.
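
A minimal sketch of the memcpy(3) approach for strict-alignment
platforms. The helper names (put_double, get_double, put_u16,
get_u16) are hypothetical and appear neither in the mock-up nor in
Varnish; they only illustrate reading and writing a field at an
arbitrary offset without casting the buffer pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical field accessors (names are assumptions for this sketch).
 * Each one copies through memcpy(3), so no unaligned load or store is
 * ever expressed in the source.  On platforms that tolerate unaligned
 * access the compiler is free to turn the copy into a plain move.
 */

static inline void
put_double(unsigned char *buf, size_t off, double d)
{
	assert(buf != NULL);
	memcpy(buf + off, &d, sizeof d);
}

static inline double
get_double(const unsigned char *buf, size_t off)
{
	double d;

	assert(buf != NULL);
	memcpy(&d, buf + off, sizeof d);
	return (d);
}

static inline void
put_u16(unsigned char *buf, size_t off, uint16_t v)
{
	assert(buf != NULL);
	memcpy(buf + off, &v, sizeof v);
}

static inline uint16_t
get_u16(const unsigned char *buf, size_t off)
{
	uint16_t v;

	assert(buf != NULL);
	memcpy(&v, buf + off, sizeof v);
	return (v);
}
```

With accessors of this kind, a field that lands on an odd offset (for
example after a variable-length string) can be read the same way as
an aligned one.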

>I grepped through current VSL*() calls, and these are more or less all
>of the format characters we currently use: %s, %.*s, %.ns (for a length
>n, as in %.20s), %d, %u, %x, %f, %j* and %z* (for (u)intmax_t and size_t
>types).

Yes, the vocabulary isn't that big to begin with.

>As @Nils suspected, the weak spot is branch prediction. My gcc compiled
>the switch statement as a jump table (jmpq *rax), and kcachegrind
>reports 86% of indirect branch mispredicts there, far more than anything
>else -- glibc vfprintf comes in next with 13%.

A switch may not be the optimal performance choice. If we collect
stats on frequency of the various formats, a handsorted if-else-if
chain may be better.
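
A rough sketch of both halves of that idea: tallying how often each
conversion occurs, and dispatching through an if-else-if chain.
Everything here is hypothetical (fld_kind, fld_classify, fmt_tally),
and the order of the tests in the chain is a placeholder, not a
measured result; the real order would come from the collected stats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field kinds, one per conversion character in use. */
enum fld_kind {
	FLD_STR, FLD_INT, FLD_UINT, FLD_HEX, FLD_DBL, FLD_UNKNOWN
};

/*
 * Hand-sorted chain.  The order below is an assumption for the
 * sketch; it should be re-sorted by observed frequency.
 */
static enum fld_kind
fld_classify(char c)
{
	if (c == 's')
		return (FLD_STR);
	else if (c == 'd')
		return (FLD_INT);
	else if (c == 'u')
		return (FLD_UINT);
	else if (c == 'f')
		return (FLD_DBL);
	else if (c == 'x')
		return (FLD_HEX);
	return (FLD_UNKNOWN);
}

/* Count the conversions in one format string into counts[]. */
static void
fmt_tally(const char *fmt, unsigned counts[FLD_UNKNOWN + 1])
{
	assert(fmt != NULL);
	assert(counts != NULL);
	for (; *fmt != '\0'; fmt++) {
		if (*fmt != '%')
			continue;
		fmt++;
		/* Skip precision and length modifiers: ".*", ".20", j, z */
		while (*fmt != '\0' && strchr(".*0123456789jz", *fmt) != NULL)
			fmt++;
		if (*fmt == '\0')
			break;
		counts[fld_classify(*fmt)]++;
	}
}
```

Running fmt_tally() over the format strings of the existing VSL*()
calls, weighted by how often each call site fires, would give the
frequencies needed to sort the chain. Anything the chain does not
recognize, including a literal "%%", is counted as FLD_UNKNOWN.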

>HTTP header records, with two variable-length strings, also won't be any
>shorter than the ASCII format. If they have two 2-byte length fields,
>they may come out as a couple of bytes longer (especially if they're
>also NUL-terminated).

We could ENUM the most common headers.  I've been considering that in
general, but the performance trade-off isn't obvious.
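
A small sketch of what enumerating common headers could look like.
The set of headers, the code values and the function names
(hdr_encode, hdr_decode) are all assumptions made for illustration;
code 0 is reserved to mean "not enumerated, store the name as a
string".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strncasecmp(3), POSIX */

/* Hypothetical one-byte codes; the selection is arbitrary. */
enum hdr_code {
	HDR_LITERAL = 0,	/* name is stored as a string instead */
	HDR_HOST,
	HDR_USER_AGENT,
	HDR_ACCEPT,
	HDR_CONTENT_LENGTH,
	HDR__MAX
};

static const char * const hdr_names[HDR__MAX] = {
	[HDR_HOST]		= "Host",
	[HDR_USER_AGENT]	= "User-Agent",
	[HDR_ACCEPT]		= "Accept",
	[HDR_CONTENT_LENGTH]	= "Content-Length",
};

/* Map a header name (not necessarily NUL-terminated) to its code. */
static enum hdr_code
hdr_encode(const char *name, size_t len)
{
	int i;

	assert(name != NULL);
	for (i = HDR_LITERAL + 1; i < HDR__MAX; i++) {
		if (strlen(hdr_names[i]) == len &&
		    strncasecmp(hdr_names[i], name, len) == 0)
			return ((enum hdr_code)i);
	}
	return (HDR_LITERAL);
}

/* Map a code back to the canonical header name. */
static const char *
hdr_decode(enum hdr_code code)
{
	assert(code > HDR_LITERAL && code < HDR__MAX);
	return (hdr_names[code]);
}
```

The lookup on the producer side is where the trade-off shows up: the
linear scan above costs a few string compares per header, which has
to be weighed against the bytes saved in the record.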

>	printf("%.*s: %f %f %f\n\n", *((unsigned short *)&buf[24]),
>	       &buf[26], *((double *)&buf[0]), *((double *)&buf[8]),
>	       *((double *)&buf[16]));
>
>-.. I had pleasant thoughts like "phk is gonna kill me".

For a mock-up to test a concept, I've done far worse myself :-)

>Something like the FLD()
>accessors will of course also be needed for VSLQ, varnishncsa etc.

All things considered, I suspect generating the VSL calls to be
the simpler part of the task and VSLQ to be the harder one, so
we should probably look carefully at that as well.

Poul-Henning

PS: Re: slink's idea of generating functions to emit the VSL
records, I will mention one evil trick I have seen, but say up front
that we are not going there:  If you machine-generate the producer
functions you can basically memcpy the stack-frame into
a network packet (or VSL record).  This is horribly CPU-calling-convention
specific, and needs special casing for strings, but in olden days
before really smart compilers (ie: 68k) it was a pretty smart move.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.