VIP23 (VSL refactoring) design sketch

Poul-Henning Kamp phk at phk.freebsd.dk
Wed Apr 10 07:13:58 UTC 2019


--------
In message <82919b48-25e0-d22f-1029-6d17db040519 at uplex.de>, Geoff Simmons writes:

>Whether we get the support is yet to be finalized, but if so we'll give
>it high priority. phk asked me to post a design sketch in advance before
>we put in the major time and effort, so that others can comment, and
>warn us off we're going in the wrong direction.

Yes, I really hate it when people go off and do something in quiet
and come back with something which isn't quite right for some reason
they overlooked.

>- That means that varnishd writes binary fields into the log.

For all its problems, a single call to sprintf is actually pretty
efficient in terms of CPU cache etc, because we almost always have
it stuck there.

I'm not very keen to see a lot of macro-work here, for instance
wrapping three memcpy's for Timestamp etc.

In theory one could extend the printf syntax with %V and similar,
but that looses the compilers type-checking of arguments.

However, just because the "const char *fmt" and the varargs
are the *same* as printf, doesn't mean we have to *do* the same
thing as printf with them.

So as a total early morning strawman, before morning tea has
fully kicked in, but *after* I read your email:

        VSLb(vsl, SLT_Timestamp, "%s: %.6f %.6f %.6f",
            event, now, now - first, now - *pprev);

Becomes:

	VSLb(vsl, SLT_Timestamp, "S%sT%fD%fD%f", 
            event, now, now - first, now - *pprev);

And serialize into VSL as:

	<HEADER><ASCIIZ><8 byte double><8 byte double><8 byte double>

Meanwhile, the "S%sT%fD%fD%f" string (possibly transformed) tells
the client side that we have a string, a timestamp and two durations.

We could make the "veprintf" format string come out of the tbl/mumble.h
paired with the SLT_ define:

#include <tbl/mumble.h> magic produces:

	#define SLT_Bin_Timestamp SLT_Timestamp | SLT_BINARY, "S%sT%fD%fD%f"

Then in source call becomes:

	VSLb(vsl, SLT_Bin_Timestamp, event, now, now - first, now - *pprev);

That way nobody confuses it with a "regular" printf, and the
SLT_Timestamp bit alerts the VSL code to interpret the fmt string
as "veprintf" syntax.

This would be back-wards source compatible, making the the transition
easier, and could allow us to make A/B testing by substituting "real"
printf formats with a compiletime option.

The "veprintf" syntax needs to be carefully thought out, for instance
"S%f" should be a syntax error, and that needs to be caught somewhere,
preferably at compile time, but worst case we could walk all the
strings from <tbl/mumble.h> with a varnishd -x argument as part of
the build.

>- Outputs for structured, typed data, in particular JSON from varnishlog
>-j, fall out easily, since the data is already structured and typed
>(don't need to parse strings in the log).

So where does <tbl/mumble.h> come from ?

It probably makes a lot of sense to wrap all the information one single
place, possibly as .rst-oid format like ".vsc" files and have python
chew on that and output all of docs, JSON metadata and the macro
definitions for varnishd (which would not tbl/mumble.h then.)

Far out:

Given this level of abstraction we could reorder the arguments
between VSL and presentation, so for instance timestamp could be
serialized as:

	<8 byte double><8 byte double><8 byte double><ASCIIZ>

So the VSLQ would always know where to find the doubles, but the
presentation code in libvarnishapi would (still) render the string
in front.

We chould also benchmark emitting a length before ASCIIZ to make
it faster to skip strings elements.

Either 'trick' might make VSLQ faster.


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


More information about the varnish-dev mailing list