VIP23 (VSL refactoring) design sketch

Geoff Simmons geoff at uplex.de
Wed Apr 10 09:18:48 UTC 2019


On 4/10/19 09:13, Poul-Henning Kamp wrote:
> 

Responding out of order by taking the last two points first, because
they become relevant again further down.

> Far out:
>
> Given this level of abstraction we could reorder the arguments
> between VSL and presentation, so for instance timestamp could be
> serialized as:
>
> 	<8 byte double><8 byte double><8 byte double><ASCIIZ>
>
> So the VSLQ would always know where to find the doubles, but the
> presentation code in libvarnishapi would (still) render the string
> in front.

Yes, I thought about that too, generally it will be much smarter not to
have variable-length fields first. So the metadata will want to
distinguish between presentation order and field order within the record.

I forgot to mention yesterday that we'll also want a name for each field
in the metadata, for the JSON tag.

Starting to spitball what the code could like like -- the metadata for a
single field, after generation from tbl/whatever, might look something
like this:

struct vslf_meta {
	const char *	name;
	enum vsl_type	type;
	unsigned	order;
}

And then:

VSLF_Meta[SLT_Timestamp] = { 4,	// 4 fields
	{"t", DBL, 1},
	{"since_start", DBL, 2},
	["since_last", DBL, 3},
	{"label", ASCIIZ, 0},
}

Or we could generate two tables, one in "field-major" order like this,
and the other in "presentation-major" order.

Or, instead of the "unsigned order" field, it could be a ssize_t, which
has the byte position within the record if it can be precomputed, or -1
if there are variable-length fields in front of it.

Then we could have "presentation-major" order:

VSLF_Meta[SLT_Timestamp] = { 4,
	{"label", ASCIIZ, 24},
	{"t", DBL, 0},
	{"since_start", DBL, 8},
	["since_last", DBL, 16},
}

Notably, for headers we couldn't pre-compute the byte position of the
value field:

VSLF_Meta[SLT_ReqHeader] = { 2,
	{"name", ASCIIZ, 0},
	{"value", ASCIIZ, -1},
}

> We chould also benchmark emitting a length before ASCIIZ to make
> it faster to skip strings elements.

Thought about that one too, maybe in addition to ASCIIZ we'll want a
type like PSTRING (for "Pascal string", that's how I learned it in
school), which has a length field in front of a non-NULL terminated
string. As we do it for headers in the workspace already.

One byte for the length is enough for header names, but we'd need two
for VSL, since records can be up to just short of 4Kb. (Cookie header
values are infamously likely to fill out the entire log record.)

I'm not entirely sure about PSTRING. The length field would be great for
VSLQ as you say, but it would mean that varnishd is computing strlen()
all the time, and the point here is to hand off work like that to VSL
clients. And at one point we decided to have NULL-terminated strings
everywhere in VSL, so that we can always use the standard str*() functions.

Or are you thinking of having *both* the length field and NULL-termination?

> For all its problems, a single call to sprintf is actually pretty
> efficient in terms of CPU cache etc, because we almost always have
> it stuck there.
> 
> I'm not very keen to see a lot of macro-work here, for instance
> wrapping three memcpy's for Timestamp etc.
> 
> In theory one could extend the printf syntax with %V and similar,
> but that looses the compilers type-checking of arguments.
> 
> However, just because the "const char *fmt" and the varargs
> are the *same* as printf, doesn't mean we have to *do* the same
> thing as printf with them.
> 
> So as a total early morning strawman, before morning tea has
> fully kicked in, but *after* I read your email:
> 
>         VSLb(vsl, SLT_Timestamp, "%s: %.6f %.6f %.6f",
>             event, now, now - first, now - *pprev);
> 
> Becomes:
> 
> 	VSLb(vsl, SLT_Timestamp, "S%sT%fD%fD%f", 
>             event, now, now - first, now - *pprev);

I wasn't thinking about doing this with format strings because, if we
have the SLT metadata in tables already, then the tables are enough for
what we need to do. And the format string becomes another thing that can
be incorrect. The caller of VSL*() at least has to get the order of the
varargs fields right, which means checking that against the metadata.
Getting the format string right as well wouldn't be strictly necessary.

I agree that we'll be replacing the long-optimized library printf code,
however we go about it, so it will have to be fast. But we need to do
that in any case, whether we're interpreting format strings or running
through tables of metadata. We can't hand off the format string for
binary writes to vsnprintf(), or anything else in the library. How can
we do it without implementing code of our own that uses something like
memcpy() internally?

As noted above, we'll need to have metadata in tables at least for the
field names used for JSON. And I suspect that VSLQ will benefit from
metadata tables, when it can jump right to the byte where a field is
located. So I don't think that all of this works with format strings only.

If we run through metadata tables to write the binary records, I think
they're likely to be at least in L2 cache. If the set of SLT tags is
fixed, they can be const (but they probably can't be when we get to
extending the tags).

There is a point to be made that format strings are shorter, while a
table entry for the field formats could burst a cacheline. And that
format strings are probably a good idea for output functions used by VSL
clients -- for that, we really could hand of a format to *printf().

> This would be back-wards source compatible, making the the transition
> easier, and could allow us to make A/B testing by substituting "real"
> printf formats with a compiletime option.

That's also a good point.

> So where does <tbl/mumble.h> come from ?
> 
> It probably makes a lot of sense to wrap all the information one single
> place, possibly as .rst-oid format like ".vsc" files and have python
> chew on that and output all of docs, JSON metadata and the macro
> definitions for varnishd (which would not tbl/mumble.h then.)

Sure. I was thinking of just extending what we already have in
include/tbl/vsl_tags*.h on the grounds of "don't alarm everyone,
especially not phk, by removing something that's been fundamental so
far". But if you promise not to be alarmed, we could go with something
similar to *.vsc. Probably we could generate headers exactly like
vsl_tags*.h from that.


Thanks for the feedback,
Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-dev/attachments/20190410/46de52cd/attachment-0001.bin>


More information about the varnish-dev mailing list