VIP23 (VSL refactoring) design sketch

Geoff Simmons geoff at uplex.de
Fri Apr 12 15:35:37 UTC 2019


On 4/12/19 16:37, Dridi Boukelmoune wrote:
> 
> Interesting, for textual logs the effect of vsl_reclen is
> straightforward but how should we deal with binary records truncation?

On 4/12/19 16:52, Nils Goroll wrote:
> As before - truncate?
>
> For the fixed part, we can make sure that vsl_reclen can not be set to below a
> sensible minimum, and I do not see a difference between truncating text and
> binary. In both cases the log is just - a log, and may lose data anyway.

Because the idea is to re-order fields in the binary record, mainly so
that the longer, variable-length fields go last, truncating a record in
the new version will probably not produce the same result as in the old one.

If someone has the crazy idea of setting vsl_reclen to the current
minimum of 16b, then currently we get something like this:

Timestamp      Request: 1554970

If we put the double fields first and the string field last in the
binary record, then with vsl_reclen=16b we'll get:

Timestamp      : 1554970958.415837 0.000023

Both broken, of course, but not in the same way, and all because vsl_reclen
is just way too short. Anyway, I think this will be pretty hard to
"break, and break the same way as before".

But I'd go for this rule:

* If a fixed-length field would overflow the buffer, then don't write
it, and stop.

* If a variable-length field overflows the buffer, write as much of it
as we can, then stop.

The outputs will turn out screwy, one way or another.
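
To make the rule concrete, here is a minimal sketch of what the writer
side could do. The function names and the way the remaining buffer space
is passed in are made up for illustration, not the actual VSL code:

#include <stddef.h>
#include <string.h>

/*
 * Sketch only: vsl_write_fixed()/vsl_write_string() are hypothetical
 * helpers, not the real VSL writer. Each returns the number of bytes
 * written; the caller stops at the first short or zero return.
 */
static size_t
vsl_write_fixed(char *buf, size_t space, const void *src, size_t len)
{
	/* Fixed-length field: if it would overflow, write nothing. */
	if (len > space)
		return (0);
	memcpy(buf, src, len);
	return (len);
}

static size_t
vsl_write_string(char *buf, size_t space, const char *s)
{
	/* Variable-length field: write as much as fits, then stop. */
	size_t len = strlen(s);

	if (len > space)
		len = space;
	memcpy(buf, s, len);
	return (len);
}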

I agree with @Nils that we should consider increasing the minimum
vsl_reclen, because at 16b the logs will just be broken.

>> Note that for the binary writes, we don't have to care if the data type
>> is unsigned, just copy the right number of bytes. We'll want the %u
>> formats for type checking, but VSLwv() can do the same thing for %d and
>> %u, and %zd and %zu, and so on. It's the output code that needs to get
>> signedness right.
> 
> Should we though? If someone runs Varnish in a virtual machine on a
> mainframe server and runs varnishlog -w don't we want to be able to
> varnishlog -r that file on a "regular" workstation?

Types may have different sizes, byte orders may differ, and doubles might
not even be in IEEE 754 binary64 format. But all of that only becomes an
issue in one scenario: when we write a binary log on one machine and read
it on another architecture. Other than that, we always write and read
logs on the same architecture.

AFAIK, "cross-architecture" log writes and reads via binary file are not
guaranteed to work now either.

It would be kinda awesome if we could abstract VSL enough to get that
scenario to work -- probably by saving the integer type sizes in the
metadata, using a BOM byte, and so forth (but what in the world do we do
about different double formats?). But IMO we should get everything else
to work first.
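
Just to illustrate what that metadata could look like (this struct and
the magic value are invented for the example, nothing like it exists in
VSL today):

#include <stdint.h>

struct vsl_file_meta {
	uint32_t	magic;		/* e.g. 0x0a0b0c0d, lets the reader detect byte order */
	uint8_t		sizeof_int;
	uint8_t		sizeof_long;
	uint8_t		sizeof_size_t;
	uint8_t		sizeof_uintmax_t;
	uint8_t		sizeof_double;	/* size alone says nothing about the float format */
};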

Is there any architecture for which unsigned and signed integer types
have different byte widths?

> Until the vsl_reclen question is sorted out, you can always copy the
> string and grab its length while doing so, and once it's written and
> you have the length you can write it at the field length's address.

We should use library functions, memcpy or strcpy, for any multi-byte
copying, not do it ourselves. It's really surprising what they do to
optimize what seems to be so simple -- vector instructions, copying
wider types (cast it all to intmax_t and copy 8 bytes at a time), and
so on. The performance differences are significant; we'll never keep up
with what the C compilers are able to figure out.
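
As a sketch of the suggestion quoted above (copy first, then write the
length back into the field's length slot), something like this, assuming
a hypothetical layout with a 16-bit length right in front of the string
data; vsl_put_string() is not a real VSL function and overflow handling
is left out:

#include <stdint.h>
#include <string.h>

static size_t
vsl_put_string(char *field, const char *s)
{
	char *dst = field + sizeof(uint16_t);
	uint16_t len;

	/*
	 * stpcpy() copies and hands back the end pointer, so we get the
	 * length from the same pass over the string.
	 */
	len = (uint16_t)(stpcpy(dst, s) - dst);

	/* Now that the length is known, write it into the length slot. */
	memcpy(field, &len, sizeof len);
	return (sizeof len + len + 1);	/* length slot + string + NUL */
}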

> Having null-terminated strings open a zero-copy opportunity on the VSL
> consumer side.

Yeah, we made a deliberate choice to always NUL-terminate the log
records, so that clients can always use str*() stuff. There are a few
options to consider now:

* strings are NUL-terminated
* strings have length fields
* strings are NUL-terminated *and* have length fields
* some strings do it one way, other strings do it the other way,
  depending on which SLT tag and field it is
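
For example, with the third option (length field plus NUL terminator), a
consumer gets both the length and a usable C string without copying; the
struct and accessor here are invented, just to show the shape of it:

#include <stdint.h>
#include <stddef.h>

/*
 * Illustration of the "length field *and* NUL terminator" option; the
 * record layout is hypothetical. The consumer gets the length for free
 * and can still pass the pointer to any str*() function, zero-copy.
 */
struct vsl_string_field {
	uint16_t	len;	/* length, not counting the NUL */
	char		str[];	/* NUL-terminated payload */
};

static const char *
vsl_field_str(const struct vsl_string_field *f, size_t *lenp)
{
	if (lenp != NULL)
		*lenp = f->len;
	return (f->str);	/* points into the record, no copy */
}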

>> The binary records are not necessarily shorter than the ASCII-formatted
>> records. Particularly ReqAcct, which we have as 6 uintmax_t fields. That
>> comes to 48 bytes on my machine, about twice as long as a string like
>> "110 0 110 210 224 434". I don't think we can do much about that.
>> Request/responses can transfer up to gigabytes, which is why we need
>> uintmax_t. Unless we want to try something like variable-length integer
>> fields (which I doubt).
> 
> We could, but then that would force us to pre-compute variable-length
> field sizes. (are there others than strings?)

Well, we could make "110 0 110 210 224 434" much shorter in binary form,
if we knew that in this case, it's 5 uint8_t's and a uint16_t. That fits
into 7 bytes, rather than 48; except that we'd also have to encode the
integer field widths somehow, if it's going to be sometimes uint8_t,
sometimes uint16_t, sometimes uintmax_t, etc. Over burgers today, @Nils
got to thinking about Huffman encoding. But we agreed that it's better
not to bother with something like that, at least not for now. %^)
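
Just to make concrete what a variable-length integer field would look
like, here is a generic LEB128-style encoding, shown for illustration
only, not something we intend to adopt:

#include <stdint.h>
#include <stddef.h>

/*
 * Generic LEB128-style varint encoder: 7 payload bits per byte, high bit
 * set on all but the last byte.
 */
static size_t
varint_encode(uint8_t *buf, uintmax_t v)
{
	size_t n = 0;

	do {
		uint8_t b = v & 0x7f;

		v >>= 7;
		if (v != 0)
			b |= 0x80;	/* continuation bit */
		buf[n++] = b;
	} while (v != 0);
	return (n);
}

/* "110 0 110 210 224 434" would come out as 1+1+1+2+2+2 = 9 bytes. */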


Best,
Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de
