VIP23 (VSL refactoring) design sketch
Geoff Simmons
geoff at uplex.de
Tue Apr 9 17:38:52 UTC 2019
Hello all,
UPLEX may get support to develop a PR as a first step to implementing VIP23:
https://github.com/varnishcache/varnish-cache/wiki/VIP-23:-Refactor-VSL-to-support-extracting-structured-data-from-%22binary%22-log-payloads
Whether we get the support is yet to be finalized, but if so we'll give
it high priority. phk asked me to post a design sketch before we put in
the major time and effort, so that others can comment, and warn us off
if we're going in the wrong direction.
Please look at VIP23 in the wiki along with this mail; I won't repeat
everything here.
The goals are:
- The log no longer contains ASCII-formatted data for all fields; some
fields, especially numeric ones, are written as "binary" data. Data that
are naturally strings, such as header values, of course remain as strings.
- That means that varnishd writes binary fields into the log.
- VSL clients read the log and format the output.
- Outputs for structured, typed data, in particular JSON from varnishlog
-j, fall out easily, since the data is already structured and typed
(clients don't need to parse strings in the log).
I think it's important that the PR demonstrates how we can make this
change incrementally, without a "monster commit" that changes everything
at once. So the PR will be the first iteration in the transition:
- The PR implements the structures and functions underlying the new VSL
concept.
- Some, but not all of the SLT tags are handled with the new design.
- SLT tags that are left unchanged in the first iteration have as
metadata "one ASCII field", i.e. one field of type ASCIIZ (for
null-terminated strings), and everything works with them as previously
for VSL.
The remaining steps to complete the refactoring will then be to
re-implement the SLT tags one at a time, until we've covered them all.
After that, we can go about removing code that has become obsolete.
VSL queries with the field selector (tag[n]) will end up working quite
differently. Currently we scan the string and separate it at whitespace;
in the end we'll just select field n. During the transition, we'll have
to have both techniques working, depending on whether the SLT tag has
been transitioned.
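For illustration, the current whitespace-based selection might look
roughly like the sketch below. This is not code from the Varnish source:
legacy_field() is a hypothetical helper, and it assumes that field
numbers are 1-based. For a transitioned tag, this scan would be replaced
by a lookup driven by the tag's field metadata.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Varnish): find the n-th
 * whitespace-separated field in a NUL-terminated payload.
 * n is assumed to be 1-based. Returns a pointer to the start of the
 * field and stores its length in *len, or returns NULL if the payload
 * has fewer than n fields.
 */
static const char *
legacy_field(const char *payload, unsigned n, size_t *len)
{
	const char *p = payload, *b;

	if (n == 0)
		return (NULL);
	for (;;) {
		/* skip leading whitespace */
		while (*p != '\0' && isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			return (NULL);
		/* scan to the end of this field */
		b = p;
		while (*p != '\0' && !isspace((unsigned char)*p))
			p++;
		if (--n == 0) {
			*len = (size_t)(p - b);
			return (b);
		}
	}
}
```

The cost of this approach is that every query has to re-scan the string;
with typed fields the position of field n follows from the metadata.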
Design sketch:
- include/tbl/vsl_tags*.h continue as the "source of truth" about SLT
tags. The "code-generating macros" in the tables are extended to express
metadata about the payload fields. I imagine that we add at least:
  - n (number of fields)
  - a sequence of n symbols such as ASCIIZ, DBL, UINT16, UINT32 etc.
    for data types
- The data type symbols form an enum, and the enum indexes into a table
of their sizes in bytes, if they have a fixed size (ASCIIZ, at least,
does not).
- In the first iteration, all but a few selected tags have one field of
type ASCIIZ.
  - Current VSL*() functions work with such tags without changes.
  - varnishlog -j formats these cases as in Guillaume's first
    implementation -- two string fields "tag" and "value".
  - VSL queries with [n] separate these at whitespace.
- Suggested SLT tags with the new paradigm in the first iteration:
Timestamp, (Be)RespStatus, (Be)ReqAcct, *Header. Maybe some tags that
currently have binary contents, and something for H/2. We should try to
cover a broad variety of data types.
- The PR includes functions/macros for VSL writes & reads in the new
paradigm, using the metadata, similar in spirit to printf/scanf. Such as:
  - VSLw(), VSLwv() etc. for writing to the log.
  - VSLr(), VSLrv() etc. for reading from the log.
  - Function(s) for "format the fields in a ws-separated string".
  - Function(s) for "format header-like" as "<name>: <data>" for
    headers and timestamps.
- These functions don't need a printf-like format string; we just make
use of the metadata. The idea is:
  - start with a pointer at the start of the payload
  - read/write sz bytes at the pointer, where sz is the size of the
    next field
  - advance the pointer to ptr + sz, continue
- VSL queries with [n] for tags in the new paradigm select field n.
- varnishlog -j emits structured data for tags in the new paradigm.
- std.log() and std.timestamp() are updated for the new paradigm, and
all of the vtc tests concerning them pass without changes.
- All of the vtc tests for varnishlog, varnishncsa, varnishhist and
varnishtop pass without changes.
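To make the metadata idea concrete, here is a rough sketch of a
table-driven layout and the pointer walk described above. Everything in
it is hypothetical: the FT_* type symbols, the DEMO_TAG table macro, the
Demo_* tags and the helpers field_put() and field_at() are invented for
illustration and are not the proposed API. It also assumes host byte
order and does no bounds checking, both of which a real implementation
would have to address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field types, named after the symbols in the text. */
enum vsl_ftype { FT_ASCIIZ, FT_DBL, FT_UINT16, FT_UINT32, FT__MAX };

/* Size in bytes per type; 0 marks a variable-size type (ASCIIZ). */
static const size_t ft_size[FT__MAX] = {
	[FT_ASCIIZ] = 0,
	[FT_DBL]    = sizeof(double),
	[FT_UINT16] = sizeof(uint16_t),
	[FT_UINT32] = sizeof(uint32_t),
};

/* Hypothetical tag table: name, number of fields, field types. */
#define DEMO_TAGS						\
	DEMO_TAG(Demo_Status, 1, FT_UINT16)			\
	DEMO_TAG(Demo_Timestamp, 4, FT_ASCIIZ, FT_DBL, FT_DBL, FT_DBL) \
	DEMO_TAG(Demo_Legacy, 1, FT_ASCIIZ)

/* First expansion of the table: the enum of tags. */
#define DEMO_TAG(name, n, ...) name,
enum demo_tag { DEMO_TAGS DEMO__MAX };
#undef DEMO_TAG

struct tag_meta {
	const char	*name;
	unsigned	nfields;
	enum vsl_ftype	types[4];
};

/* Second expansion of the same table: the per-tag metadata. */
#define DEMO_TAG(name, n, ...) [name] = { #name, n, { __VA_ARGS__ } },
static const struct tag_meta tag_meta[DEMO__MAX] = { DEMO_TAGS };
#undef DEMO_TAG

/* Size of the field of type t that starts at p. */
static size_t
field_size(enum vsl_ftype t, const unsigned char *p)
{
	if (ft_size[t] > 0)
		return (ft_size[t]);
	return (strlen((const char *)p) + 1);	/* ASCIIZ: include the NUL */
}

/* Write one field of type t at p; return the advanced pointer. */
static unsigned char *
field_put(enum vsl_ftype t, unsigned char *p, const void *src)
{
	size_t sz = ft_size[t] > 0 ? ft_size[t] : strlen(src) + 1;

	memcpy(p, src, sz);	/* memcpy avoids alignment assumptions */
	return (p + sz);
}

/*
 * Return a pointer to field idx (0-based here) of a payload for the
 * given tag, or NULL if idx is out of range. Only the metadata is used
 * to find the field; no string parsing is involved.
 */
static const unsigned char *
field_at(enum demo_tag tag, const unsigned char *payload, unsigned idx)
{
	const struct tag_meta *m = &tag_meta[tag];
	const unsigned char *p = payload;
	unsigned i;

	if (idx >= m->nfields)
		return (NULL);
	for (i = 0; i < idx; i++)
		p += field_size(m->types[i], p);
	return (p);
}
```

In this sketch an untransitioned tag is simply one with a single ASCIIZ
field (Demo_Legacy), so the same walk handles both old and new tags.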
That would be covered in the PR. The plan after merging the PR would be:
- Incrementally change the remaining SLT tags to use the new paradigm.
- After all of the tags have been updated, remove obsolete code.
After that, we can look at the "other considerations" in VIP23:
- SLT flag UNTRUSTED
- A BOM-like byte in binary logs
- Saving the metadata as a header in binary logs, so that log readers
get the metadata from the file.
- Dynamically extensible SLT tags with their own metadata.
Comments welcome!
Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung
Scheffelstraße 32
22301 Hamburg
Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753
http://uplex.de