VIP23 (VSL refactoring) design sketch

Dridi Boukelmoune dridi at varni.sh
Sat May 4 09:49:07 UTC 2019


On Fri, May 3, 2019 at 3:07 PM Dridi Boukelmoune <dridi at varni.sh> wrote:
>
> On Fri, May 3, 2019 at 12:07 PM Geoff Simmons <geoff at uplex.de> wrote:
> >
> > On 5/3/19 10:51, Dridi Boukelmoune wrote:
> > >
> > > First we lose the ability to treat the whole record as a single sting
> > > out of the box. The client will need to reconstruct the string from
> > > the various field types that may constitute a record. Think string and
> > > regex operators.
> >
> > Yes, Martin and I recently had a conversation about this on #hacking.
> > For the string equality comparisons (eq and ne), we could conceivably
> > look into the comparison string and see if it has blanks that cross
> > field boundaries. But as you pointed out, some VSL payloads have blanks
> > that are not field separators (VCL_Log, FetchError, etc).
> >
> > For regex comparisons, where the pattern could include things like "\s",
> > ".*", "\W" and so forth, detecting whether the pattern might match a
> > field-separating blank has the look of a sophisticated computer
> > science-y problem.
>
> Does it have to though? We only use those as raw strings, and never as
> regular expression patterns.

By the way, yes I'm aware that quoting fields containing blanks would
introduce the need of escaping at the very least double quotes and the
escape character.

> > Martin and I agreed that, at least in the first iteration, for VSLQ
> > queries with a string operator (equality or regex) we should render the
> > formatted payload into a temporary buffer and run the comparison against
> > that. Once we have it all working, maybe we could consider more clever
> > solutions, but I'd be surprised if we'd gain much from getting clever.
>
> Ack, that's the logical thing to start with.

One request I forgot to make in this area, when you rebuild the
complete record, it would be nice to enclose fields that contain
blanks with double quotes, so that we can have the same syntax on 6.0
since we're going to maintain it for a while and I would like to solve
the querying problem there too.


> > I think that an important part of this will be to emphasize in the docs
> > that authors of VSL queries should use field[n] indices and numeric
> > comparisons whenever it's appropriate. My (admittedly evidence-free)
> > impression is that not everyone uses them when they could.
>
> Well, today the fields need to be dup'ed anyway when we need a null
> terminated string so that will indeed need to be documented that field
> operations become much cheaper than record operations.
>
> > Say you're scanning backend logs for fetches to backend foo, like this:
> >
> > -q 'BackendOpen ~ "foo"'
> >
> > Since the backend name is the second field, it would be better as:
> >
> > -q 'BackendOpen[2] eq "foo"'
> >
> > Or to look for backend fetches that take longer than 1 second, it
> > wouldn't surprise me if some people are doing something like:
> >
> > -q 'Timestamp:Beresp ~ "[1-9]\.\d*$"'
> >
> > When it should be:
> >
> > -q 'Timestamp:Beresp[3] > 1.0'
>
> Agreed, I sometimes realize after running a long query on a VSL of the
> GB persuasion that I could have done better.
>
> > We'll do much better if we can narrow the query to a field, and if we
> > don't have to convert numbers at all. And we should make sure that users
> > know about it.
>
> Make the field index mandatory with [0] meaning the whole record and
> people will learn :p
>
> > > Finally, regarding the VSL file header that we'd hypothetically dump
> > > with varnishlog -w in the future (which I really hope we do), we could
> > > consider making this a VSL record itself that is not subject to
> > > vsl_reclen, querying or filtering, for the needs of dynamic VSLs, so
> > > that whenever something changes VSL clients see an update. It would
> > > then be varnishlog -w's job to build such a synthetic record in the
> > > VSL file before dumping actual logs.
> >
> > I agree, and I'd say that when we do have meta-data entries in the log,
> > it should be possible for them to appear at any time, in both log files
> > and in-memory logs. That would put us on the path to dynamic SLT tags.
> >
> > But we're not planning for that in the first iteration. phk has said
> > that he doesn't foresee dynamic VSL by the September release. And I
> > agree that we should get the rest working first.
>
> I disagree. If we draft a plan that 1) changes the in-memory AND
> on-disk format and 2) has later a possibility for dynamic VSLs we
> should make this part of the first iteration. We shouldn't change the
> format of such a critical piece of the infrastructure every other
> release.
>
> Also by making this both a synthetic record for live log consumer by
> libvarnishapi and in the future something that varnishd may produce
> then it becomes a no brainer for varnishlog. It just writes it on disk
> like any other log record. It may need some intelligence though when
> logs are rotated, like varnishlog asking for a synthetic record again.
>
> Dridi


More information about the varnish-dev mailing list