VIP23 (VSL refactoring) design sketch
Dridi Boukelmoune
dridi at varni.sh
Fri May 3 13:07:17 UTC 2019
On Fri, May 3, 2019 at 12:07 PM Geoff Simmons <geoff at uplex.de> wrote:
>
> On 5/3/19 10:51, Dridi Boukelmoune wrote:
> >
> > First we lose the ability to treat the whole record as a single sting
> > out of the box. The client will need to reconstruct the string from
> > the various field types that may constitute a record. Think string and
> > regex operators.
>
> Yes, Martin and I recently had a conversation about this on #hacking.
> For the string equality comparisons (eq and ne), we could conceivably
> look into the comparison string and see if it has blanks that cross
> field boundaries. But as you pointed out, some VSL payloads have blanks
> that are not field separators (VCL_Log, FetchError, etc).
>
> For regex comparisons, where the pattern could include things like "\s",
> ".*", "\W" and so forth, detecting whether the pattern might match a
> field-separating blank has the look of a sophisticated computer
> science-y problem.
Does it have to though? We only use those as raw strings, and never as
regular expression patterns.
> Martin and I agreed that, at least in the first iteration, for VSLQ
> queries with a string operator (equality or regex) we should render the
> formatted payload into a temporary buffer and run the comparison against
> that. Once we have it all working, maybe we could consider more clever
> solutions, but I'd be surprised if we'd gain much from getting clever.
Ack, that's the logical thing to start with.
> I think that an important part of this will be to emphasize in the docs
> that authors of VSL queries should use field[n] indices and numeric
> comparisons whenever it's appropriate. My (admittedly evidence-free)
> impression is that not everyone uses them when they could.
Well, today the fields need to be dup'ed anyway when we need a null
terminated string so that will indeed need to be documented that field
operations become much cheaper than record operations.
> Say you're scanning backend logs for fetches to backend foo, like this:
>
> -q 'BackendOpen ~ "foo"'
>
> Since the backend name is the second field, it would be better as:
>
> -q 'BackendOpen[2] eq "foo"'
>
> Or to look for backend fetches that take longer than 1 second, it
> wouldn't surprise me if some people are doing something like:
>
> -q 'Timestamp:Beresp ~ "[1-9]\.\d*$"'
>
> When it should be:
>
> -q 'Timestamp:Beresp[3] > 1.0'
Agreed, I sometimes realize after running a long query on a VSL of the
GB persuasion that I could have done better.
> We'll do much better if we can narrow the query to a field, and if we
> don't have to convert numbers at all. And we should make sure that users
> know about it.
Make the field index mandatory with [0] meaning the whole record and
people will learn :p
> > Finally, regarding the VSL file header that we'd hypothetically dump
> > with varnishlog -w in the future (which I really hope we do), we could
> > consider making this a VSL record itself that is not subject to
> > vsl_reclen, querying or filtering, for the needs of dynamic VSLs, so
> > that whenever something changes VSL clients see an update. It would
> > then be varnishlog -w's job to build such a synthetic record in the
> > VSL file before dumping actual logs.
>
> I agree, and I'd say that when we do have meta-data entries in the log,
> it should be possible for them to appear at any time, in both log files
> and in-memory logs. That would put us on the path to dynamic SLT tags.
>
> But we're not planning for that in the first iteration. phk has said
> that he doesn't foresee dynamic VSL by the September release. And I
> agree that we should get the rest working first.
I disagree. If we draft a plan that 1) changes the in-memory AND
on-disk format and 2) has later a possibility for dynamic VSLs we
should make this part of the first iteration. We shouldn't change the
format of such a critical piece of the infrastructure every other
release.
Also by making this both a synthetic record for live log consumer by
libvarnishapi and in the future something that varnishd may produce
then it becomes a no brainer for varnishlog. It just writes it on disk
like any other log record. It may need some intelligence though when
logs are rotated, like varnishlog asking for a synthetic record again.
Dridi
More information about the varnish-dev
mailing list