Fleshed out ideas from VDD25Q2

Poul-Henning Kamp phk at phk.freebsd.dk
Wed May 28 06:53:14 UTC 2025


This is my personal attempt to flesh out some of the things we discussed
at VDD25Q2 in a bit more detail.

A) More modular VCL
-------------------

Points of pain:

	"Having everyting in one VCL file"
	"Slow VCL compiles"
	"Useless backend state/statistics reporting."

A) Diagnosis:

In complex setups, you either end up with a tangled VCL file that
does many different things in a lot of conditional clauses, or you
end up with a less tangled VCL file that tries to determine which
of multiple VCL files should handle this particular request.

If you do the latter, you have to repeat the backend declarations
in many of the VCL files which causes fragmented backend statistics.


A) Concrete proposals:

A.1) Make it possible to import and export backends(=directors) between VCLs.

To do this, we must discard one original dogma:

	"There is *exactly* one active VCL at any moment in time.

We did that for good reasons: I will argue that it allowed us to
deliver the very valuable and successful feature of "truly instant
reconfiguration", but on review it is now a limiting factor.

Strictly speaking we already broke that dogma with return(vcl),
but we hid that so well that we did not have to change a single
word in the documentation.

Letting go of it (further) has consequences.  Most obviously, we
will need some way to decide which of multiple active VCLs we throw
the incoming requests at, but as long as "the other active VCLs" do
not contain a vcl_recv{}, that is obvious.

Sharing backends/directors and ACLs across VCLs means we need some
way to make sure all threads from other VCLs are out of this one
before we can cool and unload it.  That is CS-101 multi-threading
material, but performance cannot be ignored.

But the immediately obvious follow-up question is:  Why can't I
also export & import SUBs ?

I won't go into the details (compatibility with the vcl_method they
are called from), but that runs into an equally old dogma:

	"If you can vcl.load a VCL, you can vcl.use that VCL."

This one already has a footnote attached to it, relating to
VMODs being able to veto going from cold to hot, but otherwise
it still holds.

This originated in a desire to have a preloaded, ever-ready "emergency
VCL" so that when the newspaper backend monster keeled over, there
would be a single /reliable/ switch to throw.

Is that "killer-feature" or "really, I didn't know that..." ?

Right now I truly don't know the answer, so for that reason alone
sharing SUBs is "desirable but for further study" at this time.

So for now: I think we should implement export/import of backends
and ACLs, since I think they "come for free", but not commit
to sharing SUBs.

(See A.2 for CLI implications.)

A.1) Thoughts about implementation

Exporting things must be explicit; we do not want VCLs to be
able to grab random stuff from other VCLs, both as a matter
of sanity and to keep the list of exported objects small.

For the same reasons as for return(vcl), the imports have to go
through labels, otherwise "the other" VCL cannot be replaced.

Exporting the backends from a single VCL, instead of replicating
their definition in new versions of the active VCL or in
multi-app/tenant VCLs, means that the statistics and state will
not be fragmented.  We may want more (see below), but it will be
a step in the right direction.
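
As a purely hypothetical sketch of what this could look like (the
"export" and "import ... from" syntax, the names and the label are
all made up here for illustration, none of this exists today):

	# backends.vcl -- loaded once and given a label, say "lib"
	vcl 4.2;

	backend app { .host = "192.0.2.10"; .port = "80"; }
	acl inhouse { "192.0.2.0"/24; }

	export backend app;	# hypothetical: visible to importing VCLs
	export acl inhouse;

	# tenant.vcl -- any number of these share the same backend,
	# and therefore the same statistics and connection pool
	vcl 4.2;

	import backend app from lib;	# hypothetical: goes through the label
	import acl inhouse from lib;

	sub vcl_recv {
		if (client.ip ~ inhouse) {
			set req.backend_hint = app;
		}
	}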

A.1) Summary:

Low to medium complexity, with good and concrete benefits which
would be a selling point for 8.0.

A.2) Add a central switchboard.

I think the final version of the idea we came up with was
something like this mock-up:

	vcl 4.2;

	vcl_scope {
		req.http.host suffix "example.com";
		req.url prefix "/hello_world";
		return(mine(100));
	}

These "selectors" will be merged into a single decision data structure
which a central dispatcher uses to decide where the requests goes.

I think we also had consensus for adding an escape mechanism along
the lines of:

	vcl 4.2;

	sub vcl_match {
		if (client.ip ~ inhouse_acl && req.url ~ "editor") {
			return (mine(100));
		}
		return (notme);
	}

Such functions cannot be merged, but must be executed serially,
which rules them out as the only method; but there seem to be solid
use-cases for having a few, for instance purges, in-house vs. outside
traffic, log4j detection etc.

So far, so good.

We need CLI commands to do this, including a "vcl.unuse", which
we never had before, and a "vcl.substitute" to atomically do a
vcl.unuse + vcl.use.

If we're adding two new CLI commands, we gain nothing from overloading
"vcl.use" as the third, so we should add three all-new CLI commands,
something like:

	vcl.insert  - add a VCL to the switchboard
	vcl.remove  - remove a VCL from the switchboard
	vcl.replace - atomic add+remove

That eliminates the need for a setting to enable this new "switchboard
mode":  We power up the switchboard on the first vcl.insert and power
it down on the last vcl.remove.

That again means that even people who do not use the switchboard
would be able to "vcl.insert log4j_mitigator.vcl" without editing
their VCL. (killer-feature ?)

But that only works if the switchboard defaults to their usual VCL
when none of the vcl.insert'ed VCLs match.

So I think the final result looks like:

	There is *exactly* one active VCL at any moment in time;
	requests go to it, unless the switchboard dispatches them
	elsewhere.  (But "active" now means something slightly different.)

	There can be any number of "library VCLs" loaded with
	"vcl.library", containing only backends/directors
	and ACLs (for now).

	There can be any number of "subscriber VCLs" loaded with
	"vcl.insert", which go through the switchboard.

A.2) Thoughts about implementation

How are conflicting selectors resolved ?  In the above examples
I put "mine(100)" as a way to assign priorities.  Better ideas ?

I'm slightly concerned about the rebuilding/reconfiguration
of the merged decision data structure when there are many VCLs.

Nobody argued for using regular expressions, which I suspect was
partly a healthy respect for implementing the merge, and partly
because those fields are not just strings (%xx escapes,
case-insensitivity, I18N DNS etc.)

It seems obvious to allow multiple selectors on each of the two
fields, and to give them "or" semantics, so that a single vcl_scope{}
can match multiple domains and/or multiple URLs.

But assuming the two fields (host+url) inside the selector have
"and" semantics, I think we should also allow multiple vcl_scope{}
per VCL, so that a single VCL can handle:

	vcl_scope {
		req.http.host suffix "example.com";
		req.url prefix "/hello_world";
		return(mine(100));
	}

	vcl_scope {
		req.http.host suffix "examplè.fr";
		req.url prefix "/bonjour_monde";
		return(mine(100));
	}

	vcl_scope {
		req.http.host suffix "example.de";
		req.url prefix "/guten_heute_leute";
		return(mine(100));
	}

and if that still cannot do what people want, there is the
vcl_match{} escape-mechanism.

I wonder if host+url is too restrictive?

I can imagine, but don't know the relevance of, also selecting on
user-agent or on particular cookies being present or absent, but with
the escape-mechanism we can collect real-world experience before
we decide that.

A.2) Summary:

This one goes all over the place: VCC, CLI, locking, and using
somebody's exam results in CS data structures in real life.

I can't imagine this is realistic for 8.0, and I don't see any way
to be "a little bit pregnant" with it.  But if my outline holds up
to scrutiny, it is additive and will not have to wait for 9.0.

B) "Plain backends are too plain"
---------------------------------

Points of pain:

	"DNS answers with multiple IPs"
	"DNS response frozen at vcl.load time"
	"Probing backends with rapidly changing IPs"
	"Fragmented (connection pool) statistics"

B) Diagnosis:

In 2006 backends were real backends and Kubernetes was not a real word.

Until we have a "discover" service which checks if the DNS response
has changed, we are stuck with freezing the DNS response at
vcl.load time.

But we could stop being anal about DNS responses with multiple IP
numbers, which would at least allow people to work around that
limitation by reloading their VCL every N minutes.

B) Concrete proposals:

Have VCC accept DNS responses with multiple IPs, use them round-robin.
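
For illustration, a perfectly ordinary declaration like the one below
(example name and port) is rejected by VCC today if the name resolves
to more than one address per address family; under this proposal VCC
would accept it and keep all the addresses:

	backend www {
		# Today: VCC fails the compile if "www.example.com" resolves
		# to more than one IPv4 (or more than one IPv6) address.
		# Proposed: accept them all and use them round-robin.
		.host = "www.example.com";
		.port = "80";
	}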

B) Thoughts about implementation

I'm not sure "use them round-robin" cuts it.  For instance, we may
get both IPv4 and IPv6 addresses but have no IPv6 connectivity.

A better default policy may be "once you find one that works, stick
with it, until it stops working."

Do we probe all the IPs ?

Should we compile it into a round-robin director, to avoid code duplication ?
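
If we do compile it into a director, the result would be roughly what
one can already write by hand with the bundled directors VMOD, just
generated by VCC from the resolved addresses (the addresses below are
made-up examples of what the DNS answer could contain):

	import directors;

	backend www_1 { .host = "192.0.2.10"; .port = "80"; }
	backend www_2 { .host = "192.0.2.11"; .port = "80"; }

	sub vcl_init {
		# roughly what VCC could generate behind a multi-address
		# "backend www" declaration
		new www_rr = directors.round_robin();
		www_rr.add_backend(www_1);
		www_rr.add_backend(www_2);
	}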

B) Summary:

Once the questions are answered, this should be pretty
straightforward, and not be difficult to complete before 8.0.
(Famous Last Words™)

C) VSL roll-overs
-----------------

Points of pain:

	"Extra memory copies in clients to 'evacuate' requests in
	danger of being overwritten"
	"Complexity in clients to monitor danger of overwrites."

C) Diagnosis

In 2006 wire-speed was 100 Mbit/sec, and if your VSL clients were
not fast enough, that was not our problem.

C) Concrete proposal

Instead of one big SHM segment, varnishd creates N distinct files
which occupy the same amount of space, and announces them in the VSM.

Varnishd picks an available segment and updates its open and "do
not use past" timestamps in the VSM.  When that segment is full,
repeat the process.

VSL-Clients monitor the index and process the files in timestamp sequence.

When the client opens a segment, it links a unique filename to that
file, so the inode link-count increases, and it removes that filename
again when it no longer needs any data in that segment.

Clients should arm "atexit(3)" handlers to nuke the unique filenames
when they end.

Varnishd considers a segment available if its previous "do not
use past" timestamp has expired and the inode link count is one.

C) Thoughts about implementation

This proposal eliminates VSL overwrites entirely, but adds some
new failure modes:

A VSL-client dies without removing the unique filename which holds
the inode link, leaving segment(s) locked until those stray files
are removed.  If the VSL-clients' unique names are predictable
from their PID, varnishd could patrol such files and check with
kill(pid, 0) whether the owner is still alive.

When clients are too slow or get stuck, varnishd may run out of
available segments, and will then serve traffic without
logging it.

Counters should record how many transactions and VSL records were
not written, and the VSM needs to communicate to clients that there
is a hole in the VSL stream; otherwise the clients may never release
the prior segments they hold on to.

A parameter can change the default, so that varnishd instead stops
serving traffic if it cannot be logged.  Here the "do not use past"
timestamp can be used as a configurable minimum duration of VSL
"look-back".

The inode link-count trick is neat, but it involves the filesystem,
and that may be too expensive.  Once VIPC is in, we can use that
instead and eliminate the "stray files" problem.

In light of the "make all CLI JSON" discussion, maybe this should
be the first customer of VIPC ?

C) Summary

Very limited amount of code involved; this might make it into 8.0.

Feedback kindly requested...

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

