VCL language

Tue Mar 28 08:58:14 CEST 2006

In message <2607.193.213.34.102.1143499332.squirrel at denise.vg.no>,
"Anders Berg" writes:

>seeing that Poul-Henning is back to the VCL compiler again in the code, I
>think it is time to start a more detailed discussion about the VCL.

Good time for it.

The work I did yesterday was basically just getting the compiler hooked
into varnishd and getting the cache process to load the result.  So far
I don't actually use the loaded code for anything.

>My guess is that we are gonna spend "alot" of time on it, and it could be
>a natural part of a face-to-face meeting. I also acknowledge that the
>sooner we "freeze" the language, the easier and less rewrite of code
>Poul-Henning has to do.

It's important to keep two things clear of each other here.  On one hand
there is "the language", which is about where semicolons go and how many
f's we put in 'if' and that sort of thing.  On the other hand, there
are the variables and operations we give the language to work with.

The first part, the language, I think is pretty solid by now, and
with some amount of polishing and maybe a new feature or two, that
is pretty 'done' at this time.

The other part, the variables and operations, is in need of being
hashed out, because that is the next bit that I need to start working
on:  Calling into the compiled VCL program from the cache process.

But there is a lot of code yet to be written before this stuff gets
in the critical path, so there is no need to yell "emergency" or
anything like it :-)

>I think our small "proof-of-concept" and the general look-and-feel of VCL,
>will make it suitable and _really_ good for Varnish.

I think VCL is the bit which will make people sit up and take notice :-)

>I/We haven't gotten down to trying/thinking/poking/defining/documenting
>the VCL yet, but I _think_ I might have come up with a "system" to make
>VCL easier to understand, and possibly easier to code both for
>Poul-Henning and the end user. I am attaching 2 documents:
>vcl_diagram_v1.png and vcl_diagram_proposal.png (*v1 is approx. what we
>have today)

I can't say I have thought deeply about the data model yet.  My initial
mock-up was based on a semi-object oriented model where we basically
had four data objects:

	Client	(Who asked)
		IP#
		Bandwidth estimate
		failed requests
		user agent
		...

	Request	(What they asked for)
		URL
		HEAD/GET/other
		Headers
		...

	Object	(Document in our cache)
		ttl
		length
		usage count
		refresh count
		...

	Backend	(Where we can get documents from)
		IP#
		responsetime
		...
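
As a rough illustration only, the four objects listed above could be
sketched like this in Python.  The field names and types are guesses
made for the example, not settled VCL variable names, and the "..."
entries in the list are deliberately left unfilled.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Who asked."""
    ip: str
    bandwidth_estimate: float = 0.0   # hypothetical unit: bytes/second
    failed_requests: int = 0
    user_agent: str = ""


@dataclass
class Request:
    """What they asked for."""
    url: str
    method: str = "GET"               # HEAD/GET/other
    headers: dict = field(default_factory=dict)


@dataclass
class CachedObject:
    """Document in our cache."""
    ttl: float
    length: int
    usage_count: int = 0
    refresh_count: int = 0


@dataclass
class Backend:
    """Where we can get documents from."""
    ip: str
    response_time: float = 0.0        # hypothetical unit: seconds
```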

>Also, the object (document if you like) has 2 sets of variables. For
>example I think that backend.obj.usage and client.obj.usage makes sense.
>Let's say it's a number/factor to say how often this object is
>used/refreshed. A JPEG will have a low backend.obj.usage (since it typically
>is not often requested from backend) but client.obj.usage will be high
>(because it's requested often, logo etc...). I can also think of more uses
>here.

I don't disagree with the two different usable numbers, but I think
I do disagree with the naming.  What you call backend.obj.usage isn't
really a usage count, it is a refresh count, and since the client
can't do that, just object.refreshcount would work without confusion.
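
A minimal sketch of that distinction, assuming both counters live on
the one cached document: deliveries to clients bump the usage count and
fetches from a backend bump the refresh count.  The class and method
names are made up for the example.

```python
class Document:
    """One cached document carrying both counters (hypothetical names)."""

    def __init__(self):
        self.usagecount = 0     # times served to a client
        self.refreshcount = 0   # times re-fetched from a backend

    def deliver(self):
        """Record that the document was served to a client."""
        self.usagecount += 1

    def refresh(self):
        """Record that the document was re-fetched from a backend."""
        self.refreshcount += 1


# A logo JPEG: fetched once, served many times.
logo = Document()
logo.refresh()
for _ in range(1000):
    logo.deliver()
```

With this shape the JPEG case from the quoted text shows up as a high
usagecount next to a low refreshcount on a single object.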

A fundamental rule in object-oriented programming is to make sure
you have a good correspondence between your objects and the real world
objects they represent, and I think splitting the "object" (or should
we call it "document" instead?) into a client and a backend side
misses the point about the cache:  It is the cached documents which
are interesting here.

The other thing I would like to point out is that a given document
does not have a static mapping to a backend.

For instance, we may pick it up from a peer server during startup
and then subsequently refresh it from one of a number of backends
whenever it is in danger of expiring.

So the object/document clearly cannot be tied to a particular
backend without severely constraining efficiency and flexibility.
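
To make the point concrete, here is a small sketch in which documents
are keyed by URL only and each refresh may go to a different backend.
Round-robin selection, the Cache class and the fetch callback are all
assumptions made for the example, not a description of what varnishd
does.

```python
import itertools


class Cache:
    """Documents keyed by URL; no document is bound to a backend."""

    def __init__(self, backends):
        self.docs = {}
        # Assumption: plain round-robin over the configured backends.
        self._next_backend = itertools.cycle(backends)

    def refresh(self, url, fetch):
        """Re-fetch url from whichever backend is next and return it."""
        backend = next(self._next_backend)
        self.docs[url] = fetch(backend, url)
        return backend


def fake_fetch(backend, url):
    """Stand-in for a real backend request."""
    return "body of %s from %s" % (url, backend)


cache = Cache(["peer", "backend1", "backend2"])
first = cache.refresh("/index.html", fake_fetch)
second = cache.refresh("/index.html", fake_fetch)
```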

>I have left out to define more variables in vcl_diagram_proposal.png, but
>here are some examples of other "new" areas of variables:
>
>if (client.req.url.host ~ cnn.no){
>    backend.req.url.host = vg.no
>}
>
>This is rewrite :)

Yes, but with the footnote that the host part has a more stringent
syntax than the other half of the URL.
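
A sketch of what that footnote implies for a rewrite, written in Python
rather than VCL: the replacement host is checked against a hostname
pattern before it is substituted, while the path and query are passed
through untouched.  The function name and the simplified pattern
(dot-separated labels of letters, digits and hyphens) are assumptions
for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Simplified hostname check: dot-separated labels of 1-63 letters,
# digits or hyphens, with no label starting or ending in a hyphen.
_HOST_RE = re.compile(
    r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)"
    r"(\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*$"
)


def rewrite_host(url, pattern, new_host):
    """Replace the host of url with new_host if the host matches pattern."""
    if not _HOST_RE.match(new_host):
        raise ValueError("not a valid host name: %r" % new_host)
    parts = urlsplit(url)
    if parts.hostname and re.search(pattern, parts.hostname):
        # Note: replacing netloc also drops any port or userinfo.
        parts = parts._replace(netloc=new_host)
    return urlunsplit(parts)


rewritten = rewrite_host("http://www.cnn.no/a?b=1", r"cnn\.no$", "vg.no")
```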

But anyway, at the risk of sounding like a broken record, I think the
best way to find out how VCL should develop is to try and use it,
so let's sit down and write an actual real-life VCL program for VG's
site, and see what we find out along the way.

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.