Architect's Notes: Inline C code

When Brian Kernighan commented on the PASCAL programming language, he raised many valid points, but by far the one that resonated most with me was:

There is no escape.

It is a particularly common kind of hubris for IT architects to think that they know better than 100% of everybody else. This is less of a sin in Open Source than in Closed Source, but a sin nonetheless.

In VCL programs, C program code can be put anywhere, simply by enclosing it in C{ ... }C.

But where does inline-C code fit into the architecture? Does it fit in at all?

Initially I thought we would see one or two users who needed to do something truly odd, and saw inline C as a way to make it easier for them to do their magic.

Instead of hacking up the main source and having to deal with version control and upgrade issues, they can put their magic code right there in their main configuration file, and thus stay in the "user-side" of the configuration/source border.

Obviously, the inline C code needs to stick its fingers into the request, and as such it depends heavily on the VRT (VCL Run Time) environment for which the VCL compiler generates code.

The VRT environment is technically not necessary: the VCL compiler could just generate code directly for the internal functions of the varnishd daemon. But early on I decided that "spending" a glue-layer of functions would make life easier all around.

For instance, if one of the HTTP-header munging functions gets changed, it will not affect the VCL compiler, only the VRT function(s) which call that procedure. That means that I don't have to shift my brain out of "varnishd" mode and into "compiler" mode to complete the change.

But it also helps to have a clear demarcation line, where the user's unproven code stops and the proven and tested code of varnishd starts. The VRT functions are that barrier, and that allows the compiler to be simpler, more like a translator than a full-blown compiler.

But it also follows that the VRT functions are not in any way, shape or form optimized for human programming. Most of them rely on the compiler having tested the arguments and, in some cases, having converted them to more convenient formats for internal processing.
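The glue-layer idea can be sketched in a few lines of C. Everything below is hypothetical and only illustrates the layering: the struct sess stand-in, http_set_host_internal() and the VRT_demo_* names are assumptions made up for this sketch, not the real varnishd or VRT interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for varnishd's per-request state. */
struct sess {
    char host[64];
};

/*
 * "Internal" function. Its signature can change between releases
 * without anybody outside this file noticing.
 */
static void
http_set_host_internal(struct sess *sp, const char *val, size_t len)
{
    if (len >= sizeof sp->host)
        len = sizeof sp->host - 1;      /* truncate to fit */
    memcpy(sp->host, val, len);
    sp->host[len] = '\0';
}

/*
 * Glue-layer functions: the only entry points generated code uses.
 * They assert instead of validating gracefully, because the caller
 * (the compiler) is expected to have checked the arguments already.
 */
void
VRT_demo_set_host(struct sess *sp, const char *val)
{
    assert(sp != NULL);
    assert(val != NULL);
    http_set_host_internal(sp, val, strlen(val));
}

const char *
VRT_demo_get_host(const struct sess *sp)
{
    assert(sp != NULL);
    return (sp->host);
}
```

If the internal function grows or loses an argument, only the two glue functions need to follow; code generated against the glue layer stays untouched.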

Future of Inline C

One of the things that should not have surprised me, but did, was that people didn't write their code as inline-C; instead they wrote a shared library and added just a couple of lines of inline-C to load it and call a function in it.

Some other common themes are emerging, such as caching things in the workthread, having an initialization function that should be called and so on.

The basic inline-C facility will not change. It is a powerful escape for people who know better than me or my VCL compiler, but it may make sense to augment it with support for shared libraries.

Thinking out loud, I could imagine a syntax along these lines:

    backend b0 {
        ...
    }

    shlib spamcheck "/usr/local/lib/spamcheck.so";

    sub vcl_recv {
        if (spamcheck::examine_url(req.url)) {
            error 403;
        }
        if (req.body && spamcheck::examine_body(req.body)) {
            error 403;
        }
        set req.http.spamcheck = spamcheck::approve(client.ip);
    }

Obviously, there is an interesting question of calling convention.

The functions in the shared library should have access to the session pointer, if nothing else as a means of escape, but also because most VRT functions need it as an argument.

But it would be beneficial if calls like those in the example above let the generated code deal with the VRT functions, and called the shared library functions with more ordinary C-style arguments.

It may be as simple as defining a calling convention that says that all functions take a "const char *" and return a "char *". Although limited in expression, it is amazing what you can do with that.
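A minimal sketch of what a library function under that string-in, string-out convention might look like. The function name is borrowed from the syntax example above; the "/spam" test, the verdict strings and the rule that the caller frees the result are assumptions made purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical module function: takes a "const char *", returns a
 * "char *". The result is malloc'ed, and in this sketch the caller
 * is assumed to release it with free(3).
 */
char *
examine_url(const char *url)
{
    const char *verdict;
    char *res;

    /* Toy policy: anything containing "/spam" is rejected. */
    if (url != NULL && strstr(url, "/spam") != NULL)
        verdict = "reject";
    else
        verdict = "ok";

    res = malloc(strlen(verdict) + 1);
    if (res != NULL)
        strcpy(res, verdict);
    return (res);
}
```

Even a boolean answer has to travel as a string here, which is the "limited in expression" part of the trade-off.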

A more versatile prototype would be:

    char *somefunc(struct sess *sp, int argc, char **argv);
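To make the argc/argv flavour concrete, here is a small sketch of a function with that prototype. The struct sess below is an empty stand-in, join_args() is a made-up name, and returning a malloc'ed string is an assumption of the sketch rather than a settled convention.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in; the real session structure is far richer. */
struct sess {
    int id;
};

/*
 * Join all arguments with commas into one malloc'ed string.
 * The session pointer is accepted but unused in this toy example.
 */
char *
join_args(struct sess *sp, int argc, char **argv)
{
    size_t len = 0, n;
    char *res, *p;
    int i;

    (void)sp;
    for (i = 0; i < argc; i++)
        len += strlen(argv[i]) + 1;     /* separator or final NUL */
    res = malloc(len > 0 ? len : 1);
    if (res == NULL)
        return (NULL);
    p = res;
    for (i = 0; i < argc; i++) {
        if (i > 0)
            *p++ = ',';
        n = strlen(argv[i]);
        memcpy(p, argv[i], n);
        p += n;
    }
    *p = '\0';
    return (res);
}
```

With this shape the compiler only has to flatten every VCL argument to a string and count them, at the price of the library re-parsing anything that is not really a string.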

It could also be solved by having the shared library export, via some symbol, the actual prototypes for the VCL compiler to use. What format and information that should contain is a tricky question to answer. For instance, a string return would also need to inform how to dispose of the string: should it be freed by calling free(3), or will it garbage collect itself?
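One possible shape for such an exported symbol is a table of descriptors, one per function, that includes the disposal rule for returned strings. This is only a sketch: the struct layout, the type names "BOOL", "STRING" and "IP", and the choice of which function returns what are all assumptions; only the three function names come from the syntax example above.

```c
#include <stddef.h>
#include <string.h>

/* How the caller should treat a returned string, if there is one. */
enum mod_dispose {
    DISPOSE_NONE,   /* nothing to release (non-string or static) */
    DISPOSE_FREE    /* caller must call free(3) on the result */
};

/* Hypothetical descriptor the VCL compiler could read. */
struct mod_func {
    const char          *name;
    const char          *ret;       /* return type, as text */
    const char          *args;      /* argument types, as text */
    enum mod_dispose    dispose;
};

/* The table the shared library would export under a known symbol. */
const struct mod_func spamcheck_exports[] = {
    { "examine_url",  "BOOL",   "STRING", DISPOSE_NONE },
    { "examine_body", "BOOL",   "STRING", DISPOSE_NONE },
    { "approve",      "STRING", "IP",     DISPOSE_FREE },
    { NULL, NULL, NULL, DISPOSE_NONE }  /* terminator */
};

/* Find a function's descriptor by name, or NULL if not exported. */
const struct mod_func *
mod_lookup(const struct mod_func *tbl, const char *name)
{
    for (; tbl->name != NULL; tbl++)
        if (strcmp(tbl->name, name) == 0)
            return (tbl);
    return (NULL);
}
```

A textual type description keeps the table trivially extensible, but pushes all the checking onto the compiler.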

The shared library should also get a chance to be invoked at VCL load/unload times, to do init/cleanup stuff, and it may make sense to give it a chance to participate in session and workthread creation and cleanup as well.
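The load/unload part could be as small as a pair of optional function pointers. Again a sketch under assumptions: struct mod_hooks, mod_vcl_load() and mod_vcl_unload() are invented names, and session or workthread hooks are left out to keep it short.

```c
#include <stddef.h>

/* Hypothetical hook table; either pointer may be NULL. */
struct mod_hooks {
    int     (*init)(void);  /* called at VCL load, 0 on success */
    void    (*fini)(void);  /* called at VCL unload */
};

/* Module-private state, here just a load counter. */
static int spamcheck_loaded;

static int
spamcheck_init(void)
{
    spamcheck_loaded++;
    return (0);
}

static void
spamcheck_fini(void)
{
    spamcheck_loaded--;
}

const struct mod_hooks spamcheck_hooks = {
    spamcheck_init,
    spamcheck_fini
};

/* What the host side might do when a VCL using the module is loaded. */
int
mod_vcl_load(const struct mod_hooks *h)
{
    return (h->init != NULL ? h->init() : 0);
}

/* ... and when that VCL is discarded again. */
void
mod_vcl_unload(const struct mod_hooks *h)
{
    if (h->fini != NULL)
        h->fini();
}

/* Accessor so the counter can be inspected from outside. */
int
spamcheck_load_count(void)
{
    return (spamcheck_loaded);
}
```

Counting loads rather than keeping a flag matters if several VCL programs using the same library are loaded at the same time.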

If we grow such a modular extension mechanism, larger and more complex features can be implemented as standalone modules for Varnish. But it would be a mistake to make tightly integrated facilities like ESI, encryption or compression into modules this way; they get too involved in the nitty-gritty low level stuff.

As always, comments and input are most welcome.

Poul-Henning