How Varnish met CHERI 5/N

Varnish Workspaces

To process an HTTP request or response, varnish must allocate bits of memory which are only used for the duration of that processing, and which can all be released at the same time.

To avoid calling malloc(3) a lot, which comes with a locking overhead in a heavily multithreaded process, but even more to avoid having to keep track of all these allocations in order to be able to free(3) them all, varnish has “workspaces”:

struct ws {
    […]
    char    *s;     /* (S)tart of buffer */
    char    *f;     /* (F)ree/front pointer */
    char    *r;     /* (R)eserved length */
    char    *e;     /* (E)nd of buffer */
};

The s pointer points at the start of a slab of memory, owned exclusively by the current thread, and e points to the end.

Initially f is the same as s, but as allocations are made from the workspace, it moves towards e. The r pointer is used to make “reservations”; we will ignore that for now.
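As a rough illustration of the idea, here is a minimal sketch of such a bump allocator. The struct mini_ws and the mini_ws_*() functions are hypothetical names made up for this example, not varnish code, and the sketch assumes the caller does not care about alignment:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down workspace; not the real struct ws. */
struct mini_ws {
    char    *s;     /* start of buffer */
    char    *f;     /* free pointer, moves towards e */
    char    *e;     /* end of buffer */
};

static void
mini_ws_init(struct mini_ws *ws, void *space, size_t len)
{
    ws->s = space;
    ws->e = ws->s + len;
    ws->f = ws->s;
}

/* Hand out the next `bytes` bytes, or NULL if they do not fit. */
static void *
mini_ws_alloc(struct mini_ws *ws, size_t bytes)
{
    char *r;

    if (bytes > (size_t)(ws->e - ws->f))
        return (NULL);
    r = ws->f;
    ws->f += bytes;
    return (r);
}

/* Release every allocation at once by rewinding the free pointer. */
static void
mini_ws_reset(struct mini_ws *ws)
{
    ws->f = ws->s;
}
```

Note that there is no per-allocation bookkeeping at all: releasing everything is a single pointer assignment, which is the whole point.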

Workspaces look easy to create:

ws->s = space;
ws->e = ws->s + len;
ws->f = ws->s;
ws->r = NULL;

… only, given the foot-shooting-abetting nature of the C language, we have bolted on a lot of seat-belts:

#define WS_ID_SIZE 4

struct ws {
    unsigned        magic;
#define WS_MAGIC    0x35fac554
    char            id[WS_ID_SIZE]; /* identity */
    char            *s;             /* (S)tart of buffer */
    char            *f;             /* (F)ree/front pointer */
    char            *r;             /* (R)eserved length */
    char            *e;             /* (E)nd of buffer */
};

void
WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
{
    unsigned l;

    DSLb(DBG_WORKSPACE,
        "WS_Init(%s, %p, %p, %u)", id, ws, space, len);
    assert(space != NULL);
    assert(PAOK(space));
    INIT_OBJ(ws, WS_MAGIC);
    ws->s = space;
    l = PRNDDN(len - 1);
    ws->e = ws->s + l;
    memset(ws->e, WS_REDZONE_END, len - l);
    ws->f = ws->s;
    assert(id[0] & 0x20);           // cheesy islower()
    bstrcpy(ws->id, id);
    WS_Assert(ws);
}

Let me walk you through that:

The DSLb() call can be used to trace all operations on the workspace, so we can see what actually goes on.

(Hint: Your malloc(3) may have something similar, look for utrace in the manual page.)

Next we check that the provided space pointer is not NULL, and that it is properly aligned. Both checks follow a varnish style-pattern: sprinkle asserts liberally, both as code documentation and because it allows the compiler to optimize things better.

INIT_OBJ() and the magic field are a style-pattern we use throughout varnish: each structure is tagged with a unique magic, which can be used to ensure that pointers are what we are told, when they get passed through a void*.
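The magic pattern can be sketched roughly like this. The struct foo, FOO_MAGIC, foo_init() and foo_is_valid() are hypothetical names invented for the illustration. The sketch returns a flag so it is easy to experiment with, where real code in this style would more likely assert:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo {
    unsigned    magic;
#define FOO_MAGIC   0x1d2c3b4a      /* arbitrary value, made up here */
    int         value;
};

/* Zero the structure and stamp it with its magic. */
static void
foo_init(struct foo *fp)
{
    memset(fp, 0, sizeof *fp);
    fp->magic = FOO_MAGIC;
}

/* Check that an opaque pointer really points at an initialized foo. */
static int
foo_is_valid(const void *priv)
{
    const struct foo *fp = priv;

    return (fp != NULL && fp->magic == FOO_MAGIC);
}
```

A callback which receives a void* can call foo_is_valid() first, and refuse to go on if it was handed something else or something stale.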

We set the s pointer.

We calculate a length at least one byte shorter than what we were provided, align it, and point e at that.

We fill that extra space, at and past e, with a “canary” to stochastically detect overruns. It catches most but not all overruns.

We set the name of the workspace, ensuring it is not already marked as overflowed.

And finally check that the resulting workspace complies with the defined invariants, as captured in the WS_Assert() function.
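The length calculation and the canary from the walk-through above can be sketched as follows. The helper names and the REDZONE_BYTE value are assumptions made for this example, and the alignment is passed in explicitly (it must be a power of two) to keep the arithmetic visible:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REDZONE_BYTE    0x5a    /* arbitrary canary value, made up here */

/* Round x down to a multiple of align (align must be a power of two). */
static size_t
round_down(size_t x, size_t align)
{
    return (x & ~(align - 1));
}

/* Paint the n bytes starting at end with the canary value. */
static void
redzone_fill(char *end, size_t n)
{
    memset(end, REDZONE_BYTE, n);
}

/* Return 1 if the canary is untouched, 0 if something wrote into it. */
static int
redzone_intact(const char *end, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (end[i] != REDZONE_BYTE)
            return (0);
    return (1);
}
```

With a length of 100 and 8 byte alignment, round_down(100 - 1, 8) gives 96 usable bytes and a 4 byte canary. With a length of 96, it gives 88 usable bytes and an 8 byte canary, so there is always at least one canary byte. The check is only probabilistic: an overrun which happens to write the canary value, or which jumps clean over the canary, goes unnoticed.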

With CHERI, it looks like this:

void
WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
{
    unsigned l;

    DSLb(DBG_WORKSPACE,
        "WS_Init(%s, %p, %p, %u)", id, ws, space, len);
    assert(space != NULL);
    INIT_OBJ(ws, WS_MAGIC);
    assert(PAOK(space));
    ws->s = cheri_bounds_set(space, len);
    ws->e = ws->s + len;
    ws->f = ws->s;
    assert(id[0] & 0x20);           // cheesy islower()
    bstrcpy(ws->id, id);
    WS_Assert(ws);
}

All the gunk to implement a canary to detect overruns went away, because with CHERI we can restrict the s pointer so writing outside the workspace is by definition impossible, as long as your pointer is derived from s.

Less memory wasted, a much stronger check and more readable source-code, what’s not to like?

When an allocation is made from the workspace, CHERI makes it possible to restrict the returned pointer to just the allocated space:

void *
WS_Alloc(struct ws *ws, unsigned bytes)
{
    char *r;

    […]
    r = ws->f;
    ws->f += bytes;
    return(cheri_bounds_set(r, bytes));
}
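To play with this outside varnish, something like the following sketch may help. The BOUND() and BOUND_LEN() macros and the carve() function are hypothetical names for this example. The sketch assumes cheri_bounds_set() and cheri_length_get() from <cheriintrin.h> when compiled as pure-capability code, and degrades to doing nothing on other targets:

```c
#include <assert.h>
#include <stddef.h>

#ifdef __CHERI_PURE_CAPABILITY__
#include <cheriintrin.h>
#define BOUND(p, n)     cheri_bounds_set((p), (n))
#define BOUND_LEN(p)    cheri_length_get(p)
#else
/* Assumption: without CHERI there is nothing to enforce the bounds. */
#define BOUND(p, n)     (p)
#define BOUND_LEN(p)    ((size_t)-1)
#endif

/*
 * Carve `bytes` bytes off the front of the free space at *freep and
 * return a pointer which (on CHERI) can only reach those bytes.
 * The caller must make sure the bytes are actually available.
 */
static char *
carve(char **freep, size_t bytes)
{
    char *r;

    r = *freep;
    *freep += bytes;
    return (BOUND(r, bytes));
}
```

On a CHERI system, writing to carve()'s return value at offset bytes or beyond should trap instead of quietly scribbling on the neighbouring allocation. For large allocations the hardware may have to round the bounds up a bit, so the reported length can be larger than what was asked for.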

Varnish String Buffers

Back in the mists of time, Dag-Erling Smørgrav and I designed a safe string API called sbuf for the FreeBSD kernel.

The basic idea is you set up your buffer, you call functions to stuff text into it, and those functions do all the hard work to ensure you do not overrun the buffer. When the string is complete, you call a function to “finish” the buffer, and it returns a flag which tells you whether an overrun (or other problem) happened, and then you can get a pointer to the resulting string from another function.

Varnish has adopted sbufs under the name vsb. This should really not surprise anybody: Dag-Erling was also involved in the birth of varnish.

It should be obvious that internally vsb almost always operates on a bigger buffer than the result, so this is another obvious place to have CHERI cut a pointer down to size:
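To make that workflow concrete, here is a toy version of the idea. This is neither the sbuf nor the vsb API: struct mini_sb and the mini_sb_*() functions are hypothetical names for this sketch, which assumes a fixed-size, caller-provided buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mini_sb {
    char    *buf;
    size_t  size;       /* total size of buf, including the NUL */
    size_t  len;        /* bytes of text stored so far */
    int     error;      /* set once an append did not fit */
    int     finished;
};

static void
mini_sb_init(struct mini_sb *sb, char *buf, size_t size)
{
    assert(size > 0);
    sb->buf = buf;
    sb->size = size;
    sb->len = 0;
    sb->error = 0;
    sb->finished = 0;
}

/* Append str, or record an error if it would overrun the buffer. */
static void
mini_sb_cat(struct mini_sb *sb, const char *str)
{
    size_t n = strlen(str);

    if (sb->error || sb->finished)
        return;
    if (n >= sb->size - sb->len) {      /* keep room for the NUL */
        sb->error = 1;
        return;
    }
    memcpy(sb->buf + sb->len, str, n);
    sb->len += n;
}

/* Terminate the string; return 0 if all went well, -1 otherwise. */
static int
mini_sb_finish(struct mini_sb *sb)
{
    sb->buf[sb->len] = '\0';
    sb->finished = 1;
    return (sb->error ? -1 : 0);
}

/* Only a finished buffer may hand out its string. */
static const char *
mini_sb_data(const struct mini_sb *sb)
{
    assert(sb->finished);
    return (sb->buf);
}
```

The caller never touches the buffer directly and only has to check one flag, at the end, no matter how many pieces of text went in.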

char *
VSB_data(const struct vsb *s)
{

    assert_VSB_integrity(s);
    assert_VSB_state(s, VSB_FINISHED);

    return (cheri_bounds_set(s->s_buf, s->s_len + 1));
}

Still no bugs though.

/phk