How Varnish met CHERI 5/N
Varnish Workspaces
To process an HTTP request or response, varnish must allocate bits of memory which are only used for the duration of that processing, and which can all be released at the same time.
To avoid calling malloc(3) a lot, which comes with a locking overhead in a heavily multithreaded process, but even more to avoid having to keep track of all these allocations in order to be able to free(3) them all, varnish has “workspaces”:
struct ws {
[…]
char *s; /* (S)tart of buffer */
char *f; /* (F)ree/front pointer */
char *r; /* (R)eserved length */
char *e; /* (E)nd of buffer */
};
The s pointer points at the start of a slab of memory, owned exclusively by the current thread, and e points to the end. Initially f is the same as s, but as allocations are made from the workspace, it moves towards e. The r pointer is used to make “reservations”; we will ignore that for now.
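A minimal sketch of the pointer movement just described, using a hypothetical toy_ws structure rather than varnish's own struct ws. It ignores alignment, reservations and all the checks, and the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down workspace: not varnish's struct ws. */
struct toy_ws {
	char	*s;	/* start of buffer */
	char	*f;	/* free pointer: next unallocated byte */
	char	*e;	/* end of buffer (one past the last usable byte) */
};

/* Hand out `bytes` bytes by advancing f; NULL when they do not fit. */
static void *
toy_ws_alloc(struct toy_ws *ws, size_t bytes)
{
	char *r;

	assert(ws->s <= ws->f && ws->f <= ws->e);
	if (bytes > (size_t)(ws->e - ws->f))
		return (NULL);
	r = ws->f;
	ws->f += bytes;
	return (r);
}

/* Release every allocation at once by rewinding f to s. */
static void
toy_ws_reset(struct toy_ws *ws)
{
	ws->f = ws->s;
}
```

With s and f pointing at a 16-byte buffer and e just past it, a 10-byte allocation returns the start of the buffer and leaves f ten bytes in; a further 7-byte request no longer fits and yields NULL. Nothing is tracked per allocation, which is the whole point: one pointer assignment frees everything.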
Workspaces look easy to create:
ws->s = space;
ws->e = ws->s + len;
ws->f = ws->s;
ws->r = NULL;
… only, given the foot-shooting-abetting nature of the C language, we have bolted on a lot of seat-belts:
#define WS_ID_SIZE 4
struct ws {
unsigned magic;
#define WS_MAGIC 0x35fac554
char id[WS_ID_SIZE]; /* identity */
char *s; /* (S)tart of buffer */
char *f; /* (F)ree/front pointer */
char *r; /* (R)eserved length */
char *e; /* (E)nd of buffer */
};
void
WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
{
unsigned l;
DSLb(DBG_WORKSPACE,
"WS_Init(%s, %p, %p, %u)", id, ws, space, len);
assert(space != NULL);
assert(PAOK(space));
INIT_OBJ(ws, WS_MAGIC);
ws->s = space;
l = PRNDDN(len - 1);
ws->e = ws->s + l;
memset(ws->e, WS_REDZONE_END, len - l);
ws->f = ws->s;
assert(id[0] & 0x20); // cheesy islower()
bstrcpy(ws->id, id);
WS_Assert(ws);
}
Let me walk you through that:
The DSLb() call can be used to trace all operations on the workspace, so we can see what actually goes on. (Hint: Your malloc(3) may have something similar, look for utrace in the manual page.)
Next we check that the provided space pointer is not NULL and that it is properly aligned. Both checks follow a varnish style-pattern: sprinkle asserts liberally, both as code documentation and because it allows the compiler to optimize things better.
INIT_OBJ() and the magic field are a style-pattern we use throughout varnish: Each structure is tagged with a unique magic, which can be used to ensure that pointers are what we are told, when they get passed through a void*.
We set the s pointer.
We calculate a length at least one byte shorter than what we were provided, align it, and point e at that.
We fill that extra space at and past e with a “canary” to stochastically detect overruns. It catches most but not all overruns.
We set the name of the workspace, and the assert that its first character is lower-case ensures it is not already marked as overflowed.
And finally check that the resulting workspace complies with the defined invariants, as captured in the WS_Assert() function.
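The length arithmetic and the canary can be sketched in isolation. The names below (toy_ws, TOY_ALIGN_MASK, TOY_REDZONE, toy_ws_canary_ok) and the canary value are invented for this illustration, and the explicit redlen field and the checking loop are assumptions, since the walk-through above does not show how varnish inspects the canary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not varnish's own macros or values. */
#define TOY_ALIGN_MASK	(sizeof(void *) - 1)
#define TOY_REDZONE	0x15

struct toy_ws {
	char	*s;
	char	*e;
	size_t	redlen;	/* number of canary bytes at and past e */
};

static void
toy_ws_init(struct toy_ws *ws, void *space, size_t len)
{
	size_t l;

	assert(space != NULL);
	assert(len > sizeof(void *));
	assert(((uintptr_t)space & TOY_ALIGN_MASK) == 0);
	/* Give up at least one byte, then round down to pointer alignment. */
	l = (len - 1) & ~TOY_ALIGN_MASK;
	ws->s = space;
	ws->e = ws->s + l;
	ws->redlen = len - l;
	memset(ws->e, TOY_REDZONE, ws->redlen);
}

/* Returns 1 if the canary is intact, 0 if something wrote past e. */
static int
toy_ws_canary_ok(const struct toy_ws *ws)
{
	size_t i;

	for (i = 0; i < ws->redlen; i++)
		if ((unsigned char)ws->e[i] != TOY_REDZONE)
			return (0);
	return (1);
}
```

With 8-byte pointers and len = 64, l becomes 63 rounded down to 56, leaving 8 canary bytes; subtracting one first is what guarantees at least one canary byte even when len is already aligned. An overrun that happens to write the canary value goes unnoticed, which is why the detection is only stochastic.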
With CHERI, it looks like this:
void
WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
{
unsigned l;
DSLb(DBG_WORKSPACE,
"WS_Init(%s, %p, %p, %u)", id, ws, space, len);
assert(space != NULL);
INIT_OBJ(ws, WS_MAGIC);
assert(PAOK(space));
ws->s = cheri_bounds_set(space, len);
ws->e = ws->s + len;
ws->f = ws->s;
assert(id[0] & 0x20); // cheesy islower()
bstrcpy(ws->id, id);
WS_Assert(ws);
}
All the gunk to implement a canary to detect overruns went away, because with CHERI we can restrict the s pointer so writing outside the workspace is by definition impossible, as long as your pointer is derived from s.
Less memory wasted, a much stronger check and more readable source code: what’s not to like?
When an allocation is made from the workspace, CHERI makes it possible to restrict the returned pointer to just the allocated space:
void *
WS_Alloc(struct ws *ws, unsigned bytes)
{
char *r;
[…]
r = ws->f;
ws->f += bytes;
return(cheri_bounds_set(r, bytes));
}
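For code that must also build on machines without CHERI, one possible arrangement is a small wrapper macro. This is only a sketch: TOY_BOUNDS_SET and toy_carve are hypothetical names, not something taken from the varnish source, and the fallback branch offers no protection whatsoever:

```c
#include <assert.h>
#include <stddef.h>

#if defined(__CHERI_PURE_CAPABILITY__)
#include <cheriintrin.h>
/* Narrow the capability so it covers only n bytes starting at p. */
#define TOY_BOUNDS_SET(p, n)	cheri_bounds_set((p), (n))
#else
/* Fallback: same address, no hardware enforcement at all. */
#define TOY_BOUNDS_SET(p, n)	(p)
#endif

/* Hypothetical helper: hand out `len` bytes starting `off` bytes into `slab`. */
static char *
toy_carve(char *slab, size_t off, size_t len)
{
	return (TOY_BOUNDS_SET(slab + off, len));
}
```

On a CHERI system, a write through the returned pointer at index len or beyond would be expected to trap instead of silently landing in the neighbouring allocation; on everything else the code behaves exactly as it did before.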
Varnish String Buffers
Back in the mists of time, Dag-Erling Smørgrav and I designed a safe string API called sbuf for the FreeBSD kernel.
The basic idea is you set up your buffer, you call functions to stuff text into it, and those functions do all the hard work to ensure you do not overrun the buffer. When the string is complete, you call a function to “finish” the buffer, and it returns a flag which tells you whether an overrun (or other problem) happened, and then you can get a pointer to the resulting string from another function.
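To make the shape of such an API concrete, here is a deliberately tiny imitation on a fixed-size buffer. Everything in it (toy_sb and its functions, the -1 return value) is hypothetical and written for this illustration; it is not the sbuf or vsb interface, which among other things offer more functions than shown here:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of the idea; not the sbuf or vsb API. */
struct toy_sb {
	char	*buf;
	size_t	size;		/* capacity, including the final NUL */
	size_t	len;		/* bytes of text stored so far */
	int	overflowed;	/* sticky error flag */
	int	finished;
};

static void
toy_sb_init(struct toy_sb *sb, char *buf, size_t size)
{
	assert(size > 0);
	sb->buf = buf;
	sb->size = size;
	sb->len = 0;
	sb->overflowed = 0;
	sb->finished = 0;
}

/* Append formatted text; on overrun keep what fits and remember the error. */
static void
toy_sb_printf(struct toy_sb *sb, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	assert(!sb->finished);
	if (sb->overflowed)
		return;
	room = sb->size - sb->len;	/* always >= 1: space for the NUL */
	va_start(ap, fmt);
	n = vsnprintf(sb->buf + sb->len, room, fmt, ap);
	va_end(ap);
	if (n < 0) {
		sb->overflowed = 1;
		sb->buf[sb->len] = '\0';
	} else if ((size_t)n >= room) {
		sb->overflowed = 1;
		sb->len = sb->size - 1;	/* vsnprintf truncated and terminated */
	} else {
		sb->len += (size_t)n;
	}
}

/* Seal the buffer; returns 0 on success, -1 if any append overran. */
static int
toy_sb_finish(struct toy_sb *sb)
{
	sb->buf[sb->len] = '\0';
	sb->finished = 1;
	return (sb->overflowed ? -1 : 0);
}

/* Only a finished buffer may be read. */
static const char *
toy_sb_data(const struct toy_sb *sb)
{
	assert(sb->finished);
	return (sb->buf);
}
```

The caller never does length arithmetic: it appends as often as it likes and checks a single flag at the end. Appending "abc" and 42 to an 8-byte buffer finishes cleanly with "abc42", while appending a 10-character string makes toy_sb_finish() report the overrun.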
Varnish has adopted sbufs under the name vsb. This should really not surprise anybody: Dag-Erling was also involved in the birth of varnish.
It should be obvious that internally vsb almost always operates on a bigger buffer than the result, so this is another obvious place to have CHERI cut a pointer down to size:
char *
VSB_data(const struct vsb *s)
{
assert_VSB_integrity(s);
assert_VSB_state(s, VSB_FINISHED);
return (cheri_bounds_set(s->s_buf, s->s_len + 1));
}
Still no bugs though.
/phk