blob iterators, body iterators in particular
Nils Goroll
slink at schokola.de
Sat Sep 24 16:41:02 CEST 2016
I've been pondering whether we could have a generalized vmod interface to
iterate over blob lists, and bodies in particular. Ideally, I'd like to have a
single interface for all of the following pseudo-vcl examples.
It's not that I personally need all of these now: the hash(req.body) case is
really the only one I need to get working ASAP (and the plan is to fix the
bodyaccess vmod). But while at it, I couldn't avoid reflecting on how this could
be solved for the general case.
So here's a vcl mock:
vcl_init {
    # vmod-re exists
    new re_evil = re.regex("SQL.*INJECTION");
}
vcl_recv {
    cache_req_body(1MB);
    # .match(STRING) exists, .matchb(BODY) doesn't
    if (re_evil.match(req.url) || re_evil.matchb(req.body)) {
        return (synth(400, "you're evil"));
    }
}
vcl_hash {
    if (req.method != "GET" && req.method != "HEAD") {
        # not possible ATM
        hash_blob(req.body);
    }
}
vcl_backend_response {
    # may be a stupid example, could not come up with anything better
    if (beresp.http.Content-Type == "image/png") {
        image.recompress(beresp.body);
    }
    # blobcode/blobdigest exist, but hashing a body is not
    # possible
    set beresp.http.Etag = blobcode.encode(BASE64,
        blobdigest.hashb(MD5, beresp.body));
}
vcl_deliver {
    if (req.http.Cookie ~ "loggedin=true") {
        if (resp.http.Content-Type == "audio/mp3") {
            # another stupid example
            mp3.watermark(resp.body, req.http.UserId);
        }
    }
}
So the VCC declarations for all of the vmod methods/functions could use a common
BLOB_LIST type:
# libvmod-re
$Method BOOL .matchb(BLOB_LIST)
# libvmod-image
$Function VOID recompress(BLOB_LIST)
# libvmod-blobdigest
$Function BLOB hashb(ENUM {MD5, ...}, BLOB_LIST)
# libvmod-mp3
$Function VOID watermark(BLOB_LIST, STRING)
Only one BLOB_LIST argument would be allowed per Function/Method.
The VCL/VMOD interface should have an init call, an iterator and a fini call.
The thing passed when iterating could be the existing vmod_priv:
struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);
enum vmod_priv_iter_state_e {
    VI_INIT,
    VI_ITER,
    VI_FINI
};
struct vmod_priv_iter {
    void *priv;
    enum vmod_priv_iter_state_e state;
    vmod_priv_iter_f *func;
};
The C type of BLOB_LIST would be struct vmod_priv_iter *
The compiled VCL would then:
- alloc the vmod_priv_iter (on the stack?)
- zero it and set state=VI_INIT
- call the vmod function once, ignoring the return value
  - the vmod function would alloc/init its priv data and fill
    in the priv and func members of the struct vmod_priv_iter
- compiled VCL would set state=VI_ITER and loop through the object,
  calling the vmod_priv_iter_f
  - NULL return from the iterator means "have not changed"
  - otherwise the returned vmod_priv MAY be used to modify the object
    (if writable from the context), by referencing it or by copying
    and then freeing it, as applicable
- compiled VCL would set state=VI_FINI and call the vmod
  function one last time, using the return value unless VOID
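The sequence above can be sketched in plain C, outside of varnish. This is only an illustration under stated assumptions: struct vmod_priv is a reduced stand-in for the real one, and sum_state, sum_iter, vmod_sumb and call_sumb are hypothetical names that exist nowhere in varnish or any vmod. The example vmod function just sums body bytes, so its iterator is read-only and always returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for varnish's struct vmod_priv (assumption). */
struct vmod_priv {
    void *priv;
    int len;
};

struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);

enum vmod_priv_iter_state_e { VI_INIT, VI_ITER, VI_FINI };

struct vmod_priv_iter {
    void *priv;
    enum vmod_priv_iter_state_e state;
    vmod_priv_iter_f *func;
};

/* Hypothetical vmod state: running byte sum plus a chunk counter. */
struct sum_state {
    uint32_t sum;
    unsigned chunks;
};

/* Per-chunk callback; read-only, so it returns NULL ("have not changed"). */
static struct vmod_priv *
sum_iter(const struct vmod_priv_iter *it, const struct vmod_priv *chunk)
{
    struct sum_state *st = it->priv;
    const unsigned char *p = chunk->priv;
    int i;

    assert(it->state == VI_ITER);
    for (i = 0; i < chunk->len; i++)
        st->sum += p[i];
    st->chunks++;
    return (NULL);
}

/* Hypothetical vmod function with a BLOB_LIST argument. */
static uint32_t
vmod_sumb(struct vmod_priv_iter *it)
{
    struct sum_state *st;
    uint32_t r;

    switch (it->state) {
    case VI_INIT:
        /* alloc/init priv data, fill in priv and func */
        st = calloc(1, sizeof *st);
        assert(st != NULL);
        it->priv = st;
        it->func = sum_iter;
        return (0);    /* ignored by the caller */
    case VI_FINI:
        /* hand back the result and release priv data */
        st = it->priv;
        r = st->sum;
        free(st);
        it->priv = NULL;
        return (r);
    default:
        return (0);
    }
}

/* Roughly what compiled VCL would do, given the body as n chunks. */
uint32_t
call_sumb(const struct vmod_priv *chunks, size_t n)
{
    struct vmod_priv_iter it = { NULL, VI_INIT, NULL };
    size_t i;

    (void)vmod_sumb(&it);              /* init call, return ignored */
    it.state = VI_ITER;
    for (i = 0; i < n; i++)
        (void)it.func(&it, &chunks[i]);    /* NULL: unchanged */
    it.state = VI_FINI;
    return (vmod_sumb(&it));           /* final call yields the value */
}
```

A non-VOID function only produces its value on the VI_FINI call, which is why the return value of the VI_INIT call can be ignored.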
Regarding the interfaces with varnish core we need to differentiate the use cases:
* vcl_recv {} / vcl_hash {} req.body access
  We have this as a storage object, so the iterator would wrap the
  vmod iterator in an objiterate_f -> _should_ be easy, I think.
* vcl_backend_response {}
  Trouble here is that we do not have the body, so in principle I see
  a couple of options, and I am having a hard time making up my mind
  which would be best:
  - early fetch of the body, wrap the vmod iterator in
    a vfp (but where in the vfp stack would we put it?)
  - early fetch of the body, use objiterate_f when done
  Both would disable streaming for anything but a VOID
  return of the vmod iterator; the vfp option would allow
  streaming for a VOID return.
* vcl_deliver {}
  Here we could use the objiterate_f again, but we would need to
  create some dummy OC_F_PRIVATE object, filling in the modified bits.
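For the req.body case, the wrapping could look roughly like the sketch below. It assumes the core callback has the shape int (void *priv, int flush, const void *ptr, ssize_t len); that typedef is declared locally here and should be checked against cache.h in the tree at hand. struct vmod_priv is again a reduced stand-in, and blob_list_objiterate, count_iter and demo_count are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Reduced stand-in for varnish's struct vmod_priv (assumption). */
struct vmod_priv {
    void *priv;
    int len;
};

struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);

enum vmod_priv_iter_state_e { VI_INIT, VI_ITER, VI_FINI };

struct vmod_priv_iter {
    void *priv;
    enum vmod_priv_iter_state_e state;
    vmod_priv_iter_f *func;
};

/* Assumed shape of the core's object iteration callback. */
typedef int objiterate_f(void *priv, int flush, const void *ptr, ssize_t len);

/* Adapter: hand each storage segment to the vmod iterator as a vmod_priv. */
static int
blob_list_objiterate(void *priv, int flush, const void *ptr, ssize_t len)
{
    const struct vmod_priv_iter *it = priv;
    struct vmod_priv chunk;

    (void)flush;
    assert(it->state == VI_ITER);
    if (ptr == NULL || len <= 0)
        return (0);
    assert(len <= INT_MAX);
    chunk.priv = (void *)(uintptr_t)ptr;    /* vmod_priv is not const */
    chunk.len = (int)len;
    /* req.body is read-only here, so a non-NULL return is ignored */
    (void)it->func(it, &chunk);
    return (0);
}

/* Hypothetical read-only vmod iterator: counts body bytes. */
static struct vmod_priv *
count_iter(const struct vmod_priv_iter *it, const struct vmod_priv *chunk)
{
    *(size_t *)it->priv += (size_t)chunk->len;
    return (NULL);
}

/* Stand-in for the core walking a cached req.body in two segments. */
size_t
demo_count(void)
{
    size_t total = 0;
    struct vmod_priv_iter it = { &total, VI_ITER, count_iter };
    objiterate_f *cb = blob_list_objiterate;

    (void)cb(&it, 0, "hello ", 6);
    (void)cb(&it, 1, "world", 5);
    return (total);
}
```

The adapter keeps no state of its own: the vmod_priv_iter travels in the priv pointer, so the same function could serve every BLOB_LIST call site that iterates over a stored object.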
Nils