blob iterators, body iterators in particular

Nils Goroll slink at
Sat Sep 24 16:41:02 CEST 2016

I've been pondering a bit whether we could have a generalized vmod interface to
iterate over blob lists, and bodies in particular. Ideally, I'd like to have a
single interface for all of the following pseudo-vcl examples.

It's not that I'd personally need all of these now: the hash(req.body) case is
really the only one I need to get working ASAP (and the plan is to fix the
bodyaccess vmod). But while at it, I couldn't help reflecting on how this could
be solved for the general case.

So here's a vcl mock:

sub vcl_init {
	# vmod-re exists
	new re_evil = re.regex("SQL.*INJECTION");
}

sub vcl_recv {
	# .match(STRING) exists, .matchb(BODY) doesn't
	if (re_evil.match(req.url) || re_evil.matchb(req.body)) {
		return (synth(400, "you're evil"));
	}
}

sub vcl_hash {
	if (req.method != "GET" && req.method != "HEAD") {
		# not possible ATM
		hash_data(req.body);
	}
}

sub vcl_backend_response {
	# may be a stupid example, could not come up with anything better
	if (beresp.http.Content-Type == "image/png") {
		image.recompress(beresp.body);
	}

	# blobcode/blobdigest exist, but hashing a body is not
	# possible
	set beresp.http.Etag = blobcode.encode(BASE64,
				blobdigest.hashb(MD5, beresp.body));
}

sub vcl_deliver {
	if (req.http.Cookie ~ "loggedin=true") {
		if (resp.http.Content-Type == "audio/mp3") {
			# another stupid example
			mp3.watermark(resp.body, req.http.UserId);
		}
	}
}

So the VCC declarations for all of these vmod methods/functions could use a
common BLOB_LIST argument type:

# libvmod-re
$Method BOOL .matchb(BLOB_LIST)

# libvmod-image
$Function VOID recompress(BLOB_LIST)

# libvmod-blobdigest
$Function BLOB hashb(ENUM {MD5, ...}, BLOB_LIST)

# libvmod-mp3
$Function VOID watermark(BLOB_LIST, STRING)

Only one BLOB_LIST argument would be allowed per function/method.

The VCL/VMOD interface should have an init call, an iterator and a fini call.
The thing passed when iterating could be the existing struct vmod_priv:

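struct vmod_priv_iter;

/* iterator: called once per chunk; a NULL return means "unchanged" */
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);

enum vmod_priv_iter_state_e {
	VI_INIT,	/* first call: vmod sets up priv and func */
	VI_ITER,	/* compiled VCL calls func for each chunk */
	VI_FINI		/* last call: vmod produces its result, frees priv */
};

struct vmod_priv_iter {
	void				*priv;
	enum vmod_priv_iter_state_e	state;
	vmod_priv_iter_f		*func;
};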

The C type of BLOB_LIST would be struct vmod_priv_iter *.

The compiled VCL would then:

	- alloc the vmod_priv_iter (on the stack?)
	- zero it and set state=VI_INIT
	- call the vmod function once, ignoring the return value
	  - the vmod function would alloc/init its priv data and fill
	    in the priv and func members of the struct vmod_priv_iter
	- set state=VI_ITER and loop through the object, calling
	  the vmod_priv_iter_f for each chunk
	  - a NULL return from the iterator means "unchanged"
	  - otherwise the iterator function MAY modify the object
	    (if it is writable from the context): the returned
	    vmod_priv is then referenced, or copied and freed,
	    as applicable
	- set state=VI_FINI and call the vmod function one last
	  time, using its return value unless it is VOID
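The init/iter/fini sequence above can be sketched in plain C. This is a minimal, self-contained sketch under several assumptions: struct vmod_priv is reduced to a data pointer plus a length, VI_INIT is assumed to be 0 so that zeroing the struct already selects it, and vmod_fnv1a() (a 32-bit FNV-1a hash standing in for blobdigest's MD5) and compiled_vcl_hash() are hypothetical names, not existing Varnish or vmod code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the proposed types (assumption: data + length). */
struct vmod_priv { void *priv; size_t len; };
struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);
enum vmod_priv_iter_state_e { VI_INIT = 0, VI_ITER, VI_FINI };
struct vmod_priv_iter {
	void				*priv;
	enum vmod_priv_iter_state_e	state;
	vmod_priv_iter_f		*func;
};

/* Hypothetical vmod: 32-bit FNV-1a over the whole body. */
struct fnv_state { uint32_t h; };

static struct vmod_priv *
fnv_iter(const struct vmod_priv_iter *it, const struct vmod_priv *chunk)
{
	struct fnv_state *st = it->priv;
	const unsigned char *p = chunk->priv;
	size_t i;

	for (i = 0; i < chunk->len; i++) {
		st->h ^= p[i];
		st->h *= 16777619u;	/* FNV 32-bit prime */
	}
	return (NULL);			/* read-only: chunk unchanged */
}

static uint32_t
vmod_fnv1a(struct vmod_priv_iter *it)
{
	struct fnv_state *st = it->priv;
	uint32_t h = 0;

	switch (it->state) {
	case VI_INIT:			/* set up priv and func */
		st = malloc(sizeof *st);
		assert(st != NULL);
		st->h = 2166136261u;	/* FNV 32-bit offset basis */
		it->priv = st;
		it->func = fnv_iter;
		break;
	case VI_FINI:			/* produce the result, free priv */
		h = st->h;
		free(st);
		it->priv = NULL;
		break;
	default:
		break;
	}
	return (h);
}

/* What the compiled VCL would do for a hashb()-style call on a body. */
static uint32_t
compiled_vcl_hash(const struct vmod_priv *chunks, size_t n)
{
	struct vmod_priv_iter it;
	struct vmod_priv *r;
	size_t i;

	memset(&it, 0, sizeof it);
	it.state = VI_INIT;
	(void)vmod_fnv1a(&it);		/* return value ignored */
	assert(it.func != NULL);

	it.state = VI_ITER;
	for (i = 0; i < n; i++) {
		r = it.func(&it, &chunks[i]);
		assert(r == NULL);	/* hashing never changes the body */
		(void)r;
	}

	it.state = VI_FINI;
	return (vmod_fnv1a(&it));	/* final call: result is used */
}
```

Because the vmod only ever sees one chunk at a time, hashing the chunks {"foo", "bar"} yields the same value as hashing {"foobar"}.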

Regarding the interfaces with Varnish core, we need to differentiate between the use cases:

* vcl_recv {} / vcl_hash {} req.body access

  We already have this as a storage object, so the iterator would wrap the
  vmod iterator in an objiterate_f -> _should_ be easy, I think.

* vcl_backend_response { }

  The trouble here is that we do not have the body yet, so in principle I see
  a couple of options, and I am having a hard time making up my mind
  which would be best:

	- early fetch of the body, wrap the vmod iterator in
	  a vfp (but where in the vfp stack would we put it?)

	- early fetch of the body, use objiterate_f when done

	Both would disable streaming for anything but a VOID
	return of the vmod function; for a VOID return, the vfp
	option would still allow streaming.

* vcl_deliver { }

  Here we could use the objiterate_f again, but we would need to
  create some dummy OC_F_PRIVATE object, filling in the modified bits.
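For the vcl_recv {} / vcl_hash {} case, wrapping the vmod iterator in an objiterate_f could look roughly like the sketch below. The callback shape (void *priv, int flush, const void *ptr, ssize_t len) is modelled on Varnish's objiterate_f but is an assumption here, as that signature has changed between versions; mock_ObjIterate(), struct mock_obj and the byte-counting vmod_bodylen() are hypothetical stand-ins. The sketch also assumes req.body is read-only in these subs, so the adapter expects every chunk to be reported as unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Simplified stand-ins for the proposed types (assumption: data + length). */
struct vmod_priv { void *priv; size_t len; };
struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);
enum vmod_priv_iter_state_e { VI_INIT = 0, VI_ITER, VI_FINI };
struct vmod_priv_iter {
	void				*priv;
	enum vmod_priv_iter_state_e	state;
	vmod_priv_iter_f		*func;
};

/* Assumed shape of the core's objiterate_f; check it against your version. */
typedef int objiterate_f(void *priv, int flush, const void *ptr, ssize_t len);

/* Hypothetical stored req.body: a list of NUL-terminated segments. */
struct mock_obj { const char **seg; size_t nseg; };

/* Hypothetical stand-in for the core walking the stored object. */
static int
mock_ObjIterate(const struct mock_obj *o, void *priv, objiterate_f *func)
{
	size_t i;
	int r;

	for (i = 0; i < o->nseg; i++) {
		r = func(priv, i + 1 == o->nseg, o->seg[i],
		    (ssize_t)strlen(o->seg[i]));
		if (r != 0)
			return (r);	/* non-zero stops the iteration */
	}
	return (0);
}

/* The adapter: an objiterate_f that forwards each segment to the vmod. */
static int
vi_objiterate(void *priv, int flush, const void *ptr, ssize_t len)
{
	struct vmod_priv_iter *it = priv;
	struct vmod_priv chunk, *r;

	(void)flush;
	chunk.priv = (void *)ptr;
	chunk.len = (size_t)len;
	r = it->func(it, &chunk);
	assert(r == NULL);	/* assumption: req.body is read-only here */
	(void)r;
	return (0);		/* keep iterating */
}

/* Hypothetical vmod function: total body length. */
static struct vmod_priv *
len_iter(const struct vmod_priv_iter *it, const struct vmod_priv *chunk)
{
	*(size_t *)it->priv += chunk->len;
	return (NULL);		/* chunk unchanged */
}

static size_t
vmod_bodylen(struct vmod_priv_iter *it)
{
	size_t n = 0;

	if (it->state == VI_INIT) {
		it->priv = calloc(1, sizeof(size_t));
		assert(it->priv != NULL);
		it->func = len_iter;
	} else if (it->state == VI_FINI) {
		n = *(size_t *)it->priv;
		free(it->priv);
		it->priv = NULL;
	}
	return (n);
}

/* What compiled VCL could do for a bodylen(req.body)-style call. */
static size_t
req_body_len(const struct mock_obj *body)
{
	struct vmod_priv_iter it = { NULL, VI_INIT, NULL };

	(void)vmod_bodylen(&it);
	it.state = VI_ITER;
	(void)mock_ObjIterate(body, &it, vi_objiterate);
	it.state = VI_FINI;
	return (vmod_bodylen(&it));
}
```

For a body stored as the segments "hello, " and "world", req_body_len() returns 12.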
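For vcl_backend_response {}, the vfp option could be sketched as a filter that takes a chunk from upstream, hands it to the vmod iterator and passes the (possibly modified) bytes on right away. The pull interface here is a hypothetical simplification, not Varnish's real vfp API; struct vi_vfp, vi_vfp_pull(), vmod_upper() (an uppercasing stand-in for mp3.watermark) and fetch_streaming() are illustrative names only. Since the vmod function returns VOID, nothing downstream has to wait for the VI_FINI call, which is why this variant would not need to give up streaming.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the proposed types (assumption: data + length). */
struct vmod_priv { void *priv; size_t len; };
struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);
enum vmod_priv_iter_state_e { VI_INIT = 0, VI_ITER, VI_FINI };
struct vmod_priv_iter {
	void				*priv;
	enum vmod_priv_iter_state_e	state;
	vmod_priv_iter_f		*func;
};

/* Hypothetical vfp wrapping the vmod iterator. */
struct vi_vfp {
	const char		**seg;	/* stands in for the upstream vfps */
	size_t			nseg, next;
	struct vmod_priv_iter	*it;
};

/* Returns 0 at end of body, else 1 with the (maybe modified) chunk in buf. */
static int
vi_vfp_pull(struct vi_vfp *v, char *buf, size_t *len)
{
	struct vmod_priv in, *out;

	if (v->next == v->nseg)
		return (0);
	in.len = strlen(v->seg[v->next]);
	assert(in.len <= *len);
	memcpy(buf, v->seg[v->next++], in.len);
	in.priv = buf;
	out = v->it->func(v->it, &in);
	if (out != NULL) {		/* modified: pass the new bytes on */
		assert(out->len <= *len);
		memcpy(buf, out->priv, out->len);
		in.len = out->len;
	}
	*len = in.len;
	return (1);
}

/* Hypothetical VOID vmod function: uppercases each chunk. */
struct upper_state { char buf[64]; struct vmod_priv out; };

static struct vmod_priv *
upper_iter(const struct vmod_priv_iter *it, const struct vmod_priv *chunk)
{
	struct upper_state *st = it->priv;
	const unsigned char *p = chunk->priv;
	size_t i;

	assert(chunk->len <= sizeof st->buf);
	for (i = 0; i < chunk->len; i++)
		st->buf[i] = (char)toupper(p[i]);
	st->out.priv = st->buf;
	st->out.len = chunk->len;
	return (&st->out);
}

static void
vmod_upper(struct vmod_priv_iter *it)
{
	if (it->state == VI_INIT) {
		it->priv = calloc(1, sizeof(struct upper_state));
		assert(it->priv != NULL);
		it->func = upper_iter;
	} else if (it->state == VI_FINI) {
		free(it->priv);
		it->priv = NULL;
	}
}

/* Each chunk is "delivered" as soon as it leaves the filter. */
static void
fetch_streaming(const char **seg, size_t nseg, char *dst, size_t dstlen)
{
	struct vmod_priv_iter it = { NULL, VI_INIT, NULL };
	struct vi_vfp v = { seg, nseg, 0, &it };
	char buf[64];
	size_t len, used = 0;

	vmod_upper(&it);			/* state == VI_INIT */
	it.state = VI_ITER;
	for (len = sizeof buf; vi_vfp_pull(&v, buf, &len); len = sizeof buf) {
		assert(used + len < dstlen);
		memcpy(dst + used, buf, len);	/* send this chunk now */
		used += len;
	}
	dst[used] = '\0';
	it.state = VI_FINI;
	vmod_upper(&it);			/* VOID: no result to wait for */
}
```

With the backend segments "hello, " and "world", fetch_streaming() produces "HELLO, WORLD", each chunk having been passed on before the next one was fetched.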
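For vcl_deliver {}, the dummy OC_F_PRIVATE object could be assembled from the iterator results: chunks the vmod reports as unchanged (NULL) are only referenced, modified chunks are kept as copies. In this sketch, struct dummy_obj is a hypothetical stand-in for such an object, the digit-masking vmod_mask() is purely illustrative, and it is an assumption that the caller takes ownership of the buffer behind a returned vmod_priv.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the proposed types (assumption: data + length). */
struct vmod_priv { void *priv; size_t len; };
struct vmod_priv_iter;
typedef struct vmod_priv *vmod_priv_iter_f(const struct vmod_priv_iter *,
    const struct vmod_priv *);
enum vmod_priv_iter_state_e { VI_INIT = 0, VI_ITER, VI_FINI };
struct vmod_priv_iter {
	void				*priv;
	enum vmod_priv_iter_state_e	state;
	vmod_priv_iter_f		*func;
};

/* Hypothetical dummy private object: each segment either references the
 * original body or owns a modified copy. */
#define MAXSEG	8
struct dummy_obj {
	size_t	nseg;
	struct { const char *ptr; size_t len; int owned; } seg[MAXSEG];
};

/* Hypothetical vmod: masks digits; NULL if a chunk contains none. */
static struct vmod_priv *
mask_iter(const struct vmod_priv_iter *it, const struct vmod_priv *chunk)
{
	struct vmod_priv *out = it->priv;
	const char *p = chunk->priv;
	char *q;
	size_t i;

	for (i = 0; i < chunk->len && !isdigit((unsigned char)p[i]); i++)
		continue;
	if (i == chunk->len)
		return (NULL);		/* unchanged */
	q = malloc(chunk->len);
	assert(q != NULL);
	for (i = 0; i < chunk->len; i++)
		q[i] = isdigit((unsigned char)p[i]) ? '#' : p[i];
	out->priv = q;			/* assumption: caller takes ownership */
	out->len = chunk->len;
	return (out);
}

static void
vmod_mask(struct vmod_priv_iter *it)
{
	if (it->state == VI_INIT) {
		it->priv = calloc(1, sizeof(struct vmod_priv));
		assert(it->priv != NULL);
		it->func = mask_iter;
	} else if (it->state == VI_FINI) {
		free(it->priv);
		it->priv = NULL;
	}
}

/* Build the dummy object, filling in only the modified bits as copies. */
static void
deliver_copy(const char **body, size_t n, struct dummy_obj *o)
{
	struct vmod_priv_iter it = { NULL, VI_INIT, NULL };
	struct vmod_priv in, *r;
	size_t i;

	assert(n <= MAXSEG);
	memset(o, 0, sizeof *o);
	vmod_mask(&it);
	it.state = VI_ITER;
	for (i = 0; i < n; i++) {
		in.priv = (void *)body[i];
		in.len = strlen(body[i]);
		r = it.func(&it, &in);
		if (r == NULL) {	/* reference the original bits */
			o->seg[i].ptr = body[i];
			o->seg[i].len = in.len;
		} else {		/* keep the modified copy */
			o->seg[i].ptr = r->priv;
			o->seg[i].len = r->len;
			o->seg[i].owned = 1;
		}
	}
	o->nseg = n;
	it.state = VI_FINI;
	vmod_mask(&it);
}

static void
dummy_obj_free(struct dummy_obj *o)
{
	size_t i;

	for (i = 0; i < o->nseg; i++)
		if (o->seg[i].owned)
			free((void *)o->seg[i].ptr);
	o->nseg = 0;
}
```

For the chunks "card ", "1234" and " ok", only the middle segment is copied (as "####"); the other two still point at the original body.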

