<div dir="ltr"><div>Thinking out loud... <br><br>For #2, what about something like this?<br><br>req.body.data<br>req.body.length<br>req.body.is_binary (Content-Length != strlen)<br><br></div><div>or:<br><br></div><div>req.body.blob<br></div><div>req.body.string<br></div><div>req.body.is_blob<br></div><div><br></div>My reasoning for this is to be able to use existing functions / vmods - I expect the body to be urlencoded most of the time.<br><div>For binary (is_binary) or blob (is_blob) we'll need new functions that take the length, e.g. hash_ndata(req.body.data, req.body.length), or that use the blob directly, e.g. hash_blob(req.body.blob).<br><br></div><div>That said, this makes the caller responsible for using the right interface, so it might not be the right approach. <br>OTOH having a set of special functions to work with the body means we're defining (limiting?) what can be done until we have body-aware vmods.<br><br>One way to get around this, although fugly, could be changing signatures, restricting arguments in the vcc compiler and making these functions a bit smarter, e.g.:<br><br>hash_data(req.body);<br><br>In this case hash_data() will internally know what (length) to use. This might work in Varnish core but will require specific handling outside it.<br><br>Another alternative would be to not handle binary data at all. req.body will always be non-binary. If you want to handle binary data you will have to use a function to get it.<br>After all, we don't currently handle binary data (well, null bytes) and I'm not sure how useful it would be outside hashing.<br><br>My 2 cents.<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Feb 26, 2015 at 9:32 AM, Arianna Aondio <span dir="ltr"><<a href="mailto:arianna.aondio@varnish-software.com" target="_blank">arianna.aondio@varnish-software.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">VDD Hamburg talking point:<br>
<br>
Context:<br>
Starting from Varnish 4 we can buffer the request body (usually POST<br>
and PUT requests) before sending it to the backend.<br>
Now we have just one function accessible to users:<br>
std.cache_req_body(BYTES size) which initializes the buffering.<br>
Once the request body has been cached, it can be consumed as many<br>
times as needed, making it available to other user-accessible<br>
functions, such as:<br>
* request body length access function<br>
* regular expression match on request body<br>
* regular expression substitution on request body<br>
* request body as input in vcl_hash<br>
<br>
Problems:<br>
1. Bug #1664: std.cache_req_body(BYTES size) lacks error handling.<br>
If it is called with a request body bigger than size, Varnish crashes,<br>
and if the request is chunked the function caches the whole request<br>
body, ignoring the provided size limit.<br>
2. Regular expression match on body: what should the user interface<br>
look like? Do we want the function to return a boolean indicating<br>
whether the request body contains the string the user is looking for?<br>
In VCL this could look like:<br>
sub vcl_recv {<br>
set req.http.x-boolean1 = std.regex_req_body("varnish rocks");<br>
}<br>
<br>
Or do we want to be more aligned with the regex syntax and make the<br>
request body completely available to the user? In VCL this could look<br>
like:<br>
sub vcl_recv {<br>
if (std.reqbody_re_match() ~ "varnish rocks") {<br>
....<br>
}<br>
}<br>
<br>
3. Regular expression substitution on body: this function needs to be<br>
discussed. Do we really need to be able to substitute on the request<br>
body? Is it safe? How do we handle a possible increase in request<br>
body size?<br>
<br>
Proposed solutions:<br>
1. As decided a couple of weeks ago during a bugwash, we either buffer<br>
the whole request body or fail the request.<br>
I have a patch for this: if the request body is bigger than the given<br>
size, we close the connection and move forward to the next request.<br>
2. && 3. to be discussed.<br>
<br>
Request body length access function: once the request body has been<br>
cached, we can then iterate over it and return the number of bytes.<br>
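<br>
A minimal sketch of that iteration, assuming the cached body is kept as a<br>
linked list of chunks. The struct body_chunk type and the<br>
req_body_length() name are hypothetical and only for illustration; they<br>
are not the actual Varnish storage API:<br>

```c
#include <stddef.h>

/* Hypothetical chunk of a cached request body (not a Varnish type). */
struct body_chunk {
    const unsigned char *ptr;       /* start of this chunk's bytes */
    size_t len;                     /* number of bytes in this chunk */
    const struct body_chunk *next;  /* next chunk, or NULL at the end */
};

/* Walk every chunk once and return the total number of body bytes. */
static size_t
req_body_length(const struct body_chunk *head)
{
    size_t total = 0;

    for (; head != NULL; head = head->next)
        total += head->len;
    return (total);
}
```

The length comes from the stored chunk sizes, not from scanning for a<br>
terminator, so it stays correct for bodies that contain null bytes.<br>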
<br>
Request body as input in vcl_hash: once the request body has been<br>
cached, we can hash on it. This function should be available just in<br>
vcl_hash.<br>
Until now we have always just hashed on strings, but if we want to<br>
hash on bodies we need to be aware that they can be binary, so we need<br>
to handle this properly.<br>
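<br>
To illustrate why binary bodies need a length-aware interface, here is a<br>
small sketch. It uses FNV-1a purely as a stand-in digest, not the hash<br>
Varnish uses, and the function names are hypothetical (hash_ndata is<br>
borrowed from the suggestion made in this thread):<br>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length-aware hashing: FNV-1a over exactly len bytes (stand-in digest). */
static uint32_t
hash_ndata(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return (h);
}

/* String-style hashing: strlen() stops at the first null byte. */
static uint32_t
hash_string(const char *s)
{
    return (hash_ndata(s, strlen(s)));
}

/* A body is "binary" here if a null byte occurs within its declared length. */
static int
body_is_binary(const void *body, size_t content_length)
{
    return (memchr(body, '\0', content_length) != NULL);
}
```

With the 6-byte bodies "id=1\0A" and "id=1\0B", hash_string() sees only<br>
"id=1" in both and returns the same value, while hash_ndata() with the<br>
full length tells them apart.<br>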
<br>
I think functions for request body manipulation should be part<br>
of the std vmod.<br>
<br>
<br>
General considerations:<br>
Request bodies may contain binary data that headers should not contain.<br>
Functions have to be able to handle any kind of request body.<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Arianna Aondio<br>
Software Developer | Varnish Software AS<br>
</font></span></blockquote></div><br></div>