Handling of Urlencoded string in URL in Varnish

Mon Apr 7 16:04:21 CEST 2014

Hi

On 7 Apr 2014, at 10.25, Per Buer <perbu at varnish-software.com> wrote:

> Hi Jason.
> 
> Docwilco from Fastly has written a URL encoder/decoder VMOD that you can use. You could run it through it twice or patch it to uppercase/lowercase the encoding.
> 
> https://www.varnish-cache.org/vmod/url-code
> 
> Varnish itself doesn't try to interpret the URL much.
> 
> Per.

Thanks Per, that looks great!

Would you agree this would be better resolved in Varnish itself?
It looks as though the default VCL uses hash_data(req.url), but I question the intention. If the intention is to cache distinct URLs, then it needs to use something like hash_data(urldecode.from.core(req.url)) or hash_data(req.urldecoded).
Using hash_data(req.url) appears to say that it wants to cache distinct binary representations of a URL, which to me is not the intention.
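
To illustrate what I mean, here is a minimal Python sketch (not Varnish code). The helper names raw_key and decoded_key and the example URLs are my own assumptions, and SHA-256 is only a stand-in for whatever the cache uses as its key. It shows two URLs that differ only in the case of the percent-encoded hex digits producing different keys when the raw string is hashed, and the same key when the string is decoded first:

```python
import hashlib
from urllib.parse import unquote_to_bytes


def raw_key(url: str) -> str:
    """Hash the URL exactly as received (analogous to hashing req.url)."""
    return hashlib.sha256(url.encode("ascii")).hexdigest()


def decoded_key(url: str) -> str:
    """Hash the URL after percent-decoding it to raw bytes."""
    return hashlib.sha256(unquote_to_bytes(url)).hexdigest()


# Same resource, different hex-digit case in the percent-encoding.
lower = "/caf%c3%a9"
upper = "/caf%C3%A9"

print(raw_key(lower) == raw_key(upper))          # False
print(decoded_key(lower) == decoded_key(upper))  # True
```

Note that decoding everything also makes an encoded reserved character such as %2F hash the same as a literal "/", which may or may not be what is wanted.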

For reference (I don't know if this means much, though), RFC 3986 says in section 2.1:

> The uppercase hexadecimal digits 'A' through 'F' are equivalent to
>    the lowercase digits 'a' through 'f', respectively.  If two URIs
>    differ only in the case of hexadecimal digits used in percent-encoded
>    octets, they are equivalent.  For consistency, URI producers and
>    normalizers should use uppercase hexadecimal digits for all percent-
>    encodings.
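
The normalization the RFC describes does not need a full decode. As a rough sketch in Python (the function name normalize_pct_case is hypothetical, and this is not how Varnish or the VMOD does it), one could uppercase only the hex digits of well-formed percent-encoded octets and leave everything else alone:

```python
import re

# Matches one well-formed percent-encoded octet, e.g. "%c3" or "%C3".
_PCT_OCTET = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_pct_case(url: str) -> str:
    """Uppercase the hex digits of every percent-encoded octet in url.

    Anything that is not a valid %XX triplet is left untouched.
    """
    return _PCT_OCTET.sub(lambda m: m.group(0).upper(), url)


print(normalize_pct_case("/caf%c3%a9?q=%e2%82%ac"))
```

With that applied before hashing, "/caf%c3%a9" and "/caf%C3%A9" would end up as the same string, while the rest of the URL (including its case) is preserved.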

So I wonder whether the Varnish core should really decode before it hashes?
I guess this is very much an edge case, though, so it is not likely to be touched: it only affects characters outside the Latin set, and most sites use friendlier URLs or Latin characters, where the issue doesn't arise.
What are your thoughts? Do you think it is worth raising, or is it best to just leave it be and work around it elsewhere?

Regards,

Jason