Handling of Urlencoded string in URL in Varnish

Jason Woods devel at jasonwoods.me.uk
Mon Apr 7 11:01:36 CEST 2014


Hi all,

I apologise if this is the wrong place for a general question / request for advice about Varnish. If there's a better place, please do let me know!

I have a URL (that I've modified) that contains a URL-encoded sequence, e.g. /block/jason%E2%80%99s-first-blog
That sequence is a fancy apostrophe. It must have snuck into the title through a copy and paste or something, but it exposed a caching issue with the encoding.

We noticed that when we printed the URL links in HTML, the sequence was printed as above, %E2%80%99. But when some browsers request the URL, they appear to lowercase the encoding to %e2%80%99. So we ended up with two instances of this page cached. Presumably the browser decoded the URL and then re-encoded it for the request.
This meant that when we modified the blog post we sent a PURGE for %E2%80%99, but it only updated one of the cached instances, and many people were still seeing the old version. We ended up having to manually PURGE the other one.

Would I be right in saying that the following two URLs are identical, and should share one cache instance? I'm not sure if this is accounted for in an RFC/ISO standard or anything. URLs are generally case-sensitive IIRC, but is that also true for the encodings?
/block/jason%E2%80%99s-first-blog
/block/jason%e2%80%99s-first-blog

Is there a way for Varnish to decode the URL before hashing for the cache? Or is there a better approach?
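
To illustrate the kind of normalisation I have in mind, here is a minimal Python sketch (not Varnish code, just a demonstration of the idea). It uppercases the hex digits of each percent-encoded octet so that both forms give the same cache key. The helper name normalize_pct_case is made up for this example.

```python
import re
from urllib.parse import unquote

# Matches one percent-encoded octet, e.g. %e2 or %E2.
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")

def normalize_pct_case(url: str) -> str:
    """Uppercase the hex digits of every percent-encoded octet.

    Hypothetical helper for illustration; the rest of the URL is untouched.
    """
    return _PCT.sub(lambda m: m.group(0).upper(), url)

upper = "/block/jason%E2%80%99s-first-blog"
lower = "/block/jason%e2%80%99s-first-blog"

# Both spellings normalise to the same string ...
assert normalize_pct_case(lower) == normalize_pct_case(upper) == upper
# ... and both decode to the same path.
assert unquote(upper) == unquote(lower)
```

Normalising the case rather than fully decoding would leave reserved characters such as %2F alone, which seems the safer choice for a cache key, but I may be missing something.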
We are using varnish-3.0.3 revision 9e6a70f, which may be quite old. Maybe this is fixed in a more recent version?

Thanks!

Jason




More information about the varnish-misc mailing list