Parsing non-English URLs

Angie T. Muhammad angie.tawfik at gmail.com
Tue May 25 12:08:26 CEST 2010


Thank you, Sam, for your response. I have logged requests to cached Arabic
URLs, and here is the result for one request:
===========================================================================================
Cookie: SESScfc90a62c81b7bfc6f292320b1d0b8ca=t7t650vu5qu02916unbtil9o66;
SESS50745c6a3729e7f46278f7d281511580=qjc658f7cthp6dvj65rt6a8c64;
SESS8348e9a0e0f6133hash*%ntrol: max-age=0%c9c2n9td5uuvj0hp73;
SESSb323fb39997d18c5bde4c32f7bc0ffe1=0r5ve4k3i2ubmqu
▒±␊: 0  ┼: ┐␊␊⎻-▒┌␋┴␊     806  
===========================================================================================

I tried opening the log file with less, vim, and tail, but all I get is
either binary data (less) or output like the above (tail).
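The last log line above appears to be ordinary header text drawn with the terminal's VT100 line-drawing glyphs (mapped back to letters it reads roughly "age: 0 ... n: keep-alive"). That suggests the terminal switched character sets after treating some raw bytes in the log as control codes. One way to sidestep terminal and locale settings is to escape every byte outside printable ASCII before viewing it, similar in spirit to `cat -v`. A minimal sketch in C; `escape_bytes` is a hypothetical helper, not part of Varnish or libc:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper for illustration (not a Varnish or libc function).
 * Copies `len` bytes from `in` to `out`, replacing every byte outside
 * printable ASCII (0x20..0x7E), and the backslash itself, with a \xNN
 * escape so the terminal never receives raw control or multibyte bytes.
 * `out` must hold at least 4 * len + 1 bytes. Returns the output length. */
size_t escape_bytes(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c >= 0x20 && c <= 0x7E && c != '\\') {
            out[o++] = (char)c;              /* safe printable byte */
        } else {
            /* writes exactly 4 chars: backslash, 'x', two hex digits */
            o += (size_t)sprintf(out + o, "\\x%02X", c);
        }
    }
    out[o] = '\0';
    return o;
}
```

Calling it on each line read with fgets() from standard input gives a small filter the log can be piped through instead of being opened in less or vim. Escaping the backslash as well keeps the output unambiguous.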
I even tried limiting the Accept-Charset header sent by the browser to
UTF-8, but that failed too. Here is the configuration I used for this in
sub vcl_recv { }:
======================================
  # "set" replaces any existing header value, so a separate
  # "remove" before it is not needed
  if (req.http.Accept-Charset) {
      set req.http.Accept-Charset = "utf-8";
  }
======================================

I also tried including C header files as follows:
===================================
C{
#include <string.h>
#include <locale.h>
#include <wctype.h>
#include <wchar.h>
#include <curses.h>
}C
===================================
but it made no difference.

I am thinking of recompiling with ncurses wchar enabled. Any ideas?
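Sam's suggestion should expose the underlying issue: a browser does not put Arabic characters into the request line literally. It encodes the path as UTF-8 and percent-encodes every non-ASCII byte, and Varnish compares req.url against that raw request target without decoding it. That would explain why neither the == comparison nor the "^/تقارير" regex matches. The sketch below is plain C, not VCL; `percent_encode_path` is a hypothetical helper, and it leaves only RFC 3986 unreserved characters and '/' unescaped, which is slightly stricter than browsers are for ASCII but identical for non-ASCII bytes. It computes the string one would expect to see in the RxURL field:

```c
#include <stddef.h>

/* Hypothetical helper for illustration (not a Varnish or libc function).
 * Percent-encodes a NUL-terminated UTF-8 path: RFC 3986 unreserved
 * characters (A-Z a-z 0-9 - . _ ~) and '/' are copied as-is, every other
 * byte becomes %XX with uppercase hex digits.
 * `out` must hold at least 3 * strlen(path) + 1 bytes. Returns its length. */
size_t percent_encode_path(const char *path, char *out)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        unsigned char c = *p;
        int keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                   c == '_' || c == '~' || c == '/';
        if (keep) {
            out[o++] = (char)c;
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];     /* high nibble */
            out[o++] = hex[c & 0x0F];   /* low nibble */
        }
    }
    out[o] = '\0';
    return o;
}
```

For /تقارير this yields /%D8%AA%D9%82%D8%A7%D8%B1%D9%8A%D8%B1. The first pair, %D8%AA, is the letter ت, the rightmost letter on screen: right-to-left direction affects only how text is displayed, while the bytes stay in logical (reading) order, so reversing the letters cannot help. How to spell this encoded form inside a Varnish 2.1 VCL string is worth checking against the VCL documentation for that version, since % may be treated specially in string literals there.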


2010/5/24 Sam Crawford <samcrawford at gmail.com>

> It's not one that I'm familiar with, but if it were me, I'd try
> running varnishlog whilst putting a request for one of these URLs
> through. See how varnish prints it out in the RxURL field. This might
> give you some clues as to how to specify it in the rules.
>
> Thanks,
>
> Sam
>
>
> 2010/5/23 Angie T. Muhammad <angie.tawfik at gmail.com>:
> > Hello Varnish team
> >
> > I have Varnish 2.1.2 on our production and test servers. We are running a
> > bilingual news website.
> > On my test server I am trying to match non-English URLs as follows:
> >
> > .......................
> >   else if (req.url == "/تقارير") {
> >       set beresp.http.X-Cacheable = "Yes";
> >       set beresp.ttl = 60m;
> >       return(deliver);
> >   }
> >  .......................
> >
> > The word in the URL is Arabic, a right-to-left language. The link cannot
> > be written in English and has no English equivalent; in case you are
> > wondering, the word means "reports". My only problem now is that Varnish
> > applies all the other if statements for English URLs, but not this one
> > with Arabic. Even if I use a regex instead of the == sign, e.g.
> > req.url ~ "^/تقارير", Varnish starts without errors but does not apply
> > the rule.
> >
> > I tried the following:
> > 1- Reversing the letters of the Arabic word, so تقارير would be ريراقت,
> >    but it did not work.
> > 2- Copying the link directly into /etc/varnish/default.vcl, which
> >    produces something like: %D9%88%D8%B3%D9%88%D9%85%D8%A7%D8%AA
> >    With the URL in this percent-encoded form, Varnish does not start.
> >
> > Any ideas? Your help is really appreciated.
> >
> >
> > --
> > All the best,
> > Angie
> >
> > _______________________________________________
> > varnish-misc mailing list
> > varnish-misc at varnish-cache.org
> > http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
> >
>



-- 
All the best,
Angie

