Regular expressions and trailing slashes

Marcus Smith marcussmith at britarch.ac.uk
Mon Oct 6 22:11:00 CEST 2008


I have Varnish (the Beta 2 release) sitting in front of two web servers.
Some parts of the site live on one of the servers, the rest on the
other, so my VCL code looks like this:


backend www1 {
    .host = "192.168.100.1";
    .port = "8080";
}

backend www2 {
    .host = "192.168.100.2";
    .port = "8080";
}

sub vcl_recv {
    if (req.http.host ~ "^(www.)?example.(net|org|com)$") {
        /* Normalise domain names. */
        set req.http.host = "www.example.com";


        if (req.url ~ "^/foo/.*$" ||
            req.url ~ "^/bar/.*$" ||
            req.url ~ "^/baz/.*$" ) {
            set req.backend = www1;
        }
        else {
            set req.backend = www2;
        }
    }
    else {
        error 404 "Unknown virtual host!";
    }
}



Unfortunately, I can only make this work correctly if the URL used to
request "/foo", "/bar" etc. contains a second slash after the directory
name.  The regexes above match "/foo/", "/foo/index.html", "/foo/bar/"
etc., but *not* "/foo" (no trailing slash).  I want them to match "/foo"
as well, but *not* things like "/foobar/" (first three letters in
common).  I don't seem to be able to get Varnish to do this.

I thought that simply changing the pattern to "^/foo(/.*)?$" would do
what I wanted, but it does not match.  That pattern *does* match as
intended in Apache, e.g. in mod_rewrite directives.  Surprisingly,
dropping the slash from the pattern altogether, as in "^/foo.*$", does
not work either.  Instead, in both cases, Varnish redirects to
"http://www.example.com:8080/foo/" on the www2 backend, which itself
then returns a 404.  So not only does it fail, it fails in an
unexpected manner.
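
For concreteness, here is a sketch of the routing block with the first
of those patterns substituted in.  The backend declarations are assumed
to be unchanged from the listing above, and extending the same change to
"/bar" and "/baz" is an assumption made for illustration; the behaviour
I describe was observed with "/foo".

sub vcl_recv {
    /* Sketch only: the optional group is meant to accept "/foo" itself
       as well as anything below "/foo/", but not "/foobar/". */
    if (req.url ~ "^/foo(/.*)?$" ||
        req.url ~ "^/bar(/.*)?$" ||
        req.url ~ "^/baz(/.*)?$") {
        set req.backend = www1;
    }
    else {
        set req.backend = www2;
    }
}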

I have also tried "^/foo(/(.*))?$" and "^/foo(/(.)*)?$", more out of
curiosity than any expectation that extra parentheses might
significantly alter the behaviour (they don't).

It seems pretty clear that I must be missing something here... do
regexes in VCL take "." to mean something other than "match any one
character"?  The way it is used in the example code for normalising
domains (both above and in the FAQ) suggests to me that they might.
What is the correct way to match "/foo" and any sub-files and
sub-folders of "/foo" ("/foo/", "/foo/bar", "/foo/index.html" etc) but
*not* "/foobar"?
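
To make the intent explicit, here is a hypothetical alternative spelling
of the same rule.  Under ordinary regex semantics I would expect it to
accept "/foo", "/foo/" and "/foo/bar" while rejecting "/foobar", because
the directory name must be followed by either a slash or the end of the
URL.  I have not tried it in Varnish, so whether VCL treats it that way
is an assumption, and the www1/www2 backends are the ones declared above.

sub vcl_recv {
    /* Hypothetical and untested in Varnish: after the directory name,
       require either a slash or the end of the URL. */
    if (req.url ~ "^/(foo|bar|baz)(/|$)") {
        set req.backend = www1;
    }
    else {
        set req.backend = www2;
    }
}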

Many thanks in advance,
Marcus

-- 
Marcus Smith
Information Officer
The Council for British Archaeology





More information about the varnish-misc mailing list