Ticket #541 (new enhancement)

Opened 12 months ago

Suggested VCL for cross domain XMLHttpRequest using Varnish

Reported by: ned14 Owned by:
Priority: low Milestone: Varnish 2.1 release
Component: documentation Version: trunk
Severity: minor Keywords:
Cc:

Description

Following on from http://varnish.projects.linpro.no/ticket/536, here is a suggestion for how to have Varnish cache the proxying of another website so that AJAX code can perform cross domain XMLHttpRequests without running into the browser's same-origin restrictions. In other words, this is how to make a third party website appear to be part of your own website using URL rewriting.

Normally one configures Apache, or whatever your front end web server is, to do the URL rewriting and proxying. However, having Varnish do it instead has one big benefit: Varnish can cache the results, so the load on the third party server is greatly reduced.

Firstly, add a backend:

backend repec {
        .host = "ideas.repec.org";
        .port = "80";
}

This is ideas.repec.org, an index of Economics publications. One can pull the list of all Economics academic publications for a given author by requesting a magic URL like http://ideas.repec.org/cgi-bin/authorref.cgi?handle=pdo206&output=0.

In sub vcl_recv you want something like this at the start:

sub vcl_recv {
        /*set req.grace = 20s;*/ /* Only enable if you don't mind slightly stale content */

        /* Rewrite all requests to /repec/cgi-bin/authorref.cgi to http://ideas.repec.org/cgi-bin/authorref.cgi */
        if (req.url ~ "^/repec/cgi-bin/authorref.cgi") {
                set req.http.host = "ideas.repec.org";
                set req.url = regsub(req.url, "^/repec", "");
                set req.backend = repec;
                remove req.http.Cookie;
                lookup;
        } else {
                set req.backend = default;
                /* ... do normal processing ... */
        }
}
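
To make the mapping concrete, here is a rough Python sketch of what the vcl_recv rule above does to an incoming request. It is only an illustration of the rewrite, not anything Varnish itself runs: the function name rewrite_request and the www.example.com host are made up for the example.

```python
import re

# Hypothetical helper mirroring the vcl_recv rule; not part of Varnish.
PROXY_PATTERN = re.compile(r"^/repec/cgi-bin/authorref.cgi")


def rewrite_request(host, url):
    """Return the (backend, host, url) triple the VCL rule would produce."""
    if PROXY_PATTERN.search(url):
        # Same effect as regsub(req.url, "^/repec", ""): strip the prefix once.
        return ("repec", "ideas.repec.org", re.sub(r"^/repec", "", url, count=1))
    # Anything else is left untouched and goes to the default backend.
    return ("default", host, url)


if __name__ == "__main__":
    print(rewrite_request("www.example.com",
                          "/repec/cgi-bin/authorref.cgi?handle=pdo206&output=0"))
```

So a browser asking your own site for /repec/cgi-bin/authorref.cgi?handle=pdo206&output=0 ends up fetching /cgi-bin/authorref.cgi?handle=pdo206&output=0 from ideas.repec.org, while every other URL is handled as before.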

And finally in sub vcl_fetch:

sub vcl_fetch {
        /*set obj.grace = 20s;*/ /* Only enable if you don't mind slightly stale content */
        if (req.http.host == "ideas.repec.org") {
                set obj.http.Content-Type = "text/html; charset=utf-8"; /* Correct the wrong response */
                set obj.ttl = 86400s;
                set obj.http.Cache-Control = "max-age=3600";
                deliver;
        }
}

This first corrects the wrong MIME type returned by the RePEc server, which claims text/plain with an iso-8859-1 charset. It then keeps the response in the Varnish cache for one day, so the RePEc server is asked at most once per day per author. Finally, it tells the web browser and any intermediate caches not to bother Varnish again for one hour after a fetch.

Ideally I'd have liked to set an Expires: header, but I am not entirely sure how to compute one in VCL. I suppose one could overwrite the max-age in vcl_hit with 86400 minus the Age header that Varnish returns when it serves from cache. Anyway, a one hour browser cache expiry is good enough for most cases when someone is casually browsing a website.
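
For what it's worth, the arithmetic itself is simple. Below is a minimal Python sketch of it, written outside VCL precisely because I don't know how to express it there. The function names are hypothetical, and it assumes the 86400 second TTL used in vcl_fetch and an Age value given in seconds.

```python
from email.utils import formatdate

CACHE_TTL = 86400  # seconds; matches obj.ttl in vcl_fetch


def remaining_max_age(age, ttl=CACHE_TTL):
    """Seconds the cached object remains fresh, never below zero."""
    return max(ttl - age, 0)


def expires_header(now, age, ttl=CACHE_TTL):
    """HTTP-date at which the cached object expires, given Unix time `now`."""
    return formatdate(now + remaining_max_age(age, ttl), usegmt=True)


if __name__ == "__main__":
    # An object served six hours after Varnish fetched it.
    print(remaining_max_age(6 * 3600))
    print(expires_header(0, 6 * 3600))
```

An object that has been in the cache for six hours would get a max-age of 64800 seconds, and the matching Expires value is simply the current time plus that remainder.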

I hope that someone finds this useful - I certainly have.

Cheers,
Niall
