Varnish

Wed Sep 5 10:48:42 CEST 2007

Poul-Henning Kamp wrote:
>> If we want nice, pretty error messages, are we
>> basically on our own, or is there an imminent plan for this?
> 
> I believe it's on our list somewhere.

Would it be possible to have an "error server" and then hack up some VCL
to "pass through" various errors to that server?  I.e., if the cache is
about to throw back a 502 Bad Gateway, have some VCL that sets the
backend to pass from:

http://errors.example.com/err502?h=www.example.org&ip=10.11.12.13

(Where errors.example.com is the "error server," www.example.org is the
site the person was trying to reach, and 10.11.12.13 is the client IP?)
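
To make the URL shape concrete, here is a rough C sketch of how such a
request could be assembled.  build_error_url() is a made-up helper name,
not anything Varnish provides, and it assumes the host and IP contain no
characters that need URL-escaping:

```c
#include <stdio.h>

/* Hypothetical helper (not a Varnish API): formats the error-server URL
 * from the status code, the requested host, and the client IP.
 * Assumes host and client_ip need no URL-escaping.
 * Returns 0 on success, -1 if the result did not fit in buf. */
int build_error_url(char *buf, size_t len, int status,
                    const char *host, const char *client_ip)
{
    int n = snprintf(buf, len,
                     "http://errors.example.com/err%d?h=%s&ip=%s",
                     status, host, client_ip);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A real version would have to escape the Host header before putting it in
the query string, since that value comes straight from the client.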

> First of all, VCL is very efficient, so even very large maps in VCL
> code will do well.
> 
> VCL also has an "include" facility, so you could machine-generate
> that part of your VCL program from your database.

The list of names is not static; dozens of names are added or removed 
throughout a given day, and the overhead of generating/loading a massive 
static configuration file on every change would probably be prohibitive.

> Anyhow, what exactly is "extremely large" in this context ?

Between 10^4 and 10^5 active hostname->backend map entries.

Calling out to an external routine for the map lookup sounds inefficient,
but with the necessary hashing and the ability to do nonintrusive
dynamic updates, it's an overall win.

Since Varnish is already generating and loading dynamic objects, would it
be possible to add support for dynamically extending VCL at runtime?

For example, we could do something like:

  req.backend.host = my_map_lookup(http.header.host)

And then provide a C function that implements my_map_lookup() to do what
we need?  (Including providing an "error host" for invalid host values.)
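
To show the kind of thing I mean, here is a minimal sketch of such a
lookup as a chained hash table in plain C.  Everything except the name
my_map_lookup() is an assumption of mine: map_put(), map_remove(), the
bucket count, and the ERROR_HOST fallback are illustrative, and nothing
here uses a real Varnish interface.  It also does no locking, which a
real version running inside a threaded cache would need:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define NBUCKETS   65536u                 /* power of two, for 10^4..10^5 entries */
#define ERROR_HOST "errors.example.com"   /* assumed fallback for unknown hosts */

struct entry {
    char *host;
    char *backend;
    struct entry *next;
};

static struct entry *buckets[NBUCKETS];

/* FNV-1a over the lowercased hostname, so lookups ignore case. */
static uint32_t hash_host(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint32_t)tolower((unsigned char)*s);
        h *= 16777619u;
    }
    return h & (NBUCKETS - 1);
}

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

/* Add or replace one mapping; returns 0 on success, -1 if out of memory. */
int map_put(const char *host, const char *backend)
{
    uint32_t b = hash_host(host);
    struct entry *e;
    char *copy = dup_str(backend);

    if (!copy)
        return -1;
    for (e = buckets[b]; e; e = e->next) {
        if (strcasecmp(e->host, host) == 0) {
            free(e->backend);
            e->backend = copy;
            return 0;
        }
    }
    e = malloc(sizeof *e);
    if (!e) {
        free(copy);
        return -1;
    }
    e->host = dup_str(host);
    if (!e->host) {
        free(copy);
        free(e);
        return -1;
    }
    e->backend = copy;
    e->next = buckets[b];
    buckets[b] = e;
    return 0;
}

/* Remove one mapping; returns 1 if it existed, 0 otherwise. */
int map_remove(const char *host)
{
    struct entry **pp = &buckets[hash_host(host)];

    for (; *pp; pp = &(*pp)->next) {
        if (strcasecmp((*pp)->host, host) == 0) {
            struct entry *dead = *pp;
            *pp = dead->next;
            free(dead->host);
            free(dead->backend);
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Return the backend for host, or ERROR_HOST if there is no mapping. */
const char *my_map_lookup(const char *host)
{
    const struct entry *e;

    if (host == NULL)
        return ERROR_HOST;
    for (e = buckets[hash_host(host)]; e; e = e->next)
        if (strcasecmp(e->host, host) == 0)
            return e->backend;
    return ERROR_HOST;
}
```

Adding or dropping a name touches a single bucket, which is the
"nonintrusive dynamic update" property we are after; no configuration
file is regenerated or reloaded.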

That would be pretty cool, and the ability to write arbitrary extension 
functions in C would most likely lend itself to all sorts of other 
creative uses.

Unfortunately I highly doubt I'm qualified to do that.  When it comes to 
extending scripting languages, I either reach for Swig or I give up. :)

> Did you miss the NCSA format writer ?

Apparently I missed it completely.  The "combined" format does toss some 
highly useful info when used with caches, particularly whether an object 
was served from cache or origin, but I presume the varnishncsa util 
could fairly readily serve as the template for us to whip up a similar 
util that outputs in the format we need.  Thanks for the pointer.

(Squid gets a lot of things wrong, but fair's fair, they get a lot of 
things right and their log format is one of the latter.)

>> 5) We need to If-Modified-Since: revalidate back to the origin server on 
>> every request, [...]
> 
> Our design assumption was that you would want to keep your backend
> as much out of the loop as possible and use the varnish logfiles
> for your traffic analysis.

I have no doubt that works very well when frontending one site. 
However, it does not scale well in our environment.

Even so, parsing one aggregate stream of Varnish log data into N 
individual streams (one for each backend) would probably be doable.

Parsing M streams from a load-balanced cluster of Varnish servers into N 
time-collated streams in something approaching real time, however, 
sounds like a hard problem. :-)
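
The collation step by itself is the easy part.  A rough sketch in C,
assuming each of the M streams is already buffered and sorted by
timestamp (struct log_rec, struct stream, and merge_next() are names I
made up for illustration):

```c
#include <stddef.h>

struct log_rec {
    long ts;            /* timestamp, e.g. seconds since the epoch */
    const char *line;   /* the raw log line */
};

struct stream {
    const struct log_rec *recs;  /* records, sorted by ts */
    size_t len;                  /* number of records */
    size_t pos;                  /* index of the next unread record */
};

/* Return the oldest unread record across all m streams and advance that
 * stream, or NULL once every stream is drained.  On equal timestamps the
 * lower-numbered stream wins.  A linear scan is fine for small m. */
const struct log_rec *merge_next(struct stream *s, size_t m)
{
    struct stream *best = NULL;

    for (size_t i = 0; i < m; i++) {
        if (s[i].pos >= s[i].len)
            continue;
        if (!best || s[i].recs[s[i].pos].ts < best->recs[best->pos].ts)
            best = &s[i];
    }
    return best ? &best->recs[best->pos++] : NULL;
}
```

What this leaves out is exactly what makes it hard in near real time:
you cannot emit a record until you know that no slower server is about
to deliver an older one.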

But if you can think of a viable way to do that, it would no doubt be 
much faster than (and therefore superior to) the If-Modified-Since: 
approach.

Thanks for the response!  Sorry if my questions seem naive; I don't 
really understand VCL yet, so I'm making the classic assumption that 
everything I don't understand is easy. :-)

Jeff