My random thoughts

Fri Feb 10 19:42:24 CET 2006

In message <ujrhd772ii0.fsf at cat.linpro.no>, Dag-Erling Smørgrav writes:
>Poul-Henning Kamp <phk at phk.freebsd.dk> writes:

>> In both cases, it would be ideal if all that is necessary to tell
>> Varnish are two pieces of information:
>>
>> 	Storage location
>> 		Alternatively we can offer an "auto" setting that makes
>> 		Varnish discover what is available and use what it finds.
>
>I want Varnish to support multiple storage backends:
>
> - quick and dirty squid-like hashed directories, to begin with

That's actually slow and dirty.  So I'd prefer to wait with this
one until we know we need it (i.e. persistence).

> - fancy block storage straight to disk (or to a large preallocated
>   file) like you suggested

This is actually the simpler one to implement: make one file,
mmap it, sendfile from it.
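
To make the "one file, mmap it, sendfile from it" idea concrete, here
is a minimal sketch in Python.  The class name FileStore, the
in-memory index and the bump allocator are assumptions made for
illustration only; nothing here is a design decision for Varnish.

```python
import mmap


class FileStore:
    """Hypothetical single-file store: one file, mapped once, served with sendfile."""

    def __init__(self, path, size):
        self.file = open(path, "w+b")
        self.file.truncate(size)        # set the file to the full store size up front
        self.map = mmap.mmap(self.file.fileno(), size)  # shared read/write mapping
        self.index = {}                 # key -> (offset, length), kept in memory only
        self.next_free = 0              # naive bump allocator, never reclaims space

    def put(self, key, body):
        offset = self.next_free
        if offset + len(body) > len(self.map):
            raise MemoryError("store is full")
        self.map[offset:offset + len(body)] = body  # write through the mapping
        self.index[key] = (offset, len(body))
        self.next_free = offset + len(body)

    def send(self, key, sock):
        """Send a stored object to a connected stream socket; return bytes sent."""
        offset, length = self.index[key]
        if length == 0:
            return 0                    # socket.sendfile() rejects a zero count
        # socket.sendfile() uses os.sendfile() where the platform supports it.
        return sock.sendfile(self.file, offset=offset, count=length)
```

Metadata lives only in the dictionary, so this is the transient model:
after a restart the file contents are still there but nothing can be
found in them.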

I don't see any advantage to memcached right off the bat, but I
may become wiser later on.

Memcached is intended for when your app needs a shared memory
interface, which is then simulated using network.

Our app is network oriented and we know a lot more about our
data than memcached would, so we can do the networking more
efficiently ourselves.

>> By far the easiest thing to do is to disregard the cache, that saves
>> a lot of code for locating and validating the contents, but this
>> carries a penalty in backend or cluster fetches whenever a node
>> comes up.  Let's call this the "transient cache model".
>
>Another issue is that a persistent cache must store both data and
>metadata on disk, rather than just store data on disk and metadata in
>memory.  This complicates not only the logic but also the storage
>format.

Yes, although we can get pretty far with mmap on this too.

>> It is a very good question how big a fraction of the persistent
>> cache would be usable after typical downtimes:
>>
>> 	After a Varnish process restart:  Nearly all.
>>
>> 	After a power-failure ?  Probably at least half, but probably
>> 	not the half that contains the most busy pages.
>
>When using direct-to-disk storage, we can (fairly) easily design the
>storage format in such a way that updates are atomic, and make liberal
>use of fsync() or similar to ensure (to the extent possible) that the
>cache is in a consistent state after a power failure.

I meant "usable" as in "will be asked for", i.e. usable for improving
the hit rate.

>How about this: we start with the transient model, and add persistence
>later.

My idea exactly :-)

Since I expect the storage to be pluggable, this should be pretty
straightforward.
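
As a rough illustration of what "pluggable" could look like, the
sketch below defines a small storage interface and a transient
implementation behind it.  All names (Storage, TransientStorage,
BACKENDS, make_storage) are hypothetical; a persistent backend would
be registered under another key without touching the callers.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical interface every storage backend would implement."""

    @abstractmethod
    def put(self, key, body):
        """Store body (bytes) under key."""

    @abstractmethod
    def get(self, key):
        """Return the stored bytes, or None on a miss."""


class TransientStorage(Storage):
    """Keeps everything in memory; contents are lost on restart."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        return self._objects.get(key)


# Registry of available backends; persistence would be added here later.
BACKENDS = {"transient": TransientStorage}


def make_storage(kind, **options):
    """Instantiate the backend named by kind, e.g. from a config setting."""
    return BACKENDS[kind](**options)
```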

>> If all machines in the cluster have sufficient cache capacity, the
>> other remaining argument is backend offloading, that would likely
>> be better mitigated by implementing a 1:10 style two-layer cluster
>> with the second level node possibly having twice the storage of
>> the front row nodes.
>
>Multiple cache layers may give rise to undesirable and possibly
>unpredictable interaction (compare this to tunneling TCP/IP over TCP,
>with both TCP layers battling each other's congestion control)

I doubt it.  The front end Varnish fetches from the backend
into its store and from there another thread will serve the
users, so the two TCP connections are not interacting directly.

>Or we can just ignore queries for documents which we don't have; the
>requesting node will simply have to request the document from the
>backend if no reply arrives within a short timeout (~1s).

I want to avoid any kind of timeouts like that.  One slight bulge
in your load and everybody times out and hits the backend.

>Unfortunately, PGP is very slow, so it should only be used to
>communicate with some kind of configuration server, not with the cache
>itself.

Absolutely.  My plan was to have the "management process" do that.

>unlike regexps, globs can be evaluated very efficiently.

But more efficiently still if compiled into C code.
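
To illustrate what "compiled into C code" might mean, here is a toy
generator that turns a glob with a single '*' into the source of a C
matcher function.  It is only a sketch under stated assumptions: the
function names are made up, patterns are assumed to be plain ASCII,
'*' is assumed to match any run of characters including '/', and
everything beyond a single '*' is rejected.

```python
def c_string(text):
    """Quote text as a C string literal (plain ASCII patterns assumed)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def glob_to_c(name, pattern):
    """Emit C source for a matcher of a glob containing exactly one '*'."""
    if pattern.count("*") != 1 or "?" in pattern or "[" in pattern:
        raise ValueError("this sketch only handles a single '*'")
    prefix, suffix = pattern.split("*")
    p, s = len(prefix), len(suffix)
    # The generated function needs no loop and no pattern parsing at run
    # time: one length check and two fixed-size memcmp() calls.
    return (
        "#include <string.h>\n"
        f"static int {name}(const char *url, size_t len)\n"
        "{\n"
        f"\treturn len >= {p + s}\n"
        f"\t    && memcmp(url, {c_string(prefix)}, {p}) == 0\n"
        f"\t    && memcmp(url + len - {s}, {c_string(suffix)}, {s}) == 0;\n"
        "}\n"
    )
```

For example, glob_to_c("match_png", "/images/*.png") emits a function
that compares the first 8 and the last 4 bytes of the URL and nothing
else.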

>> It makes a lot of sense to not actually implement this in the main
>> Varnish process, but rather supply a template perl or python script
>> that primes the cache by requesting the objects through Varnish.

>This can easily be done with existing software like w3mir.
>[...]
>You can probably do this in ~50 lines of Perl using Net::HTTP.

Sounds like you just won this bite :-)
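
For reference, the priming script really can be tiny.  The sketch
below uses Python's http.client rather than Perl; the function name
prime and its parameters are hypothetical, and it assumes the cache
listens on host:port and that the list of paths comes from somewhere
else (a sitemap, a log file, and so on).

```python
import http.client


def prime(host, port, paths, site=None):
    """GET each path through the cache at host:port; return {path: status}."""
    results = {}
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        for path in paths:
            # Optionally override the Host header so the cache sees the
            # public site name instead of its own address.
            headers = {"Host": site} if site else {}
            conn.request("GET", path, headers=headers)
            response = conn.getresponse()
            response.read()             # drain the body so the connection can be reused
            results[path] = response.status
    finally:
        conn.close()
    return results
```

The bodies are thrown away on purpose: the only goal is to make the
cache fetch and store each object.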

>Distributed lock managers are *hard*...

Nobody is talking about distributed lock managers.  The shared
memory is strictly local to the machine and read-only for everybody
other than the main Varnish process.
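
A small sketch of that arrangement, assuming the shared memory is a
file mapped by several parties: one writer maps it read/write, every
other party maps it read-only.  The helper names, the 4096-byte size
and the counter layout are made up for illustration.

```python
import mmap
import struct

SIZE = 4096  # hypothetical size of the shared region


def open_writer(path):
    """Create the region and map it read/write (the main process)."""
    f = open(path, "w+b")
    f.truncate(SIZE)
    return f, mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_WRITE)


def open_reader(path):
    """Map the existing region read-only (everybody else)."""
    f = open(path, "rb")
    return f, mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ)


def set_counter(region, value):
    """Writer side: store a 64-bit counter at offset 0."""
    struct.pack_into("<Q", region, 0, value)


def get_counter(region):
    """Reader side: read the 64-bit counter at offset 0."""
    return struct.unpack_from("<Q", region, 0)[0]
```

With a single writer, readers never need a lock manager of any kind;
at worst they see a value that is slightly out of date.  A reader that
tries to assign into its mapping gets a TypeError from Python.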

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.