Streaming and backend conditional requests

Thu Jan 12 12:32:02 CET 2012

Hello all,

Yesterday there was a discussion of this subject on the
#varnish-hacking IRC channel, and we decided to send a summary to
varnish-dev and ask for feedback. @phk, I hope we can get your
opinions.

For background on backend conditional requests (usually
If-Modified-Since), which are currently implemented in the
experimental-ims branch of the source repository, see:

https://www.varnish-cache.org/trac/wiki/BackendConditionalRequests

This implementation never chooses a busy object during hash lookup as
the stale object (the candidate object from cache to be validated with
the conditional request). The reason for this was mainly to avoid
changing too much about the way Varnish works. If a matching busy object
is found in HSH_Lookup(), it returns just as it does in current master;
a stale object is taken into consideration only if that doesn't happen.
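A minimal C sketch of the lookup rule just described may help. It is a simplified model, not HSH_Lookup(): the struct sk_obj fields (busy, ttl_end, keep_end), the sk_lookup result and the choice of the newest expired object as the candidate are all hypothetical, and Vary, grace and waiting lists are left out. It only shows the ordering: a fresh hit is delivered, a busy match is handled as in master, and a stale_obj is reported only if no busy object matched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached object; not Varnish's struct object. */
struct sk_obj {
	bool	busy;		/* fetch in progress (3.0: also while streaming) */
	double	ttl_end;	/* absolute time at which the TTL runs out */
	double	keep_end;	/* hypothetical: kept as IMS candidate until then */
};

struct sk_lookup {
	struct sk_obj	*hit;		/* fresh object: deliver it */
	struct sk_obj	*busy;		/* busy object: wait on it, as in master */
	struct sk_obj	*stale_obj;	/* expired candidate for validation */
};

/*
 * Simplified model of the rule described above, not HSH_Lookup() itself:
 * a fresh hit wins, a busy match is handled as in master, and a stale
 * candidate is reported only when no busy object matched.
 */
static struct sk_lookup
sk_hash_lookup(struct sk_obj *const *objs, size_t n, double now)
{
	struct sk_lookup r = { NULL, NULL, NULL };
	struct sk_obj *stale = NULL;

	assert(objs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct sk_obj *o = objs[i];

		if (o->busy) {
			r.busy = o;	/* never becomes the stale_obj */
			continue;
		}
		if (o->ttl_end > now) {
			r.busy = NULL;	/* a fresh hit is delivered directly */
			r.hit = o;
			return (r);
		}
		if (o->keep_end > now &&
		    (stale == NULL || o->ttl_end > stale->ttl_end))
			stale = o;	/* newest expired-but-kept object */
	}
	if (r.busy == NULL)
		r.stale_obj = stale;
	return (r);
}
```

With an expired object and a busy object on the same hash, this returns the busy one and no stale_obj, which is why a 3.0 streaming object (always busy) is never chosen for validation.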

In 3.0 streaming, a streaming object always has the busy flag set, so a
streaming object can never become the stale object in experimental-ims.
As I understand Martin, this will change with new streaming: IIUC a
streaming object will not have the busy flag set, but will have a
non-NULL busyobj.
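To illustrate why something has to change, here is a hypothetical candidate test before and after the change described above. The names sk_obj, sk_busyobj and bytes_received are invented for the sketch; only the idea of a busy flag and a non-NULL busyobj comes from the discussion. Under new streaming, the busy-only test no longer excludes a streaming object, so experimental-ims would either need an extra busyobj check or would start treating streaming objects as stale_obj candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fetch state, present while the body is still arriving. */
struct sk_busyobj {
	size_t	bytes_received;
};

/* Hypothetical object reduced to the two fields that matter here. */
struct sk_obj {
	bool			busy;
	struct sk_busyobj	*busyobj;
};

/* The check experimental-ims effectively relies on today. In 3.0 a
 * streaming object is busy, so this also excludes streaming objects. */
static bool
sk_candidate_busy_only(const struct sk_obj *o)
{
	assert(o != NULL);
	return (!o->busy);
}

/* If new streaming clears busy but leaves busyobj non-NULL, keeping the
 * current behaviour would need an explicit busyobj test as well. */
static bool
sk_candidate_not_streaming(const struct sk_obj *o)
{
	assert(o != NULL);
	return (!o->busy && o->busyobj == NULL);
}
```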

So something will have to change for IMS to work with new streaming. The
question, however, is whether passing over all currently streaming
objects as potential stale_objs for IMS is a good idea.

Suppose a very large object is streaming, and the transfer takes so long
that its TTL elapses before streaming is finished. One might say that
this is a poorly configured TTL, but it can nevertheless happen. When a
new client request for the object now arrives, the object could
conceivably be validated against the backend as "Not Modified", so that
Varnish can stream it to the second client as well, without the backend
having to send it again. Reducing the bandwidth from backends for large
objects is, after all, a main goal of having conditional requests.
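For readers less familiar with the validation step, a hypothetical sketch of its two halves: building If-Modified-Since from the stale object's Last-Modified value, and deciding from the backend status whether the body comes from the stale_obj. sk_ims_header(), sk_body_source() and the enum are invented names, and treating every non-304 status as "new body" is a simplification of real fetch handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum sk_body_source { SK_BODY_FROM_STALE, SK_BODY_FROM_BACKEND };

/*
 * Hypothetical helper: format the If-Modified-Since header for the backend
 * request from the stale object's Last-Modified value. Returns the length
 * written, or -1 if there is no validator or the buffer is too small.
 */
static int
sk_ims_header(char *buf, size_t len, const char *last_modified)
{
	int n;

	assert(buf != NULL && len > 0);
	if (last_modified == NULL)
		return (-1);	/* nothing to validate: unconditional fetch */
	n = snprintf(buf, len, "If-Modified-Since: %s", last_modified);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (n);
}

/* 304 Not Modified: reuse the stale body; anything else: new body. */
static enum sk_body_source
sk_body_source(int status)
{
	return (status == 304 ? SK_BODY_FROM_STALE : SK_BODY_FROM_BACKEND);
}
```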

experimental-ims as currently implemented couldn't do that. If the
stale_obj is validated by the backend, then:

- HTTP headers that the stale_obj has but the beresp object created for
the response doesn't have are filtered into the beresp object.

- Storage for the stale_obj is duped into the storage for the beresp
object.
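A sketch of the two steps above, assuming a flat array of "Name: value" header strings and a single contiguous body buffer (real storage is segmented, and these names are hypothetical, not the experimental-ims code). The complete flag is also an assumption; it marks that copying only yields a usable body once the stale_obj has been fetched completely. Header names are compared case-insensitively, and of repeated headers in the stale_obj only the first is copied in this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SK_MAXHDR 32

/* Hypothetical header list: each entry is "Name: value". */
struct sk_http {
	const char	*hdr[SK_MAXHDR];
	size_t		nhdr;
};

/* Length of the header name, i.e. the part before ':'. */
static size_t
sk_namelen(const char *h)
{
	const char *c = strchr(h, ':');

	return (c != NULL ? (size_t)(c - h) : strlen(h));
}

/* Does hp already contain a header with the same (case-insensitive) name? */
static int
sk_has_hdr(const struct sk_http *hp, const char *h)
{
	size_t n = sk_namelen(h);

	for (size_t i = 0; i < hp->nhdr; i++) {
		const char *g = hp->hdr[i];
		size_t j;

		if (sk_namelen(g) != n)
			continue;
		for (j = 0; j < n; j++)
			if (tolower((unsigned char)g[j]) !=
			    tolower((unsigned char)h[j]))
				break;
		if (j == n)
			return (1);
	}
	return (0);
}

/* Step 1: headers present in stale_obj but absent from beresp are added. */
static void
sk_filter_stale_hdrs(struct sk_http *beresp, const struct sk_http *stale)
{
	assert(beresp != NULL && stale != NULL);
	for (size_t i = 0; i < stale->nhdr && beresp->nhdr < SK_MAXHDR; i++)
		if (!sk_has_hdr(beresp, stale->hdr[i]))
			beresp->hdr[beresp->nhdr++] = stale->hdr[i];
}

/* Step 2: "dupe" as a plain copy; refused while the body is incomplete. */
static unsigned char *
sk_dupe_by_copy(const unsigned char *body, size_t len, int complete)
{
	unsigned char *p;

	if (!complete)
		return (NULL);	/* still streaming: bytes are missing */
	p = malloc(len > 0 ? len : 1);
	if (p != NULL && len > 0)
		memcpy(p, body, len);
	return (p);
}
```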

"Dupe" is currently implemented just by copying the storage, although
the interfaces have been set up so that something more efficient could
be implemented in the future; what we've had in mind is that storage
could be shared via pointers by more than one object, with refcounts on
the storage objects.
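One way to picture the "shared storage with refcounts" idea is a storage segment that carries its own reference count, so that a dupe hands out another pointer instead of copying bytes. This is a hypothetical sketch (sk_storage and its functions are invented); it ignores locking, which a multi-threaded cache would need around the count, and it says nothing about the persistent stevedore, which is where the open question lies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage segment that several objects may point at. */
struct sk_storage {
	unsigned	refcnt;
	size_t		len;
	unsigned char	*ptr;
};

static struct sk_storage *
sk_storage_new(const void *data, size_t len)
{
	struct sk_storage *st = malloc(sizeof *st);

	if (st == NULL)
		return (NULL);
	st->ptr = malloc(len > 0 ? len : 1);
	if (st->ptr == NULL) {
		free(st);
		return (NULL);
	}
	if (len > 0)
		memcpy(st->ptr, data, len);
	st->len = len;
	st->refcnt = 1;
	return (st);
}

/* "Dupe" without copying: the new object just takes another reference. */
static struct sk_storage *
sk_storage_dupe(struct sk_storage *st)
{
	assert(st->refcnt > 0);
	st->refcnt++;
	return (st);
}

/* Drop one reference; the bytes are freed when the last user lets go. */
static void
sk_storage_deref(struct sk_storage *st)
{
	assert(st->refcnt > 0);
	if (--st->refcnt == 0) {
		free(st->ptr);
		free(st);
	}
}
```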

Last year, phk advised us to implement the storage dupe by copying for
now, and to implement something like refcounting in a future version. As
I understood it, this was so that we wouldn't try to change too many
things at once; I also think it wasn't entirely clear how to implement
something like shared storage and refcounting for the persistent
stevedore (I've had a look, and it isn't clear to me either).

So as things stand, experimental-ims couldn't validate an object that is
currently streaming, with or without the busy flag, because it has to
copy the stale_obj's storage, and a streaming object's storage is not
yet complete. The questions are:

- Should backend validation stay this way, or should we try to change it
as suggested above (validate a currently streaming object)? (Opinions on
IRC were that this would definitely be worthwhile.)

- If so, how should we manage the storage for the stale_obj and the
beresp object? Dupe by copying, as currently implemented, would have to
be changed.

An idea on IRC was: rather than introducing refcounting into storage and
sharing it among objects, have the objects point to each other, and make
use of the refcounting for objects that already exists. So the beresp
object, with the updated headers, would have no storage of its own, but
would point to the stale_obj, indicating that its storage is to be found
there. I haven't fully thought all of this through (and may have
restated it incorrectly), but the idea of reusing a refcounting
mechanism that has existed and been tested for a long time seems
promising.
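A rough sketch of the IRC idea as restated above, assuming objects carry a refcount and adding a hypothetical body_from pointer: the beresp object owns no storage, holds a reference on the stale_obj, and the stale_obj's body is freed only when the last referencing object goes away. All names here are invented for illustration, and the sketch does not address how a borrowing object would follow a body that is still streaming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object with a refcount like the one objects already have. */
struct sk_obj {
	unsigned	refcnt;
	unsigned char	*body;		/* own storage, or NULL */
	size_t		len;
	struct sk_obj	*body_from;	/* hypothetical: object holding the body */
};

static struct sk_obj *
sk_obj_new(size_t len)
{
	struct sk_obj *o = calloc(1, sizeof *o);

	if (o == NULL)
		return (NULL);
	if (len > 0 && (o->body = malloc(len)) == NULL) {
		free(o);
		return (NULL);
	}
	o->len = len;
	o->refcnt = 1;
	return (o);
}

static void
sk_obj_deref(struct sk_obj *o)
{
	assert(o->refcnt > 0);
	if (--o->refcnt > 0)
		return;
	if (o->body_from != NULL)
		sk_obj_deref(o->body_from);	/* release the borrowed body */
	free(o->body);
	free(o);
}

/* After a 304: beresp keeps its updated headers (not modelled here) but no
 * storage; it takes a reference on stale_obj, whose body it will deliver. */
static void
sk_obj_borrow_body(struct sk_obj *beresp, struct sk_obj *stale_obj)
{
	assert(beresp->body == NULL && beresp->body_from == NULL);
	stale_obj->refcnt++;
	beresp->body_from = stale_obj;
}

/* Follow the pointers to the object that actually holds the bytes. */
static const struct sk_obj *
sk_obj_body_owner(const struct sk_obj *o)
{
	while (o->body_from != NULL)
		o = o->body_from;
	return (o);
}
```

When the cache drops its own reference to the stale_obj, the object stays alive as long as the beresp object still points at it.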

Opinions, comments? What do you think?

Best,
Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Schwanenwik 24
22087 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de