Handling of cache-control
Rob S
rtshilston at gmail.com
Tue Jan 19 21:31:14 CET 2010
Michael Fischer wrote:
> On Mon, Jan 18, 2010 at 4:37 PM, Poul-Henning Kamp <phk at phk.freebsd.dk> wrote:
>
> In message <DE028C9E-4618-4EBC-8477-6E308753CBCE at dynamine.net>,
> "Michael S. Fischer" writes:
> >On Jan 18, 2010, at 5:20 AM, Tollef Fog Heen wrote:
>
> >> My suggestion is to also look at Cache-control: no-cache, possibly also
> >> private and no-store and obey those.
> >
> >Why wasn't it doing it all along?
>
> Because we wanted to give the backend a chance to tell Varnish one
> thing with respect to caching, and the client another.
>
> I'm not saying we hit the right decision, and welcome any consistent,
> easily explainable policy you guys can agree on.
>
>
> Well, the problem is that application engineers who understand what
> that header does have a reasonable expectation that the caches will
> obey them, and so I think Varnish should honor them as Squid does.
> Otherwise surprising results will occur when the caching platform is
> changed.
>
> Cache-Control: private certainly meets the goal you stated, at least
> insofar as making Varnish behave differently than the client -- it
> states that the client can cache, but Varnish (as an intermediate
> cache) cannot.
>
> I assume, however, that some engineers want a way to do the opposite -
> to inform Varnish that it can cache, but inform the client that it
> cannot. Ordinarily I'd think this is not a very good idea, since you
> almost always want to keep the cached copy as close to the user as
> possible. But I guess there are some circumstances where an engineer
> would want to preload a cache with prerendered data that is expensive
> to generate, and, also asynchronously force updates by flushing stale
> objects with a PURGE or equivalent. In that case the cache TTL would
> be very high, but not necessarily meaningful.
>
> I'm not sure it makes sense to extend the Cache-Control: header here,
> because there could be secondary intermediate caches downstream that
> are not under the engineer's control; so we need a way to inform only
> authorized intermediate caches that they should cache the response
> with the specified TTL.
>
> One way I've seen to accomplish this goal is to inject a custom header
> in the response, but we need to ensure it is either encrypted (so that
> non-authorized caches can't see it -- but this could be costly in
> terms of CPU) or removed by the last authorized intermediate cache as
> the response is passed back downstream.
>
> --Michael
Michael,
You've obviously got some strong views about varnish, as we've all seen
from the mailing list over the past few days!
When we deployed varnish, we did so in front of applications that
weren't prepared to have a cache in front of them. Accordingly, we
disabled all caching on HTML and RSS type content in Varnish, and
instead just cached CSS / JS / images. This was a good outcome because
we could stop using round robin DNS (which is a bit questionable, imho,
if it includes more than two or three hosts) to the web servers, and
instead just point 2 A records at Varnish. We elected to use
X-External-Cache-Control and X-Internal-TTL as headers that we'd set
in Varnish-aware applications. So, old apps that emit Cache-Control
headers are completely uncached by Varnish, and new apps can benefit from
a degree of caching by Varnish.
PHK's plans for 2010 will enable us to fully exploit our X-Internal-TTL
headers, because Varnish will then be able to parse TTL values out of
headers. In the meantime, these are hard-set in Varnish to a value that's
appropriate for our apps.
The X-External-Cache-Control value is then presented as Cache-Control in
responses to public HTTP requests.
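
A minimal sketch of the header handling described above, written in Python
rather than VCL purely for illustration. It is not the actual Varnish
configuration. The function name, the 300-second value, and the choice to
strip both X- headers before the response leaves the cache are assumptions;
the post only says that the TTL is hard-set and that
X-External-Cache-Control is presented as Cache-Control.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fixed TTL. The post says the value is hard-set in Varnish
# but does not say what it is.
HARD_SET_TTL_SECONDS = 300


def edge_policy(backend_headers: Dict[str, str]) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (ttl_for_cache, headers_for_client).

    ttl_for_cache is None when the response should not be cached at all.
    Header names in the result are lower-cased.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in backend_headers.items()}

    # Remove the internal-only headers so they never reach the client.
    external = headers.pop("x-external-cache-control", None)
    internal = headers.pop("x-internal-ttl", None)

    # Only Varnish-aware apps send X-Internal-TTL. Its value is ignored
    # for now and a fixed TTL is used instead.
    ttl = HARD_SET_TTL_SECONDS if internal is not None else None

    # What the app wants the outside world to see replaces Cache-Control.
    if external is not None:
        headers["cache-control"] = external

    return ttl, headers


# A Varnish-aware app: cached internally, short lifetime for clients.
ttl, out = edge_policy({
    "X-Internal-TTL": "3600",
    "X-External-Cache-Control": "max-age=60",
    "Content-Type": "text/html",
})
```

An old app that only emits Cache-Control gets a TTL of None here, so it is
not cached, and its own header passes through unchanged.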
This describes how we've chosen to deploy varnish, without causing our
application developers huge headaches. In parallel, we've changed many
of our sites to use local cookies+javascript to add personalisation to
the most popular pages. Overall, deploying Varnish has seen a big
reduction in back end requests, PLUS the ability to load balance over a
large pool whilst still implementing sticky-sessions where our apps
still need them. Varnish is, as the name suggests, a lovely layer in
front of our platform which makes it perform better.
Now, to answer your points:
1) Application developers needing to be aware of caching headers: I'd disagree
here. Our approach is to use code libraries to deliver functionality to
the developers which the sysadmins can maintain. There's always some
overlap here, but we're comfortable with our position. We're a PHP
company, and so we've a class that's used statically, with methods such
as Cacheability::noCache(), Cacheability::setExternalExpiryTime($secs),
and Cacheability::setInternalExpiryTime($secs), as well as
Cacheability::purgeCache($path). Just as, I'm sure, your developers
use abstraction layers for database access, they could use a
similar approach for cacheability.
2) Preloading the cache: This is something we do. We set
InternalExpiryTime to be high, and ExternalExpiryTime to be very low.
Then, when there's a change, the app calls a purge.
3) Downstream caches: You have to decide whether the caches are under
your control or are public. You should make the edge of your estate
behave as you want, and let third parties worry about themselves. Get
your outermost caches to strip all headers other than those you want
retained.
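
To make points 1 and 2 concrete, here is a rough Python stand-in for the
kind of helper class described above. The real class is PHP and its
internals are not shown in this post, so everything here is an assumption:
the snake_case method names, the header values emitted, the instance-based
design (the PHP class is used statically), and the cache host name. The
purge helper only builds a request object and does not send it. Whether a
cache accepts PURGE depends on how it is configured.

```python
from typing import Dict, Optional
from urllib.request import Request


class Cacheability:
    """Per-response cache settings (hypothetical sketch, not the PHP original)."""

    def __init__(self) -> None:
        self._external: Optional[str] = None
        self._internal_ttl: Optional[int] = None

    def no_cache(self) -> None:
        # Nothing is cached internally, and clients are told not to cache.
        self._external = "no-cache"
        self._internal_ttl = None

    def set_external_expiry_time(self, secs: int) -> None:
        # Lifetime advertised to browsers and third-party caches.
        self._external = f"max-age={secs}"

    def set_internal_expiry_time(self, secs: int) -> None:
        # Lifetime intended for the caches we control.
        self._internal_ttl = secs

    def headers(self) -> Dict[str, str]:
        # Headers the application would attach to its response.
        out: Dict[str, str] = {}
        if self._external is not None:
            out["X-External-Cache-Control"] = self._external
        if self._internal_ttl is not None:
            out["X-Internal-TTL"] = str(self._internal_ttl)
        return out

    @staticmethod
    def purge_request(path: str, cache_host: str = "varnish.example.internal") -> Request:
        # Build (but do not send) an HTTP PURGE request for one path.
        return Request(f"http://{cache_host}{path}", method="PURGE")


# Point 2, preloading: long internal lifetime, very short external one.
page = Cacheability()
page.set_internal_expiry_time(86400)
page.set_external_expiry_time(10)
response_headers = page.headers()

# When the content changes, the app would send something like this.
purge = Cacheability.purge_request("/news/front")
```

The point of the design is that developers only call methods like these,
while the header names and the cache behaviour behind them stay with the
sysadmins.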
In summary, I think you need to partition what's done by your sysadmins
and what's the job of your developers. I also think it'd help me (and
probably the mailing list) if you could give a little more detail about
the site(s) you're running behind Varnish, what your main troubles are
with your architecture, and why you thought Varnish would help. (For the
information of others, we're predominantly using Varnish to balance
traffic between a pool of servers that deliver a news website. Combined
with memcache and gluster, Varnish works well as a frontend to the estate.)
Rob
More information about the varnish-misc mailing list