Understand "hit for pass" cache objects

Mon Feb 15 23:20:24 CET 2010

On 15 February 2010 21:56, Justin Pasher <justinp at newmediagateway.com> wrote:
> Hello,
>
> I have just started using Varnish 2.0.6 in the past week as a
> replacement for Squid. So far, I love the fine grained control you have
> over what goes into cache (as opposed to Squid's "I'll cache it when I
> feel it's supposed to be cached, but not tell you why" approach). That
> said, I'm trying to better understand the "hit for pass" cache objects
> that Varnish will sometimes create. Here is the basic flow of my VCL (much
> of it is based on the concepts on the intro page:
> http://varnish-cache.org/wiki/Introduction)
>
> vcl_recv:
> Default action is "lookup". Action changes to "pass" if ...
> * Cache-Control or Pragma header has "no-cache"
> * HTTP auth is in use (Authorization header)
> * Request contains cookie "bypass_cache=true"
> * Request type is not GET, HEAD, POST, PUT, TRACE, OPTIONS, DELETE
>
> vcl_fetch:
> Default action is "deliver". Action changes to "pass" if ...
> * Response is deemed uncacheable (!obj.cacheable)
> * Response contains Cache-Control headers that say "no-cache"
> * HTTP auth is in use (Authorization header)
> * Request contains cookie "bypass_cache=true"
> * Response contains Set-Cookie header
>
> Now on to the problem at hand. My understanding (please correct any
> errors) of the "hit for pass" object is that any time the action is
> "pass" within vcl_fetch, Varnish will create a "hit for pass" object to
> make future requests for the same URL hash go straight to the back end
> instead of lining them up serially and waiting for a response from the
> first request. Until that object's TTL expires, the "hit for pass"
> object will remain in cache and never be replaced with a fresh object
> from the backend.
>
> Here is what is happening in my example.
>
> Client A visits the URL http://www.example.com/. Since this is the first
> time they visit the site, the backend code tries to start a session (PHP
> code), which sends a Set-Cookie header in the response. In vcl_fetch,
> Varnish sees the Set-Cookie header and issues the "pass" action. Now
> there is a "hit for pass" cache object with a TTL based upon the
> Cache-Control/Expires headers or the default TTL (let's assume 120 seconds).
>
> Client B visits the same URL http://www.example.com/. Varnish finds a
> "hit for pass" object in the cache, so it sends the request directly to
> the backend. This same thing will continue for any future clients until
> 120 seconds have elapsed.
>
> Herein lies my dilemma. A request for the same URL
> (http://www.example.com/) is sometimes cacheable and sometimes not
> cacheable (it usually depends on whether it's the first time a user
> visits the site and the Set-Cookie header has to be sent). What this
> means is if I have a very heavily hit URL as a landing page from Google,
> most of the time there will be a "hit for pass" cache object in Varnish,
> since most people going to that page will have a Set-Cookie header. The
> only time it will cache the page is if I'm lucky and someone visits the
> page while there is no "hit for pass" cache object and their request
> doesn't result in a "pass" action from vcl_fetch.
>
> In my situation, I think I could avoid this problem altogether if I
> could make Varnish store a DIFFERENT set of headers in the cache object
> than the headers returned to the client. For example, if I receive a
> response with a Set-Cookie header, I would remove the Set-Cookie header
> from the soon-to-be-cached object (so it wouldn't serve that header up
> for everyone), but LEAVE the Set-Cookie header for the individual that
> made the original request. This would allow the page to cache normally
> even if the only requests going to that page result in a Set-Cookie
> header. However, from what I've been able to see, there is no way to do
> this.

Why not cache the object with the Set-Cookie header in vcl_fetch, then
in vcl_deliver remove the header for users who already have your
cookie, something like:

    if (req.http.cookie ~ "(^|; )my_cookie=") {
        # strip it from this response only; the cached object keeps it
        remove resp.http.Set-Cookie;
    }

Of course, this means that if a user with your cookie is the first to
see the object, a new user may never get sent to the backend and have
the cookie set, but then you can't have that and also send them
cached responses without re-engineering the way your backend sets
cookies. You may be able to work around that with restarts though.
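
Put together, the two halves might look roughly like this on Varnish
2.0.x. It is only a sketch: "my_cookie" is a placeholder name, the
vcl_fetch shown here is assumed to replace any rule that passes on
Set-Cookie, and it only makes sense if the cached Set-Cookie value is
safe to hand to every new visitor (a per-user session ID would not be).

    sub vcl_fetch {
        # Still refuse to cache what Varnish itself deems uncacheable.
        if (!obj.cacheable) {
            pass;
        }
        # No pass on Set-Cookie here, so the header is stored with the
        # object. Ending with deliver skips the default vcl_fetch.
        deliver;
    }

    sub vcl_deliver {
        # Clients that already carry the cookie do not need it again.
        if (req.http.Cookie ~ "(^|; )my_cookie=") {
            remove resp.http.Set-Cookie;
        }
        deliver;
    }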

> Does anyone have any recommendations to get around this? In a perfect
> world, my caching server would work this way:
>
> * vcl_recv: If any criteria from A through D are met, don't pull this
> request from cache and go to the backend
> * vcl_fetch: If any criteria from E through G are met, send the object
> straight to the client without touching the cache.
>
> The "without touching the cache" portion seems to be where I am falling
> down.

You can achieve this using a Vary header (I use something similar for
serving cached pages to anonymous users but personalised dynamic pages
to logged-in users):

In vcl_recv:

    if (req.http.cookie ~ "(^|; )mycookie=") {
        set req.http.X-My-Cookie = "true";
    }

in vcl_fetch:

    set obj.http.Vary = "X-My-Cookie";
    if (req.http.X-My-Cookie) {
        pass;
    }
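
Wrapped into complete subroutines, that could look like the following.
Treat it as a sketch for Varnish 2.0.x: "mycookie" and X-My-Cookie are
just the placeholder names used above, and the else branch is an extra
precaution (not part of the snippet above) so that a client cannot
choose its cache variant by sending X-My-Cookie itself.

    sub vcl_recv {
        if (req.http.Cookie ~ "(^|; )mycookie=") {
            # Flag requests that carry the cookie.
            set req.http.X-My-Cookie = "true";
        } else {
            # Drop any client-supplied copy of the marker header.
            remove req.http.X-My-Cookie;
        }
    }

    sub vcl_fetch {
        # Key cached variants on the marker header set in vcl_recv.
        set obj.http.Vary = "X-My-Cookie";
        if (req.http.X-My-Cookie) {
            # Requests with the cookie always go to the backend.
            pass;
        }
    }

Note that the set replaces any Vary header the backend sent, so if the
backend already varies on something (Accept-Encoding, say) that value
would need to be carried over by hand.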

Laurence