is 2.0.2 not as efficient as 1.1.2 was?
Barry Abrahamson
barry at automattic.com
Wed Feb 4 16:44:29 CET 2009
On Nov 25, 2008, at 5:37 PM, Demitrious Kelly wrote:
> Hello,
>
> We run Gravatar.com and use varnish to cache avatar responses. There
> are a ton of very small objects and lots of requests per second. Last
> week we were using 1.1.2 compiled against tcmalloc (-t 600 -w 1,4000,5
> -h classic,500009 -p thread_pools 10 -p listen_depth 4096 -s
> malloc,16G). This used an nginx load balancer on a separate host as its
> back end which distributed varnish's requests to our pool of webs. All
> was well.
>
> This week we upgraded to 2.0.2 and are using varnish's back end &
> director configuration for the same work. What we are seeing is that
> 2.0.2 holds about 60% of the objects in the same amount of cache space
> as 1.1.2 did (we tried tcmalloc, jemalloc, and mmap.) This caused us
> quite a few problems after the upgrade as varnish would start spiking
> the load on the boxes into the hundreds. We attempted tuning the
> lru_interval (up) and obj_workspace (down) but we couldn't get varnish
> to hold the same data that it used to on the same machines.
>
> Right now we've reduced the time that we keep cached objects
> drastically, bringing our cache hit rate down to 92% from 96% which
> roughly doubled the requests (and load) on the web servers. It is,
> however, stable at this point. Obviously the idea of not keeping up
> with the latest versions of varnish is not what we want to do, however
> effectively doubling requirements for scaling the service is just as
> unappealing.
>
> So, what we're asking is... how do we get varnish 2 to be as efficient
> as varnish 1 was? We're glad to try things... It takes a while to fill
> up the cache to the point that it can cause problems so testing and
> reporting back will take some time, but we'd like this fixed and will
> put in some work. We're currently running the following cli options:
>
> -a 0.0.0.0:80 -f ... -P ... -T 10.1.94.43:6969 -t 600 -w 1,4000,5 -h
> classic,500009 -p thread_pools 10 -p listen_depth 4096 -s malloc,16G
>
> And our VCL looks like this (with most of the webs taken out for brevity
> since they're repeated verbatim with only numbers changed)
>
> backend web11 {
>     .host = "xxx"; .port = "8088";
>     .probe = { .url = "xxx"; .timeout = 50 ms; .interval = 5s;
>                .window = 2; .threshold = 1; }
> }
> backend web12 {
>     .host = "xxx"; .port = "8088";
>     .probe = { .url = "xxx"; .timeout = 50 ms; .interval = 5s;
>                .window = 2; .threshold = 1; }
> }
>
> director default random {
>     .retries = 3;
>     { .backend = web11; .weight = 1; }
>     { .backend = web12; .weight = 1; }
> }
>
> sub vcl_recv {
>     set req.backend = default;
>     set req.grace = 30s;
>     if ( req.url ~ "^/(avatar|userimage)" && req.http.cookie ) {
>         lookup;
>     }
> }
>
> sub vcl_fetch {
>     if (obj.ttl < 600s) {
>         set obj.ttl = 600s;
>     }
>     if (obj.status == 404) {
>         set obj.ttl = 30s;
>     }
>     if (obj.status == 500 || obj.status == 503 ) {
>         pass;
>     }
>     set obj.grace = 30s;
>     deliver;
> }
>
> sub vcl_deliver {
>     remove resp.http.Expires;
>     remove resp.http.Cache-Control;
>     set resp.http.Cache-Control = "public, max-age=600, proxy-revalidate";
>     deliver;
> }
Bump :) Is anyone else seeing the same thing? I think it may be a
result of the fact that a lot of the cached responses are just headers
(302 redirects) and don't have any actual content. That is the only
reason I can think of why we would be seeing this issue and others
wouldn't. I suspect most people using varnish don't have stats that
look like this:
10094887744 960644.65 847668.80 Total header bytes
22230934332 2174908.58 1866733.93 Total body bytes
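
To make the proportion concrete, here is a rough sketch in Python using the
two counters above. It assumes the first column of each line is the
cumulative byte count; the variable names are made up for illustration.

```python
# Counters copied from the two varnishstat lines above.
# Assumption: the first column is the cumulative total in bytes.
header_bytes = 10094887744
body_bytes = 22230934332

total_bytes = header_bytes + body_bytes

# Share of all delivered bytes that were response headers.
header_share = header_bytes / total_bytes

# How many body bytes were sent for every header byte.
body_per_header = body_bytes / header_bytes

print(f"headers as share of delivered bytes: {header_share:.1%}")
print(f"body bytes per header byte: {body_per_header:.2f}")
```

With these numbers headers come out at close to a third of everything
delivered, which is what you would expect when many responses are
body-less redirects.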
I don't really want to revert to 1.1.2 because I like the general
stability and features of 2.x, but I don't have any real ideas on how
to troubleshoot why this would be happening. Any ideas would be
appreciated.
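
For context on why the drop in hit rate mentioned in the quoted message
hurts so much, a back-of-the-envelope sketch in Python. It assumes every
cache miss turns into exactly one request to the web servers, which is a
simplification; the function name is hypothetical.

```python
def backend_fraction(hit_rate):
    """Fraction of client requests that miss the cache and reach the backends.

    Assumption: each miss causes exactly one backend request.
    """
    return 1.0 - hit_rate


before = backend_fraction(0.96)  # hit rate before the TTLs were shortened
after = backend_fraction(0.92)   # hit rate after the TTLs were shortened

# Ratio of backend traffic after vs. before the change.
multiplier = after / before

print(f"backend traffic multiplier: {multiplier:.2f}x")
```

Going from 4% misses to 8% misses is a four point change in hit rate but
twice the traffic on the webs, which matches the doubling described in the
quoted message.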
--
Barry Abrahamson | Systems Wrangler | Automattic
Blog: http://barry.wordpress.com