push button lru nuking

Sun Jan 17 01:10:43 CET 2010

On Sat, Jan 16, 2010 at 3:55 PM, Michael Fischer <michael at dynamine.net> wrote:
> This scheme seems very baroque.  Why not just reduce the size of your caches
> so you don't page-thrash and let Varnish's builtin LRU algorithm handle the
> eviction?

Then I won't be able to cache nearly as much.  I want to originate as
much content as possible on the Varnish servers, i.e. reduce backend
fetches.  There is no way I could fit any useful amount of my working
set into storage that could handle the evictions without spending an
unreasonable amount of money (basically, fitting it in RAM).  I'd love
to be proven wrong, though.  As far as random reads go, the SSDs are
really good; it's just the writes that kill me.

Right now a mostly filled cache server with ~80-160GB allocated can
maintain a 90-92% cache hit ratio at 400-500Mb/sec.  When it fills up
completely, eviction causes the machine to keel over: the parent can't
ping the child, health checks fail, general badness.  I'd like to let
the eviction run under automated supervision and augment it so that it
buys back a few hours, not minutes.
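
To make "automated supervision" concrete, here is a minimal sketch of the
decision I have in mind, with two thresholds so the box does not flap
between draining and serving.  The state names and both threshold values
are placeholders I made up for illustration; nothing here is tuned.

```python
def next_state(state, free_fraction, drain_below=0.10, restore_above=0.25):
    """Return 'serving' or 'draining' for the next supervision tick.

    drain_below and restore_above are hypothetical thresholds on the
    fraction of storage that is free; the gap between them is the
    hysteresis that keeps the box from flapping.
    """
    if state == "serving" and free_fraction < drain_below:
        return "draining"
    if state == "draining" and free_fraction >= restore_above:
        return "serving"
    return state
```

The wide gap between the two thresholds is the part that would buy back
hours rather than minutes: traffic only returns once a large chunk of
storage has been freed.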

> --Michael
>
> On Sat, Jan 16, 2010 at 2:46 PM, David Birdsong <david.birdsong at gmail.com>
> wrote:
>>
>> I'm trying to hack my way to a push-button LRU nuking feature.  A
>> short description of how I'm doing it follows; I'll explain why
>> farther down.
>>
>> I have a job that watches the free fraction of storage,
>> sm_bfree / (sm_bfree + sm_balloc).  Once storage file utilization is
>> past some percentage (yet to be determined), I connect to the upstream
>> load balancers and slowly drain traffic away from Varnish.
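
A rough sketch of that watcher's arithmetic, in Python since the rest of
my tooling is Python.  It assumes the counters arrive as text with one
"name value ..." pair per line, which is how I read them from varnishstat;
treat that layout, the function names and the 10% threshold as
assumptions, not as anything Varnish guarantees.

```python
def parse_counters(stat_text):
    """Parse 'name value ...' lines into a dict of ints (assumed layout)."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def free_fraction(counters):
    """Return sm_bfree / (sm_bfree + sm_balloc)."""
    free = counters["sm_bfree"]
    allocated = counters["sm_balloc"]
    total = free + allocated
    if total == 0:
        raise ValueError("storage counters are both zero")
    return free / total


def should_drain(counters, min_free=0.10):
    # min_free is a placeholder; the real threshold is yet to be determined.
    return free_fraction(counters) < min_free
```

Utilization is simply 1 minus the free fraction, so a "drain at 90%
utilization" rule is the same as min_free=0.10 here.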
>>
>> Once traffic is off and I can beat the hell out of that box, it's time
>> to free up some space.  In the past this has been done with restarts.
>> Upon restart, the cache hit ratio is destroyed, but the box can keep
>> up and rebuild the cache in a stable way.  What I'd like to do is dump
>> everything in the storage files that has a very low obj.hits.  LRU
>> nuking on the surface seems like the best thing to initiate, but it
>> usually only kicks in pretty late and puts the machine into a state
>> that is unstable while serving.  While not serving, I don't know how
>> to kick it off; furthermore, I want it to run hard and free up a lot
>> more space than it usually does.
>>
>> i.e. the cache file is ~200GB; I'd like it to run until sm_bfree is
>> around 50GB.
>>
>> My idea is to load balance as I've described above.  Pull 50GB of
>> trash files through the cache, plus enough to kick off LRU nuking,
>> purge the trash files, monitor sm_bfree and, once it's high enough,
>> instruct the upstream load balancers to start sending traffic gently
>> for a warm-up period.  Rinse and repeat into infinity, replacing the
>> SSD storage drives as they fail.  Is this crazy?  Am I uninformed on a
>> better way?
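
For sizing the trash pull, a back-of-the-envelope sketch.  It assumes
that once storage is full, each trash byte pushes out roughly one byte of
older content, so purging the trash afterwards frees about the combined
trash size.  The URL prefix is hypothetical; the 500 MiB file size
matches the Content-Length of the trash files I have been using.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def trash_files_needed(free_target_bytes, file_size_bytes):
    """Whole number of trash files whose combined size covers the target."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    # Integer ceiling division avoids float rounding on large byte counts.
    return (free_target_bytes + file_size_bytes - 1) // file_size_bytes


def trash_urls(count, prefix="/lru-trash/"):
    # The path prefix is hypothetical; use whatever the backend serves.
    return [f"{prefix}{i}" for i in range(count)]


n = trash_files_needed(50 * GIB, 500 * MIB)
print(n)  # 103
```

So freeing 50 GiB means pulling a bit over a hundred 500 MiB files per
cycle, before counting whatever is needed to fill the remaining free
space and trigger nuking in the first place.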
>>
>> Also, I've had to keep making my trash files smaller and smaller.  I
>> started with 10 and 1G files, which crashed Varnish immediately, then
>> reduced to 500MB files and successfully pulled 200 through, which
>> then crashed both my Python interpreter (libcurl) and Varnish:
>> varnishd[2664]: Child (14772) Panic message: Assert error in STV_alloc(), stevedore.c line 183:
>>   Condition((st) != NULL) not true.
>> thread = (cache-worker)
>> Backtrace:
>>   0x421f95: pan_ic+85
>>   0x4369e5: STV_alloc+125
>>   0x41a1b6: FetchBody+496
>>   0x4114dd: cnt_fetch+63d
>>   0x412a3d: CNT_Session+35d
>>   0x424273: wrk_do_cnt_sess+93
>>   0x42362e: wrk_thread_real+26e
>>   0x7f2cf51b83da: _end+7f2cf4b47c1a
>>   0x7f2cf4a862bd: _end+7f2cf4415afd
>> sp = 0x7f2ced387008 {
>>   fd = 58, id = 58, xid = 1454039386,
>>   client = 127.0.0.1:7057,
>>   step = STP_FETCH,
>>   handling = deliver,
>>   err_code = 200, err_reason = (null),
>>   restarts = 0, esis = 0
>>   ws = 0x7f2ced387078 {
>>     id = "sess",
>>     {s,f,r,e} = {0x7f2ced387800,+144,(nil),+4096},
>>   },
>>   http[req] = {
>>     ws = 0x7f2ced387078[sess]
>>       "GET",
>>       "/lru.10.cache.buster.80.12994",
>>       "HTTP/1.1",
>>       "User-Agent: PycURL/7.18.2",
>>       "Host: localhost:6081",
>>       "Accept: */*",
>>   },
>>   worker = 0x7ef439f06390 {
>>     ws = 0x7ef439f068f0 {
>>       id = "wrk",
>>       {s,f,r,e} = {0x7ef439f03350,+2143,(nil),+4096},
>>     },
>>     http[bereq] = {
>>       ws = 0x7ef439f068f0[wrk]
>>         "GET",
>>         "/lru.10.cache.buster.80.12994",
>>         "HTTP/1.1",
>>         "User-Agent: PycURL/7.18.2",
>>         "Host: localhost:6081",
>>         "Accept: */*",
>>         "X-Varnish: 1454039386",
>>         "X-Forwarded-For: 127.0.0.1",
>>     },
>>     http[beresp] = {
>>       ws = 0x7ef439f068f0[wrk]
>>         "HTTP/1.1",
>>         "200",
>>         "OK",
>>         "Server: nginx/0.7.64",
>>         "Date: Sat, 16 Jan 2010 21:11:09 GMT",
>>         "Content-Type: application/octet-stream",
>>         "Content-Length: 524288000",
>>         "Last-Modified: Sat, 16 Jan 2010 21:08:11 GMT",
>>         "Connection: keep-alive",
>>         "Accept-Ranges: bytes",
>>         "X-Varnish-IP: 127.0.0.1",
>>         "X-Varnish-Port: 6081",
>>     },
>>     },
>>
>> Are big files bad?  I expect that I'll normally have to close a pretty
>> big gap, given that my 4 storage files are 75GB each (SSD).  I'd like
>> to start this process before LRU nuking happens on its own, while
>> Varnish has not yet been unloaded by the upstream load balancers.  My
>> guess, based on loose recollection, is that Varnish will start LRU
>> nuking at 90% capacity.  It may just prove infeasible given that I'll
>> have to pull roughly 60GB through to achieve the goal.  Perhaps freeing
>> up a smaller percentage would be acceptable too, though.  I'm still
>> playing with this, but wanted to share my uber-hacky idea and let you
>> guys tear it apart if it's a dumb idea.
>>
>> Why:
>> Identifying the working set has been difficult.  It's large, the long
>> tail is very long.  I've tried adaptive ttls to expire objects
>> constantly that shouldn't be in cache:
>>
>>  in vcl_fetch: set every new object to a 2hr ttl.
>>  in vcl_hit: if obj.hits == N, then obj.ttl = 36 hours, where N is
>> some number that is high enough to cache
>> another permutation: update the VCL every 30 mins such that obj.ttl
>> was set to expire exactly at the trough of traffic (2300 - 2350 PST)
>>  in vcl_hit: if obj.hits == N, then obj.ttl = 12h or 10h, or 3h
>> (depending on time of day)
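
The "expire at the trough" permutation boils down to a bit of clock
arithmetic, sketched here in Python rather than VCL.  This only models
the TTL choice; the function names are mine, the time is naive local
time, and treating "N or more hits" as the promotion rule is a
simplification of the obj.hits == N check above.

```python
from datetime import datetime, timedelta

BASE_TTL = 2 * 3600  # every new object starts with a 2 hour TTL


def seconds_until_trough(now, trough_hour=23):
    """Seconds from now until the next occurrence of trough_hour:00."""
    trough = now.replace(hour=trough_hour, minute=0, second=0, microsecond=0)
    if trough <= now:
        trough += timedelta(days=1)
    return int((trough - now).total_seconds())


def ttl_for(hits, n, now):
    """Base TTL until an object has n hits, then hold it until the trough."""
    if hits < n:
        return BASE_TTL
    return seconds_until_trough(now)
```

Regenerating the VCL every 30 minutes amounts to recomputing
seconds_until_trough and baking the result into the obj.ttl assignment.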
>>
>> This just ended up affecting cache hit ratio such that it was never
>> favorable and the box was just busier as it was constantly expiring
>> objects over the day. Restarts were still better than this.
>>
>> Setup:
>>
>> 3 haproxy load balancer machines consistently hashing to 6 varnish
>> instances.  It's a prototype and will be scaled to a larger pool, so
>> the impact of the downtime of a single varnish instance while it goes
>> through a cache storage scrubbing will be greatly reduced.
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at projects.linpro.no
>> http://projects.linpro.no/mailman/listinfo/varnish-misc