Is anyone using ESI with a lot of traffic?

Cloude Porteus cloude at instructables.com
Mon Mar 2 22:33:01 CET 2009


I believe TCP Keep-alive has been supported in HAProxy since version
1.2. We've been using 1.3.x for at least a year.

-cloude

On Mon, Mar 2, 2009 at 1:10 PM, Artur Bergman <sky at crucially.net> wrote:
> HAProxy doesn't do keep-alive, so it makes everything slower.
>
> Artur
>
> On Feb 27, 2009, at 9:02 PM, Cloude Porteus wrote:
>
>> John,
>> Thanks so much for the info, that's a huge help for us!!!
>>
>> I love HAProxy and Willy has been awesome to us. We run everything
>> through it, since it's really easy to monitor and also easy to debug
>> where the lag is when something in the chain is not responding fast
>> enough. It's been rock solid for us.
>>
>> The nice part for us is that we can use it as a content switcher to
>> send all /xxx traffic or certain user-agent traffic to different
>> backends.
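>>
>> For the curious, the switching rules in haproxy.cfg look roughly like
>> this (names and ports here are made up for illustration, not our real
>> config):
>>
>>   frontend www
>>       bind :80
>>       # route /xxx paths and certain crawlers to their own backends
>>       acl is_xxx path_beg /xxx
>>       acl is_bot hdr_sub(User-Agent) -i googlebot
>>       use_backend xxx_pool if is_xxx
>>       use_backend bot_pool if is_bot
>>       default_backend app_pool
>>
>>   # the monitoring page is just as easy to turn on
>>   listen stats :8080
>>       mode http
>>       stats enable
>>       stats uri /haproxy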
>>
>> best,
>> cloude
>>
>> On Fri, Feb 27, 2009 at 2:24 PM, John Adams <jna at twitter.com> wrote:
>>>
>>> cc'ing the varnish dev list for comments...
>>> On Feb 27, 2009, at 1:33 PM, Cloude Porteus wrote:
>>>
>>> John,
>>> Good to hear from you. You must be slammed at Twitter. I'm happy to
>>> hear that ESI is holding up for you. It's been in my backlog since you
>>> mentioned it to me pre-Twitter.
>>>
>>> Any performance info would be great.
>>>
>>>
>>> Any comments on our setup are welcome. You may also choose to call
>>> us crazypants. Many, many thanks to Artur Bergman of Wikia for
>>> helping us get this configuration straightened out.
>>> Right now, we're running varnish (on search) in a bit of a
>>> non-standard way. We plan to use it in the normal fashion (varnish
>>> to Internet, nothing in between) on our API at some point. We're
>>> running version 2.0.2, no patches. Cache hit rates range from 10% to
>>> 30%, or higher when a real-time event is flooding search.
>>> 2.0.2 is quite stable for us, with the occasional child death here
>>> and there when we get massive headers coming in that flood
>>> sess_workspace. I hear this is fixed in 2.0.3, but we haven't had
>>> time to try it yet.
>>> We have a number of search boxes, and each search box has an apache
>>> instance and a varnish instance on it. We plan to merge the varnish
>>> instances at some point, but we use very low TTLs (Twitter is the
>>> real-time web!) and don't see much of a savings from running fewer
>>> of them.
>>> We do:
>>> Apache --> Varnish --> Apache --> Mongrels
>>> Apaches are using mod_proxy_balancer. The front-end apache is there
>>> because we've long had a fear that Varnish would crash on us, which
>>> it did many times before we figured out the proper startup
>>> parameters. We have two entries in that balancer: either the request
>>> goes to varnish, or, if varnish bombs out, it goes directly to the
>>> mongrels.
>>> We do this because we need a load balancing algorithm that varnish
>>> doesn't support, called bybusyness. Without bybusyness, varnish
>>> tries to direct requests to Mongrels that are already busy, and
>>> requests end up in the listen queue. That adds ~100-150 ms to load
>>> times, and that's no good for our desired service times of 200-250
>>> ms (or less).
>>> We'd be so happy if someone put bybusyness into Varnish's backend
>>> load balancing, but it's not there yet.
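>>> For reference, the two balancers look something like this in the
>>> apache configs (IPs and ports are made up for illustration, not our
>>> real config):
>>>
>>> # front-end apache: varnish first; the second member is the escape
>>> # hatch straight to the mongrels (status=+H marks a hot standby in
>>> # recent 2.2 apaches)
>>> <Proxy balancer://search>
>>>     BalancerMember http://127.0.0.1:6081 retry=5
>>>     BalancerMember http://127.0.0.1:8080 status=+H
>>> </Proxy>
>>> ProxyPass / balancer://search/
>>>
>>> # back-end apache: spread requests across mongrels by busyness
>>> <Proxy balancer://mongrels>
>>>     BalancerMember http://127.0.0.1:8000
>>>     BalancerMember http://127.0.0.1:8001
>>> </Proxy>
>>> ProxyPass / balancer://mongrels/ lbmethod=bybusyness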
>>> We also know that taking the extra hop through localhost costs us
>>> next to nothing in service time, so it's good to have Apache there
>>> in case we need to yank out Varnish. In the future, we might get rid
>>> of Apache and use HAProxy (its load balancing and backend monitoring
>>> are much richer than Apache's, and it has a beautiful HTTP interface
>>> to look at).
>>> Some variables and our decisions:
>>>     -p obj_workspace=4096 \
>>>     -p sess_workspace=262144 \
>>> Absolutely vital! Varnish does not allocate enough space by default
>>> for headers, regexps on cookies, and so on. The default was
>>> increased in 2.0.3, but really, not increased enough. Without this
>>> we were panicking every 20-30 requests and overflowing the sess
>>> hash.
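>>> If you want to experiment before restarting, the same knob can be
>>> poked on a running instance over the management interface - the
>>> address here is whatever you passed to -T, and only new sessions
>>> pick up the change:
>>>   varnishadm -T localhost:6082 param.set sess_workspace 262144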
>>>     -p listen_depth=8192 \
>>> 8192 is probably excessive for now. If we're queuing 8k connections,
>>> something is really broken!
>>>     -p log_hashstring=off \
>>> Who cares about this - we don't need it.
>>>     -p lru_interval=60 \
>>> We have many small objects in the search cache. Run LRU more often.
>>>     -p sess_timeout=10 \
>>> If you keep session data around for too long, you waste memory.
>>>     -p shm_workspace=32768 \
>>> Give us a bit more room in shm
>>>     -p ping_interval=1 \
>>> Frequent pings in case the child dies on us.
>>>     -p thread_pools=4 \
>>>     -p thread_pool_min=100 \
>>> This must match up with VARNISH_MIN_THREADS: we use four pools, and
>>> thread_pools * thread_pool_min == VARNISH_MIN_THREADS (4 * 100 ==
>>> 400).
>>>     -p srcaddr_ttl=0 \
>>> Disable the (effectively unused) per source-IP statistics
>>>     -p esi_syntax=1
>>> Disable ESI syntax verification so we can use it to process JSON
>>> requests.
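>>> With esi_syntax=1 in place, turning ESI on for those responses is
>>> the usual esi call in vcl_fetch - the URL test below is just an
>>> illustration, not our real rule:
>>>   sub vcl_fetch {
>>>     if (req.url ~ "\.json") {
>>>       esi;
>>>     }
>>>   }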
>>> If you have more than 2.1M objects, you should also add:
>>> # -h classic,250007 = recommended value for 2.1M objects
>>> #     number should be 1/10 the expected working set.
>>>
>>> In our VCL, we have a few fancy tricks that we use. We label each
>>> response with the cache server and its hit/miss status in
>>> vcl_deliver with this code:
>>> Top of VCL:
>>> C{
>>> #include <stdio.h>
>>> #include <unistd.h>
>>> char myhostname[255] = "";
>>>
>>> }C
>>> vcl_deliver:
>>> C{
>>>    VRT_SetHdr(sp, HDR_RESP, "\014X-Cache-Svr:", myhostname,
>>> vrt_magic_string_end);
>>> }C
>>>    /* mark hit/miss on the request */
>>>    if (obj.hits > 0) {
>>>      set resp.http.X-Cache = "HIT";
>>>      set resp.http.X-Cache-Hits = obj.hits;
>>>    } else {
>>>      set resp.http.X-Cache = "MISS";
>>>    }
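>>>
>>> A quick way to eyeball those headers (URL and port are just
>>> examples - use your varnish listen address):
>>>   curl -sI 'http://localhost:6081/search' | grep X-Cache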
>>>
>>> vcl_recv:
>>> C{
>>>   if (myhostname[0] == '\0') {
>>>     /* only get hostname once - restart required if hostname changes */
>>>     /* leave the last byte for the NUL in case of truncation */
>>>     gethostname(myhostname, sizeof(myhostname) - 1);
>>>   }
>>> }C
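>>>
>>> Since mistakes in the inline C only surface when the VCL is
>>> compiled, it's worth a dry run before restarting (path is wherever
>>> your VCL lives):
>>>   varnishd -C -f /etc/varnish/default.vcl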
>>>
>>> Portions of /etc/sysconfig/varnish follow...
>>> # The minimum number of worker threads to start
>>> VARNISH_MIN_THREADS=400
>>> # The maximum number of worker threads to start
>>> VARNISH_MAX_THREADS=1000
>>> # Idle timeout for worker threads
>>> VARNISH_THREAD_TIMEOUT=60
>>> # Cache file location
>>> VARNISH_STORAGE_FILE=/var/lib/varnish/varnish_storage.bin
>>> # Cache file size: in bytes, optionally using k / M / G / T suffix,
>>> # or in percentage of available disk space using the % suffix.
>>> VARNISH_STORAGE_SIZE="8G"
>>> #
>>> # Backend storage specification
>>> VARNISH_STORAGE="malloc,${VARNISH_STORAGE_SIZE}"
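>>> # NB: with malloc storage, the VARNISH_STORAGE_FILE above is unused.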
>>> # Default TTL used when the backend does not specify one
>>> VARNISH_TTL=5
>>> # the working directory
>>> DAEMON_OPTS="-a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \
>>>              -f ${VARNISH_VCL_CONF} \
>>>              -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \
>>>              -t ${VARNISH_TTL} \
>>>              -n ${VARNISH_WORKDIR} \
>>>              -w ${VARNISH_MIN_THREADS},${VARNISH_MAX_THREADS},${VARNISH_THREAD_TIMEOUT} \
>>>              -u varnish -g varnish \
>>>              -p obj_workspace=4096 \
>>>              -p sess_workspace=262144 \
>>>              -p listen_depth=8192 \
>>>              -p log_hashstring=off \
>>>              -p lru_interval=60 \
>>>              -p sess_timeout=10 \
>>>              -p shm_workspace=32768 \
>>>              -p ping_interval=1 \
>>>              -p thread_pools=4 \
>>>              -p thread_pool_min=100 \
>>>              -p srcaddr_ttl=0 \
>>>              -p esi_syntax=1 \
>>>              -s ${VARNISH_STORAGE}"
>>>
>>> ---
>>> John Adams
>>> Twitter Operations
>>> jna at twitter.com
>>> http://twitter.com/netik
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> VP of Product Development
>> Instructables.com
>>
>> http://www.instructables.com/member/lebowski
>
>



-- 
VP of Product Development
Instructables.com

http://www.instructables.com/member/lebowski


