[Varnish] #1375: Varnish performance appears to be impacted by the presence of many vary headers

Varnish varnish-bugs at varnish-cache.org
Tue Nov 19 15:02:34 CET 2013


#1375: Varnish performance appears to be impacted by the presence of many vary
headers
----------------------+----------------------
 Reporter:  closer01  |       Type:  defect
   Status:  new       |   Priority:  normal
Milestone:            |  Component:  varnishd
  Version:  3.0.4     |   Severity:  normal
 Keywords:            |
----------------------+----------------------
 Varnish performance appears to be significantly impacted by the presence
 of many vary headers

 == Background ==

 We use Varnish as our primary caching layer on a large platform. We've
 seen unpredictable behaviour, characterised by long response times, when
 Varnish sits in front of our web applications. Having investigated the
 response times of our backends and run a number of load tests, we believe
 we have isolated the problem to Varnish's caching behaviour around Vary
 headers. Our own applications vary on a number of headers, each with a
 number of variations. We believe that Varnish demonstrates a significant
 performance decrease when responses vary on a high number of headers, and
 a substantial performance decrease when they vary on a moderate number of
 headers with many variations.

 == Investigation ==
 We've been able to reproduce this problem outside of our platform
 environment
 on both Varnish 2 and Varnish 3 when running vanilla VCL.

 The scenarios detailed below show:
 - A page that varies on a high number of headers but only one variation
 per header
 - A page that varies on one header but with a high number of variations
 - A page that varies on a number of headers which have a number of
 variations.

 We ran our scenarios against both Varnish 2.0.1 and Varnish 3.0.4 for
 comparison.
 We've included data against Varnish 2.0.1 where we feel it's of interest
 but
 are raising this as an issue in Varnish 3.0.4.


 === Testing Environment ===
 We used AWS as our testing environment. Further information on our
 instance sizes is documented here:
 http://aws.amazon.com/ec2/instance-types/instance-details/

 ==== Varnish setup ====
 - 1 64-bit, 'General Purpose' m1.medium instance
 - RHEL 5.10
 - A simple demo Node.js application as a backend, running locally
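
 For readers who want to reproduce the setup, a stand-in for the demo
 backend might look like the sketch below. It is written in Python rather
 than Node.js, and the header naming scheme (X-Test-Header-N) and port are
 assumptions made for illustration; the ticket does not describe the real
 backend's code or header names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def vary_header_names(count, prefix="X-Test-Header-"):
    """Return `count` hypothetical header names: <prefix>1 .. <prefix>count."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def vary_header_value(count):
    """Build the value of the Vary response header for `count` headers."""
    return ", ".join(vary_header_names(count))


def make_handler(header_count):
    """Create a handler class whose responses vary on `header_count` headers."""
    vary_value = vary_header_value(header_count)

    class VaryHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"hello\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            # Tell the cache which request headers select the variant.
            self.send_header("Vary", vary_value)
            self.end_headers()
            self.wfile.write(body)

    return VaryHandler


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("127.0.0.1", 8080), make_handler(20)).serve_forever()
```

 Passing 400, 1 or 20 to make_handler would correspond to the number of
 headers in the 'horizontal', 'vertical' and 'diagonal' scenarios
 respectively.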

 ==== Loadtest setup ====

 - 1 64-bit, 'General Purpose' m1.medium instance
 - Load generated by JMeter against a single endpoint with immediate
 ramp-up
 - Test duration: 3 minutes
 - "Concurrent users": 800

 Both our Varnish VM and load-testing VM ran in the same availability zone
 and should not be subject to high network latency.


 === Scenarios ===

 ==== Many Headers, 1 Variation per Header (Ref: Horizontal) ====
 - Page varies on 400 headers
 - Each header has only one value
 - Each request supplies one randomly chosen header out of the 400

 ==== Many Variations, 1 Header (Ref: Vertical) ====
 - Page varies on 1 header
 - This 1 header has 400 variations
 - Each request supplies a random value between 1-400 for this single
 header

 ==== Many Variations, Many Headers (Ref: Diagonal) ====
 - Page varies on 20 headers
 - Each of those headers has 20 possible values
 - Each request supplies a random value between 1-20 for one randomly
 chosen header from the 20
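
 The three request patterns above can be sketched as follows. This is an
 illustration only: the tests themselves were driven by JMeter, and the
 header names and the fixed value "1" used for the 'horizontal' case are
 hypothetical.

```python
import random


def horizontal_request(rng, headers=400):
    """One randomly chosen header out of `headers`, each with a single value."""
    n = rng.randint(1, headers)
    return {f"X-Test-Header-{n}": "1"}


def vertical_request(rng, values=400):
    """A single header carrying a random value between 1 and `values`."""
    return {"X-Test-Header-1": str(rng.randint(1, values))}


def diagonal_request(rng, headers=20, values=20):
    """One randomly chosen header out of `headers`, with a random value."""
    n = rng.randint(1, headers)
    return {f"X-Test-Header-{n}": str(rng.randint(1, values))}


def distinct_variants(make_request, samples=20000, seed=0):
    """Count the distinct header combinations seen over `samples` requests."""
    rng = random.Random(seed)
    seen = {tuple(make_request(rng).items()) for _ in range(samples)}
    return len(seen)


if __name__ == "__main__":
    for name, fn in [("horizontal", horizontal_request),
                     ("vertical", vertical_request),
                     ("diagonal", diagonal_request)]:
        print(name, distinct_variants(fn))
```

 Each generator can produce at most 400 distinct header combinations
 (400 x 1, 1 x 400 and 20 x 20), so the three scenarios differ in how the
 variations are spread across headers rather than in how many exist.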

 === Results ===
 Attached are graphs showing response times over time for each Varnish
 3.0.4 scenario.

 ==== Response Times ====
 v2 - Varnish 2.0.1,
 v3 - Varnish 3.0.4

 || Varnish || Scenario || Min (ms) || Mean (ms) || Max (ms) || Standard Deviation (%) || Successful Requests (%) || Throughput (req/sec) ||
 || v2 || Horizontal || 2 || 3948 || 19763 || 3635.8 || 56.68 || 174.2 ||
 || v3 || Horizontal || 3 || 14546 || 139447 || 21897.94 || 95.49 || 28.7 ||
 || || || || || || || || ||
 || v2 || Diagonal || 1 || 385 || 1877 || 220.17 || 100 || 219.0 ||
 || v3 || Diagonal || 1 || 374 || 2520 || 214.65 || 100 || 221.0 ||
 || || || || || || || || ||
 || v2 || Vertical || 1 || 288 || 1347 || 170.06 || 100 || 297.6 ||
 || v3 || Vertical || 1 || 282 || 1409 || 168.48 || 100 || 298.2 ||

 ==== Varnish Stats ====
 During 'horizontal' testing we observed both versions of Varnish
 frequently restarting due to segfaults. Data recorded from varnishstat is
 therefore incomplete for that scenario and is omitted below.

 || Varnish || Scenario || Hits || Misses || Percentage Hits (%) ||
 || v2 || Diagonal || 36772 || 6565 || 84.9 ||
 || v3 || Diagonal || 37094 || 6583 || 84.9 ||
 || || || || || ||
 || v2 || Vertical || 51808 || 6824 || 88.4 ||
 || v3 || Vertical || 52114 || 6823 || 88.4 ||
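
 The 'Percentage Hits' column can be rechecked from the hit and miss
 counts. The sketch below assumes the percentage is hits divided by hits
 plus misses, which the ticket does not state explicitly but which matches
 the tabulated values.

```python
def hit_percentage(hits, misses):
    """Cache hit ratio as a percentage, rounded to one decimal place."""
    return round(100.0 * hits / (hits + misses), 1)


# (hits, misses) copied from the table above.
stats = {
    ("v2", "Diagonal"): (36772, 6565),
    ("v3", "Diagonal"): (37094, 6583),
    ("v2", "Vertical"): (51808, 6824),
    ("v3", "Vertical"): (52114, 6823),
}

if __name__ == "__main__":
    for (version, scenario), (hits, misses) in stats.items():
        print(version, scenario, hit_percentage(hits, misses))
```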

 === Conclusions ===
 Our 'horizontal' tests exhibit behaviour in Varnish that results in very
 long response times.
 Our 'diagonal' tests appear to exhibit roughly 25% lower throughput, a
 higher average response time and higher peaks in response time than our
 'vertical' tests.
 Whilst the 'horizontal' scenario doesn't correspond to a realistic
 application, we believe this demonstrates the extremity of
 a problem within Varnish. We believe we're experiencing a more extreme
 variant of the 'diagonal' behaviour on our own platform and were able to
 reproduce this in our initial load tests against that platform. Hence, we
 have reason to believe that Varnish's implementation of caching variations
 is the root cause.
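
 The throughput figure quoted for the 'diagonal' tests can be checked
 against the results table. The sketch below assumes the baseline for the
 comparison is the 'vertical' scenario on the same Varnish version; that
 is the only pairing in the table that gives a drop of about a quarter.

```python
def relative_drop_percent(baseline, observed):
    """Percentage by which `observed` falls short of `baseline`."""
    return 100.0 * (baseline - observed) / baseline


# Throughput in req/sec, copied from the Response Times table.
throughput = {
    "v2": {"Vertical": 297.6, "Diagonal": 219.0},
    "v3": {"Vertical": 298.2, "Diagonal": 221.0},
}

if __name__ == "__main__":
    for version, t in throughput.items():
        drop = relative_drop_percent(t["Vertical"], t["Diagonal"])
        print(f"{version}: diagonal throughput is {drop:.1f}% below vertical")
```

 On these numbers the drop is about 26% for both versions.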

 To add context, we noticed and became interested in this behaviour as some
 of our applications vary on up to 10 headers and were failing to respond
 under moderate load. We were noticing our backends responding quickly but
 when looking through our Varnish logs requests appeared to take a very
 long time within Varnish itself.

 ==== Other Observations ====
 In the scenarios above, an equal number of variations (400) should be
 present in each scenario, so we'd expect to see reasonably consistent
 behaviour across the three. For the 'diagonal' scenario, where stats are
 available, our high hit ratio suggests requests to the backend should
 have been few and, given our application was running locally, we are
 unable to attribute the lower throughput to network latency.
 With respect to the 'horizontal' scenario, the presence of segfaults
 may also suggest Varnish struggles to cope with pages that vary on many
 headers.


 == A Solution ==
 We're not able to present any solution or patch to improve this
 behaviour, although we have observed changes made in this area of the
 code previously:
 https://github.com/varnish/Varnish-Cache/commit/7bc0068d8f422c917042e35867e00a19f8956f46

 === Attachments ===
 Graphs for Varnish 3.0.4 test results:
 - Horizontal.jpg
 - Vertical.jpg
 - Diagonal.jpg

-- 
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1375>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator



