Varnish hangs / requests time out
Ross Brown
ross at trademe.co.nz
Wed Mar 4 06:16:10 CET 2009
Hi all
We are hoping to use Varnish for serving image content on our reasonably busy auction site here in New Zealand, but are having an interesting problem during testing.
We are using latest Varnish (2.0.3) on Ubuntu 8.10 server (64-bit) and have built two servers for testing - both are located in the same datacentre and situated behind an F5 hardware load balancer. We want to keep all images cached in RAM and are using Varnish with jemalloc to achieve this. For the most part, Varnish is working well for us and performance is great.
However, we have seen both our Varnish servers lock up at precisely the same time and stop processing incoming HTTP requests until Varnishd is manually restarted. This has happened twice and seems to occur at random - the last time was after 5 days of uptime and a significant amount of processed traffic (<1TB).
When this problem happens, the backend is still reachable and happily serving images. It is not a particularly busy period for us (600 requests/sec/Varnish server - approx 350Mbps outbound each - we got up to nearly 3 times that level without incident previously) but for some reason unknown to us, the servers just suddenly stop processing requests and worker processes increase dramatically.
After the lockup happened last time, I tried firing up varnishlog and hitting the server directly - my requests were not showing up at all. The *only* entries in the varnish log were related to worker processes being killed over time - no PINGs, PONGs, load balancer healthchecks or anything related to 'normal' varnish activity. It's as if varnishd has completely locked up, but we can't understand what causes both our varnish servers to exhibit this behaviour at exactly the same time, nor why varnish does not detect it and attempt a restart. After a restart, varnish is fine and behaves itself.
There is nothing to indicate an error with the backend, nor anything in syslog to indicate a Varnish problem. Pointers of any kind would be appreciated :)
Best regards
Ross Brown
Trade Me
www.trademe.co.nz
*** Startup Options (as per hints in wiki for caching millions of objects):
-a 0.0.0.0:80 -f /usr/local/etc/default.net.vcl -T 0.0.0.0:8021 -t 86400 -h classic,1200007 -p thread_pool_max=4000 -p thread_pools=4 -p listen_depth=4096 -p lru_interval=3600 -p obj_workspace=4096 -s malloc,10G
*** Running VCL:
backend default {
.host = "10.10.10.10";
.port = "80";
}
sub vcl_recv {
# Don't cache objects requested with query string in URI.
# Needed for newsletter headers (openrate) and health checks.
if (req.url ~ "\?.*") {
pass;
}
# Force lookup if the request is a no-cache request from the client.
if (req.http.Cache-Control ~ "no-cache") {
unset req.http.Cache-Control;
lookup;
}
# By default, Varnish will not serve requests that come with a cookie from its cache.
unset req.http.cookie;
unset req.http.authenticate;
# No action here, continue into default vcl_recv{}
}
***Stats
458887 Client connections accepted
170714631 Client requests received
133012763 Cache hits
3715 Cache hits for pass
27646213 Cache misses
37700868 Backend connections success
0 Backend connections not attempted
0 Backend connections too many
40 Backend connections failures
37512808 Backend connections reuses
37514682 Backend connections recycles
0 Backend connections unused
1339 N struct srcaddr
16 N active struct srcaddr
756 N struct sess_mem
12 N struct sess
761152 N struct object
761243 N struct objecthead
0 N struct smf
0 N small free smf
0 N large free smf
322 N struct vbe_conn
345 N struct bereq
20 N worker threads
2331 N worker threads created
0 N worker threads not created
0 N worker threads limited
0 N queued work requests
35249 N overflowed work requests
0 N dropped work requests
1 N backends
44 N expired objects
26886639 N LRU nuked objects
0 N LRU saved objects
15847787 N LRU moved objects
0 N objects on deathrow
3 HTTP header overflows
0 Objects sent with sendfile
164595318 Objects sent with write
0 Objects overflowing workspace
458886 Total Sessions
170715215 Total Requests
306 Total pipe
10054413 Total pass
37700586 Total fetch
49458782160 Total header bytes
1151144727614 Total body bytes
89464 Session Closed
0 Session Pipeline
0 Session Read Ahead
0 Session Linger
170622902 Session herd
7875546129 SHM records
380705819 SHM writes
138 SHM flushes due to overflow
763205 SHM MTX contention
2889 SHM cycles through buffer
0 allocator requests
0 outstanding allocations
0 bytes allocated
0 bytes free
101839895 SMA allocator requests
1519005 SMA outstanding allocations
10736616112 SMA outstanding bytes
562900737623 SMA bytes allocated
552164121511 SMA bytes free
56 SMS allocator requests
0 SMS outstanding allocations
0 SMS outstanding bytes
25712 SMS bytes allocated
25712 SMS bytes freed
37700490 Backend requests made
3 N vcl total
3 N vcl available
0 N vcl discarded
1 N total active purges
1 N new purges added
0 N old purges deleted
0 N objects tested
0 N regexps tested against
0 N duplicate purges removed
0 HCB Lookups without lock
0 HCB Lookups with lock
0 HCB Inserts
0 Objects ESI parsed (unlock)
0 ESI parse errors (unlock)
More information about the varnish-misc
mailing list