Same here. I have encountered this problem after migrating from Linux+2.0.1 to Solaris+2.0.2. With 2.0.2 it happens randomly: sometimes Varnish runs flawlessly for days, and sometimes it locks up a couple of times in a short period.

Maybe you should try 2.0.1 on one of your test servers and compare how they behave?

Best regards,
Bartek

2009/3/4 Ross Brown <ross@trademe.co.nz>:
Hi all

We are hoping to use Varnish to serve image content on our reasonably busy auction site here in New Zealand, but are having an interesting problem during testing.

We are using the latest Varnish (2.0.3) on Ubuntu 8.10 server (64-bit) and have built two servers for testing - both are located in the same datacentre and sit behind an F5 hardware load balancer. We want to keep all images cached in RAM and are using Varnish with jemalloc to achieve this. For the most part, Varnish is working well for us and performance is great.

However, we have seen both of our Varnish servers lock up at precisely the same time and stop processing incoming HTTP requests until varnishd is manually restarted. This has happened twice and seems to occur at random - the last time was after 5 days of uptime and a significant amount of processed traffic (<1TB).

When this problem happens, the backend is still reachable and happily serving images. It is not a particularly busy period for us (600 requests/sec per Varnish server, approx 350 Mbps outbound each; we previously got up to nearly 3 times that level without incident), but for some reason unknown to us the servers just suddenly stop processing requests and the number of worker processes increases dramatically.

After the lockup happened last time, I tried firing up varnishlog and hitting the server directly - my requests were not showing up at all. The *only* entries in the varnish log were related to worker processes being killed over time - no PINGs, PONGs, load balancer healthchecks or anything related to 'normal' varnish activity. It's as if varnishd has completely locked up, but we can't understand what causes both of our varnish servers to exhibit this behaviour at exactly the same time, nor why varnish does not detect it and attempt a restart. After a restart, varnish is fine and behaves itself.
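
For anyone trying to catch this in the act, checks along these lines should show whether the child process is answering at all. This is only a sketch; the management port is taken from the -T setting in the startup options below, assuming it is reachable on localhost:

# watch client-side log records only (this is where requests stop appearing)
varnishlog -c

# dump the counters once and look at the thread and overflow numbers
varnishstat -1 | egrep 'worker|overflow'

# ask the management interface whether the child is alive
varnishadm -T localhost:8021 ping
varnishadm -T localhost:8021 status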

There is nothing to indicate an error with the backend, nor anything in syslog to indicate a Varnish problem. Pointers of any kind would be appreciated :)

Best regards

Ross Brown
Trade Me
www.trademe.co.nz

*** Startup options (as per hints in the wiki for caching millions of objects):
-a 0.0.0.0:80 -f /usr/local/etc/default.net.vcl -T 0.0.0.0:8021 -t 86400 -h classic,1200007 -p thread_pool_max=4000 -p thread_pools=4 -p listen_depth=4096 -p lru_interval=3600 -p obj_workspace=4096 -s malloc,10G
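
For completeness, the resulting daemon invocation looks roughly like this (simply the options above assembled onto one varnishd command line):

varnishd \
    -a 0.0.0.0:80 \
    -f /usr/local/etc/default.net.vcl \
    -T 0.0.0.0:8021 \
    -t 86400 \
    -h classic,1200007 \
    -p thread_pool_max=4000 \
    -p thread_pools=4 \
    -p listen_depth=4096 \
    -p lru_interval=3600 \
    -p obj_workspace=4096 \
    -s malloc,10G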

*** Running VCL:
backend default {
    .host = "10.10.10.10";
    .port = "80";
}

sub vcl_recv {
    # Don't cache objects requested with a query string in the URI.
    # Needed for newsletter headers (openrate) and health checks.
    if (req.url ~ "\?.*") {
        pass;
    }

    # Force a lookup if the request is a no-cache request from the client.
    if (req.http.Cache-Control ~ "no-cache") {
        unset req.http.Cache-Control;
        lookup;
    }

    # By default, Varnish will not serve requests that come with a cookie from its cache.
    unset req.http.cookie;
    unset req.http.authenticate;

    # No action here; continue into the default vcl_recv.
}
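
To rule the VCL in or out without restarting the daemon, an edited copy can be loaded and activated over the management port; roughly like this (the VCL name here is made up):

# compile and load a new VCL, then make it the active one (no restart needed)
varnishadm -T localhost:8021 vcl.load test_v2 /usr/local/etc/default.net.vcl
varnishadm -T localhost:8021 vcl.use test_v2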


*** Stats
458887 Client connections accepted
170714631 Client requests received
133012763 Cache hits
3715 Cache hits for pass
27646213 Cache misses
37700868 Backend connections success
0 Backend connections not attempted
0 Backend connections too many
40 Backend connections failures
37512808 Backend connections reuses
37514682 Backend connections recycles
0 Backend connections unused
1339 N struct srcaddr
16 N active struct srcaddr
756 N struct sess_mem
12 N struct sess
761152 N struct object
761243 N struct objecthead
0 N struct smf
0 N small free smf
0 N large free smf
322 N struct vbe_conn
345 N struct bereq
20 N worker threads
2331 N worker threads created
0 N worker threads not created
0 N worker threads limited
0 N queued work requests
35249 N overflowed work requests
0 N dropped work requests
1 N backends
44 N expired objects
26886639 N LRU nuked objects
0 N LRU saved objects
15847787 N LRU moved objects
0 N objects on deathrow
3 HTTP header overflows
0 Objects sent with sendfile
164595318 Objects sent with write
0 Objects overflowing workspace
458886 Total Sessions
170715215 Total Requests
306 Total pipe
10054413 Total pass
37700586 Total fetch
49458782160 Total header bytes
1151144727614 Total body bytes
89464 Session Closed
0 Session Pipeline
0 Session Read Ahead
0 Session Linger
170622902 Session herd
7875546129 SHM records
380705819 SHM writes
138 SHM flushes due to overflow
763205 SHM MTX contention
2889 SHM cycles through buffer
0 allocator requests
0 outstanding allocations
0 bytes allocated
0 bytes free
101839895 SMA allocator requests
1519005 SMA outstanding allocations
10736616112 SMA outstanding bytes
562900737623 SMA bytes allocated
552164121511 SMA bytes free
56 SMS allocator requests
0 SMS outstanding allocations
0 SMS outstanding bytes
25712 SMS bytes allocated
25712 SMS bytes freed
37700490 Backend requests made
3 N vcl total
3 N vcl available
0 N vcl discarded
1 N total active purges
1 N new purges added
0 N old purges deleted
0 N objects tested
0 N regexps tested against
0 N duplicate purges removed
0 HCB Lookups without lock
0 HCB Lookups with lock
0 HCB Inserts
0 Objects ESI parsed (unlock)
0 ESI parse errors (unlock)