Hello!<br><br>I'm running Varnish 2.0.5 on a server with the following specification:<br><ul><li> 2 quad-core Intel Xeon 2.00 GHz (64-bit)</li><li> OS: RHEL 5 (64-bit)</li><li> 8MB RAM</li><li> 1GB Ethernet</li></ul>
<br>My network infrastructure consists of a Load Balancer, a dedicated Varnish server, and five web servers plus database servers. Requests flow as follows:<br><span style="font-family: courier new,monospace;">external client ---> Load Balancer (public VIP) ---> Varnish Proxy --> Load Balancer (private VIP) --> Web Servers</span><br>
<br>In this configuration, the Load Balancer is responsible for sending each request to the appropriate web server according to the domain. The Varnish server is configured with the Load Balancer's private VIP as its only backend.<br>
<br>Now, let me explain the issue. Under low traffic, the websites are usually served correctly, but sometimes a page comes back blank or only partially loaded. In both cases the client receives a 200 OK response code, but the response body is incomplete. I then checked the varnishstat and varnishlog output and noticed two things: Varnish restarts frequently, and running <i>varnishlog -i Debug -I</i> produces the following output:<br>
<span style="font-family: courier new,monospace;">400 Debug c "Write error, len = 34500/55022, errno = Success"</span><br><br>I don't know exactly what it means, but a Google search gave me a clue: it may be caused by an interruption during communication with the client. So this error could point to the cause of the problem. I don't know what triggers it, but I suspect a network buffer overflow, so here are some related OS values:<br>
<br><span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/ip_local_port_range = 32768 61000</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">/proc/sys/net/core/rmem_max = 131071</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">/proc/sys/net/core/wmem_max = 131071</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_mem = 196608 262144 393216</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_wmem = 4096 16384 4194304</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_fin_timeout = 60</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">/proc/sys/net/core/netdev_max_backlog = 1000</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">/proc/sys/net/core/somaxconn = 128</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_syncookies = 1</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_max_orphans = 65536</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_max_syn_backlog = 1024</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_synack_retries = 5</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">/proc/sys/net/ipv4/tcp_syn_retries = 5</span><br><br>The same parameters are discussed in this Varnish performance article: <a href="http://varnish-cache.org/wiki/Performance">http://varnish-cache.org/wiki/Performance</a>. My values seem very low, which may be one of the causes. Under average traffic (around 500 concurrent users across all sites), the Varnish service stops responding and the server load rises to 612. The websites then return a Connection refused error (Code 503). On that occasion I wasn't able to review the Varnish statistics.<br>
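<br>For anyone who wants to compare their own settings against the list above, here is a minimal Python sketch that reads the same kernel parameters from /proc/sys and prints them in the same format. The function name <i>read_params</i> and the <i>base</i> argument are only illustrative assumptions, not part of Varnish or of any existing tool, and the script only reads values; it does not judge whether they are too low.<br>

```python
import os

# Kernel parameters quoted in the post, as paths relative to /proc/sys.
PARAMS = [
    "net/ipv4/ip_local_port_range",
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/tcp_mem",
    "net/ipv4/tcp_wmem",
    "net/ipv4/tcp_fin_timeout",
    "net/core/netdev_max_backlog",
    "net/core/somaxconn",
    "net/ipv4/tcp_syncookies",
    "net/ipv4/tcp_max_orphans",
    "net/ipv4/tcp_max_syn_backlog",
    "net/ipv4/tcp_synack_retries",
    "net/ipv4/tcp_syn_retries",
]


def read_params(base="/proc/sys", params=PARAMS):
    """Return {name: value string}; the value is None if the file is unreadable.

    `base` is a hypothetical hook so the function can be pointed at a
    copy of the tree (for example in a test) instead of the live system.
    """
    values = {}
    for name in params:
        path = os.path.join(base, name)
        try:
            with open(path) as handle:
                # Collapse tabs and newlines so multi-value entries
                # such as tcp_mem fit on a single line.
                values[name] = " ".join(handle.read().split())
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_params().items():
        print(f"/proc/sys/{name} = {value}")
```

<br>Running it on the Varnish server and on a reference machine gives two listings that can be compared line by line.<br>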
<br>Here are my Varnish parameters, in case they help:<br><span style="font-family: courier new,monospace;">200 2224</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">accept_fd_holdoff 50 [ms]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">acceptor default (epoll, poll)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">auto_restart on [bool]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">backend_http11 on [bool]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">between_bytes_timeout 60.000000 [s]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">cache_vbe_conns off [bool]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">cc_command "exec cc -fpic -shared -Wl,-x -o %o %s"</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">cli_buffer 8192 [bytes]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">cli_timeout 5 [seconds]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">client_http11 off [bool]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">clock_skew 10 [s]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">connect_timeout 0.400000 [s]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">default_grace 10</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">default_ttl 180 [seconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">diag_bitmap 0x0 [bitmap]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">err_ttl 0 [seconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">esi_syntax 0 [bitmap]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">fetch_chunksize 128 [kilobytes]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">first_byte_timeout 60.000000 [s]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">group varnish (103)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">listen_address :80</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">listen_depth 1024 [connections]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">log_hashstring off [bool]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">log_local_address off [bool]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">lru_interval 360 [seconds]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">max_esi_includes 5 [includes]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">max_restarts 4 [restarts]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">obj_workspace 8192 [bytes]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">overflow_max 100 [%]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">ping_interval 3 [seconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">pipe_timeout 60 [seconds]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">prefer_ipv6 off [bool]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">purge_dups on [bool]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">purge_hash on [bool]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">rush_exponent 3 [requests per request]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">send_timeout 600 [seconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">sess_timeout 5 [seconds]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">sess_workspace 65536 [bytes]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">session_linger 100 [ms]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">session_max 100000 [sessions]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">shm_reclen 255 [bytes]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">shm_workspace 8192 [bytes]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">srcaddr_hash 1049 [buckets]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">srcaddr_ttl 0 [seconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">thread_pool_add_delay 2 [milliseconds]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">thread_pool_add_threshold 2 [requests]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">thread_pool_fail_delay 200 [milliseconds]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">thread_pool_max 5000 [threads]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">thread_pool_min 150 [threads]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">thread_pool_purge_delay 1000 [milliseconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">thread_pool_stack unlimited [bytes]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">thread_pool_timeout 120 [seconds]</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">thread_pools 8 [pools]</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">user varnish (100)</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">vcl_trace off [bool]</span><br><br>What are your suggestions?<br>Is this a Varnish problem or an operating system configuration problem?<br>