<div dir="ltr"><div style>Hello, list.</div><div style>I am testing Varnish on a CentOS 6.3 box. Some basic info:</div><div style><div> [root@cdn001 ~]# uname -r</div><div> 2.6.32-358.2.1.el6.x86_64</div>
<div> [root@cdn001 ~]# rpm -q varnish</div><div> varnish-3.0.3-3.el6.art.x86_64</div><div><div> [root@cdn001 ~]# free -m</div><div> total used free shared buffers cached</div><div>
Mem: 24023 1054 22968 0 137 240</div><div>-/+ buffers/cache: 676 23347</div><div>Swap: 12079 0 12079</div></div><div style>uptime:</div><div>09:05:24 up 6 days, 11:29, 1 user, load average: 0.00, 0.00, 0.00<br>
</div><div><br></div><div style>and my /etc/sysconfig/varnish looks like below:</div><div style><br></div><div>WORKER_STACK_SIZE=512</div><div>... ...</div><div>VARNISH_STORAGE_SIZE=16G</div><div>VARNISH_STORAGE="malloc,${VARNISH_STORAGE_SIZE}"</div>
<div>VARNISH_TTL=900</div><div>thread_pools=2</div><div>thread_pool_min=500</div><div>thread_pool_max=4000</div><div>thread_pool_timeout=120</div><div>thread_pool_add_delay=2</div><div>thread_pool_fail_delay=100</div><div>
sess_workspace=32768</div><div>session_max=500000</div><div>thread_pool_stack=16384</div><div>connect_timeout=10</div><div>first_byte_timeout=60</div><div>between_bytes_timeout=60</div><div>DAEMON_OPTS="-a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \</div>
<div> -f ${VARNISH_VCL_CONF} \</div><div> -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \</div><div> -t ${VARNISH_TTL} \</div><div> -u varnish -g varnish \</div>
<div> -S ${VARNISH_SECRET_FILE} \</div><div> -p thread_pools=${thread_pools} \</div><div> -p thread_pool_min=${thread_pool_min} \</div><div> -p thread_pool_max=${thread_pool_max} \</div>
<div> -p thread_pool_timeout=${thread_pool_timeout} \</div><div> -p thread_pool_add_delay=${thread_pool_add_delay} \</div><div> -p thread_pool_fail_delay=${thread_pool_fail_delay} \</div>
<div> -p sess_workspace=${sess_workspace} \</div><div> -p session_max=${session_max} \</div><div> -p connect_timeout=${connect_timeout} \</div><div> -p first_byte_timeout=${first_byte_timeout} \</div>
<div> -p between_bytes_timeout=${between_bytes_timeout} \</div><div> -s ${VARNISH_STORAGE}"</div><div> </div><div style>As my backend web servers run on many VMs, I use a DNS director:</div>
<div style><br></div><div>director dnsdomain dns {</div><div> .list = {</div><div> .port = "80";</div><div> "10.0.0.0"/24;</div><div> }</div><div> .ttl = 12h;</div><div>}</div><div style>
<br></div><div style>The connect_timeout in /etc/sysconfig/varnish has been set to a large value (10 s).</div><div style>Everything worked fine until last night, when I got a panic message in the system log.</div><div style>By default I have 1000 workers at startup (thread_pools=2 × thread_pool_min=500), but when this panic occurred I could only see 500+ workers with the varnishstat command. Below is the panic message from syslog:</div><div><br></div></div><div>Apr 10 22:12:58 cdn001 varnishd[25730]: Child (25731) Panic message: Assert error in VRT_IP_string(), cache_vrt.c line 312:#012 Condition((p = WS_Alloc(sp->http->ws, len)) != 0) not true.#012thread = (cache-worker)#012ident = Linux,2.6.32-358.2.1.el6.x86_64,x86_64,-smalloc,-smalloc,-hcritbit,epoll#012Backtrace:#012 0x42ee88: /usr/sbin/varnishd() [0x42ee88]#012 0x436dc5: /usr/sbin/varnishd(VRT_IP_string+0x135) [0x436dc5]#012 0x7f56bc4b922f: ./vcl.xlWKkvTA.so(+0xb622f) [0x7f56bc4b922f]#012 0x436203: /usr/sbin/varnishd(VCL_recv_method+0x43) [0x436203]#012 0x418eaf: /usr/sbin/varnishd(CNT_Session+0xb7f) [0x418eaf]#012 0x430bd1: /usr/sbin/varnishd() [0x430bd1]#012 0x7f56c388f851: /lib64/libpthread.so.0(+0x7851) [0x7f56c388f851]#012 0x7f56c35dd90d: /lib64/libc.so.6(clone+0x6d) [0x7f56c35dd90d]#012sp = 0x7f569b089008 {#012 fd = 320, id = 320, xid = 1298044854,#012 client = 10.0.0.170 33991,#012 step = STP_RECV,#012 handling = deliver,#012 err_code = 404, err_reason = (null),#012 restarts = 0, esi_level = 0#012 flags = #012 bodystatus = 3#012 ws = 0x7f569b089080 { overflow#012 id = "sess",#012 {s,f,r,e} = {0x7f569b089c78,+32768,(nil),+32768},#012 },#012 http[req] = {#012 ws = 0x7f569b089080[sess]#012 "GET",#012 "/images/ico_hots2.gif",#012 "HTTP/1.1",#012 "User-Agent: Opera/9.80 (Android; Opera Mini/6.7.30171/29.3222; U; zh) Presto/2.8.119 Version/11.10",#012 "Host: www.example.com",#012 "Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1",#012 "Accept-Language: zh-cn,en;q=0.9",#012 "Accept-Encoding: gzip, deflate",#012 "Referer: http://www.example.com/",#012 "Connection: Keep-Alive",#012 "clientip: 117.149.35.78",#012 "X-OperaMini-Features: advanced, file_system, camera, 
touch, folding, viewport",#012 "Device-Stock-UA: Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; BOWAY I5 Build/MocorDroid2.3.5) AppleWebKit/533.1</div>
<div>Apr 10 22:12:58 cdn001 varnishd[25730]: child (7270) Started</div><div>Apr 10 22:12:59 cdn001 kernel: varnishd[7270]: segfault at 0 ip 000000000041067d sp 00007fff4f027a40 error 6 in varnishd[400000+70000]</div><div>
Apr 10 22:12:59 cdn001 varnishd[25730]: Pushing vcls failed:#012CLI communication error (hdr)</div><div>Apr 10 22:12:59 cdn001 varnishd[25730]: Child (7270) died signal=11</div><div>Apr 10 22:12:59 cdn001 varnishd[25730]: Child (-1) said Child starts</div>
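<div>The panic report above arrives as one long syslog line: rsyslog, the default syslog daemon on CentOS 6, escapes control characters, so every newline in varnishd's multi-line panic shows up as #012 (octal 12 is the newline character). A minimal shell sketch for reading such panics, assuming GNU sed (whose replacement text understands \n) and using a hypothetical helper name, show_varnish_panics:</div>

```shell
# Hypothetical helper: print varnishd panic messages from syslog with the
# rsyslog "#012" escapes turned back into real newlines.
# Assumes GNU sed; BSD sed would print a literal "n" instead of a newline.
# With no file arguments, grep reads the log from stdin.
show_varnish_panics() {
  grep 'varnishd\[[0-9]*]: Child ([0-9]*) Panic message' "$@" \
    | sed 's/#012/\n/g'
}

# Example, assuming the panic was logged to /var/log/messages:
# show_varnish_panics /var/log/messages
```

<div>Only lines containing "Panic message" are kept, so the surrounding restart and segfault lines are filtered out; drop the grep stage to unescape the whole log instead.</div>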
<div><br></div><div style>I have restarted the varnish daemon and things look fine now, but I am wondering how this error could have happened. Is there something wrong in my Varnish configuration, or is it something else?</div><div style>
</div><div style>Thanks.</div></div>