[Varnish] #275: Varnish child stops responding

Varnish varnish-bugs at projects.linpro.no
Wed Sep 24 05:14:01 CEST 2008


#275: Varnish child  stops responding
---------------------------------+------------------------------------------
 Reporter:  chenxy               |        Owner:  phk                      
     Type:  defect               |       Status:  new                      
 Priority:  high                 |    Milestone:  Varnish 2.0 code complete
Component:  varnishd             |      Version:  trunk                    
 Severity:  normal               |   Resolution:                           
 Keywords:  Varnish child stuck  |  
---------------------------------+------------------------------------------
Comment (by chenxy):

 I try do a test again . This is a report

 Environment: FreeBSD 7.0-RELEASE amd64 ,SCHED_ULE ,filesystem Turn off
 soft-updates and noatime

 Varnish version : trunk revision: 3213

 Test tool: http_load  ,version 12mar2006

 urls count : 155573

 http_load command:
 {{{

 [root at localhost http_load-12mar2006]# ./http_load -proxy
 varnish_host_ip:80 -parallel 500 -seconds 600 siege_url.txt

 ......
 http://image2.XXX.com.cn/igame/new/new015/shop_index_67.gif: byte count
 wrong
 http://i2.XXXimg.cn/gm/http/blog.XXX.com.cn/s/U2217P115T146D16346F4314DT20080918102123_w80.jpg:
 byte count wrong
 http://image2.XXX.com.cn/music/artist/0/1200_200150.jpg: byte count wrong
 http://image2.XXX.com.cn/music/artist/0/1200_200150.jpg: byte count wrong
 http://i3.XXXimg.cn/qc/buy/U121P33T133D11263F1620DT20080917103216.jpg:
 byte count wrong
 http://i2.XXXimg.cn/IT/mobile/models/U1069P2T136D2764F3244DT20050526165505_150h.jpg:
 byte count wrong
 http://i2.XXXimg.cn/IT/cr/2008/0229/1730605940.jpg: byte count wrong
 http://i0.XXXimg.cn/lx/fa/p/2008/0805/U1562P8T1D750071F915DT20080805114645.jpg:
 byte count wrong
 http://image2.XXX.com.cn/download/ebook/2004-11-22/U74P176T13D5119F197DT20041122113641.gif:
 byte count wrong
 ......

 290320 fetches, 500 max parallel, 4.36283e+09 bytes, in 600 seconds
 15027.6 mean bytes/connection
 483.866 fetches/sec, 7.27137e+06 bytes/sec
 msecs/connect: 46.4545 mean, 6003.24 max, 0.219 min
 msecs/first-response: 788.459 mean, 59985.5 max, 0.349 min
 574 timeouts
 2398 bad byte counts
 HTTP response codes:
   code 200 -- 285218
   code 301 -- 7
   code 400 -- 2
   code 403 -- 939
   code 404 -- 1968
   code 503 -- 112

 }}}

 log info:

 {{{

 Sep 23 06:28:38 HP08020020 varnishd[25386]: child (25387) Started
 Sep 23 06:28:38 HP08020020 varnishd[25386]: Child (25387) said Closed fds:
 7 8 11 12 14 15
 Sep 23 06:28:38 HP08020020 varnishd[25386]: Child (25387) said Child
 starts
 Sep 23 06:28:38 HP08020020 varnishd[25386]: Child (25387) said managed to
 mmap 32212254720 bytes of 32212254720
 Sep 23 06:28:38 HP08020020 last message repeated 3 times
 Sep 23 06:28:38 HP08020020 varnishd[25386]: Child (25387) said Ready
 Sep 23 08:42:34 HP08020020 varnishd[25386]: Child (25387) not responding
 to ping, killing it.
 Sep 23 08:42:34 HP08020020 kernel: pid 25387 (varnishd), uid 65534: exited
 on signal 3
 Sep 23 08:42:34 HP08020020 varnishd[25386]: child (31998) Started
 Sep 23 08:42:35 HP08020020 varnishd[25386]: Child (31998) said Closed fds:
 7 8 11 12 14 15
 Sep 23 08:42:35 HP08020020 varnishd[25386]: Child (31998) said Child
 starts
 Sep 23 08:42:35 HP08020020 varnishd[25386]: Child (31998) said managed to
 mmap 32212254720 bytes of 32212254720
 Sep 23 08:42:35 HP08020020 last message repeated 3 times
 Sep 23 08:42:35 HP08020020 varnishd[25386]: Child (31998) said Ready
 Sep 23 08:43:19 HP08020020 varnishd[25386]: Child (31998) not responding
 to ping, killing it.
 Sep 23 08:43:49 HP08020020 last message repeated 3 times
 Sep 23 08:43:52 HP08020020 kernel: pid 31998 (varnishd), uid 65534: exited
 on signal 3
 Sep 23 08:43:52 HP08020020 varnishd[25386]: child (32040) Started
 Sep 23 08:43:53 HP08020020 varnishd[25386]: Child (32040) said Closed fds:
 7 8 11 12 14 15
 Sep 23 08:43:53 HP08020020 varnishd[25386]: Child (32040) said Child
 starts
 Sep 23 08:43:53 HP08020020 varnishd[25386]: Child (32040) said managed to
 mmap 32212254720 bytes of 32212254720
 Sep 23 08:43:53 HP08020020 last message repeated 3 times
 Sep 23 08:43:53 HP08020020 varnishd[25386]: Child (32040) said Ready
 Sep 23 08:44:21 HP08020020 cfexecd[667]:  Cannot mail to
 root at sys.XXXX.com.cn.
 Sep 23 08:44:37 HP08020020 varnishd[25386]: Child (32040) not responding
 to ping, killing it.
 Sep 23 08:45:17 HP08020020 last message repeated 4 times
 Sep 23 08:45:19 HP08020020 kernel: pid 32040 (varnishd), uid 65534: exited
 on signal 3
 Sep 23 08:45:19 HP08020020 varnishd[25386]: child (32100) Started
 Sep 23 08:45:19 HP08020020 varnishd[25386]: Child (32100) said Closed fds:
 7 8 11 12 14 15
 Sep 23 08:45:19 HP08020020 varnishd[25386]: Child (32100) said Child
 starts
 Sep 23 08:45:19 HP08020020 varnishd[25386]: Child (32100) said managed to
 mmap 32212254720 bytes of 32212254720
 Sep 23 08:45:19 HP08020020 last message repeated 3 times
 Sep 23 08:45:19 HP08020020 varnishd[25386]: Child (32100) said Ready
 Sep 23 08:47:04 HP08020020 varnishd[25386]: Child (32100) not responding
 to ping, killing it.
 Sep 23 08:47:54 HP08020020 last message repeated 5 times
 Sep 23 08:47:58 HP08020020 kernel: pid 32100 (varnishd), uid 65534: exited
 on signal 3
 Sep 23 08:47:58 HP08020020 varnishd[25386]: child (32210) Started
 Sep 23 08:47:58 HP08020020 varnishd[25386]: Child (32210) said Closed fds:
 7 8 11 12 14 15
 Sep 23 08:47:58 HP08020020 varnishd[25386]: Child (32210) said Child
 starts
 Sep 23 08:47:58 HP08020020 varnishd[25386]: Child (32210) said managed to
 mmap 32212254720 bytes of 32212254720
 Sep 23 08:47:58 HP08020020 last message repeated 3 times
 Sep 23 08:47:58 HP08020020 varnishd[25386]: Child (32210) said Ready

 }}}


 system run status:
 {{{

         system block,not response for a while and child killed and
 restarted again, threads reached about 550. After child restarted again,
 varnishstat show a little 'Client connections accepted'

         top command status:

           PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME
 WCPU COMMAND
         32100 nobody      555  44    0   120G  1425M STOP   2   0:27
 0.00% varnishd

         gstat command status:

             0    129     78   3325    8.8     51    815    6.5   42.7|
 ufs/data0
             0    143     77   2941    8.8     66   1055   10.1   49.7|
 ufs/data1
             5    112     57   2110    8.7     55    879    7.9   37.7|
 ufs/data2
             4    141     78   2861   11.0     63   1007    8.5   46.6|
 ufs/data3

 }}}


 varnish run command:
 {{{
                 varnishd -a 0.0.0.0:80 -f /usr/local/etc/vcl.conf \
                 -u nobody -g nobody \
                 -T 127.0.0.1:8080 \
                 -w 10,1000,600 \
                 -s file,/data0/varnish/v,30G \
                 -s file,/data1/varnish/v,30G \
                 -s file,/data2/varnish/v,30G \
                 -s file,/data3/varnish/v,30G \
                 -h classic,500009 \
                 -p thread_pools=4 \
                 -p listen_depth=4096 \
                 -p lru_interval=3600 \
                 -p ping_interval=10
 }}}

-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/275#comment:11>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator


More information about the varnish-bugs mailing list