keeping varnishstat open will bring down server

Angelo Höngens A.Hongens at netmatch.nl
Tue Apr 13 15:09:29 CEST 2010


Hey guys,

I've seen something I'd like to share with you; perhaps it could be seen as a bug in varnishstat.

Yesterday I opened ssh sessions to my 4 balancers to run some scripts, and then I opened varnishstat on each to monitor them. A while later I had to leave in a rush and closed my laptop's lid, which killed my VPN tunnel and ssh sessions. However, the varnishstat processes (apparently) kept running. (FreeBSD 7.2 x64)

Just a few hours ago (so around 16 hours later), I had one balancer die on me (it became completely unresponsive and refused connections to port 80). I immediately restarted varnishd, and I also saw a varnishstat instance eating 100% CPU, which I killed.

When I looked at the other balancers just now, I saw the varnishstat instance using a lot of CPU (only one of the 4 cores, though):


last pid: 77863;  load averages:  1.40,  1.48,  1.47     up 105+00:24:26 14:56:40
166 processes: 2 running, 164 sleeping
CPU: 27.1% user,  0.0% nice,  4.2% system,  1.9% interrupt, 66.8% idle
Mem: 6430M Active, 550M Inact, 709M Wired, 189M Cache, 399M Buf, 32M Free
Swap: 4096M Total, 228M Used, 3868M Free, 5% Inuse

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
69587 root        1 112    0 95640K  1044K CPU3   3  19.1H 77.20% varnishstat
76211 haproxy     1   4    0 48928K 18944K kqread 1  16:34  3.17% haproxy
68762 www       116  44    0  8756M  6412M select 0   0:01  0.39% varnishd
31203 root        1  44    0   176M  5476K select 2 439:16  0.00% snmpd
69527 root        1   8    0 94312K 83384K nanslp 0  11:59  0.00% varnishncsa
37934 root        1   4    0 66244K  3164K kqread 0   8:46  0.00% squid
 1912 root        1  44    0 10484K   724K select 0   7:50  0.00% ntpd
 2036 root        1  44    0 85732K  3528K select 1   4:12  0.00% httpd
56664 root        1  44    0  5692K   616K select 2   0:51  0.00% syslogd
 2056 root        1   8    0  6748K   392K nanslp 2   0:33  0.00% cron
 2023 root        1   4    0  5808K   428K kqread 0   0:23  0.00% master
 2031 postfix     1   4    0  5808K   408K kqread 0   0:22  0.00% qmgr
76181 www         1   4    0 85732K  3732K kqread 3   0:01  0.00% httpd
76182 www         1  20    0 85732K  3716K lockf  3   0:01  0.00% httpd
76185 www         1  20    0 85732K  3696K lockf  2   0:01  0.00% httpd
76298 www         1  20    0 85732K  3868K lockf  3   0:01  0.00% httpd


So it seems that when varnishstat is left running for a long time, it uses more and more resources, and in my case it may even have caused varnishd to fail somehow (it could be a coincidence, but I don't think so).

After killing varnishstat, the load average went from 1.5 back down to 0.2, which is around the usual.
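
For anyone who wants to watch for this, here is a minimal sketch of how the top output above could be scanned for a runaway varnishstat. It only parses text in the 12-column layout pasted above; the function names (parse_top_line, find_busy) and the 50% threshold are my own assumptions for illustration, not anything that ships with Varnish or FreeBSD.

```python
# Sketch only: flags processes by name and WCPU from pasted FreeBSD top(1) rows.
# Assumes the 12-column layout shown in this mail:
# PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND

SAMPLE = """\
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
69587 root        1 112    0 95640K  1044K CPU3   3  19.1H 77.20% varnishstat
76211 haproxy     1   4    0 48928K 18944K kqread 1  16:34  3.17% haproxy
68762 www       116  44    0  8756M  6412M select 0   0:01  0.39% varnishd
"""


def parse_top_line(line):
    """Parse one process row into a dict; return None for headers or other lines."""
    fields = line.split()
    if len(fields) != 12 or not fields[0].isdigit():
        return None
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "time": fields[9],                      # kept as text, e.g. "19.1H"
        "wcpu": float(fields[10].rstrip("%")),  # "77.20%" -> 77.2
        "command": fields[11],
    }


def find_busy(lines, command="varnishstat", min_wcpu=50.0):
    """Return rows for `command` whose WCPU is at or above `min_wcpu` (hypothetical threshold)."""
    rows = (parse_top_line(line) for line in lines)
    return [r for r in rows if r and r["command"] == command and r["wcpu"] >= min_wcpu]


if __name__ == "__main__":
    for row in find_busy(SAMPLE.splitlines()):
        print(row["pid"], row["command"], row["wcpu"], row["time"])
```

This only reports candidates; whether to kill the process is left to the operator, since a high WCPU on its own does not prove the process is orphaned.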

-- 

 
With kind regards,
 
 
Angelo Höngens
 
Systems Administrator
 
------------------------------------------
NetMatch
tourism internet software solutions
 
Ringbaan Oost 2b
5013 CA Tilburg
T: +31 (0)13 5811088
F: +31 (0)13 5821239
 
mailto:A.Hongens at netmatch.nl
http://www.netmatch.nl
------------------------------------------




