Keeping varnishstat open will bring down the server
Angelo Höngens
A.Hongens at netmatch.nl
Tue Apr 13 15:09:29 CEST 2010
Hey guys,
I've seen something I'd like to share with you; perhaps it could be considered a bug in varnishstat.
Yesterday I opened SSH sessions to my 4 balancers to run some scripts, and then started varnishstat in those sessions to monitor them. A while later I had to leave in a rush and closed my laptop's lid, which killed my VPN tunnel and the SSH sessions. However, the varnishstat processes apparently kept running (FreeBSD 7.2 x64).
Just a few hours ago (so around 16 hours later), one balancer died on me: it became completely unresponsive and refused connections on port 80. I immediately restarted varnishd, and I also saw a varnishstat instance eating 100% CPU, which I killed.
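One common way for an interactive, curses-style monitor to end up in this state (an assumption here; the post does not establish the cause) is a main loop that waits for a keypress with a timeout but does not treat end-of-file or a read error on its input as a reason to exit. Once the SSH session and its terminal are gone, the input stays permanently "ready", every wait returns immediately, and the loop busy-spins on one core instead of waking once per refresh interval. The Python sketch below only illustrates that failure mode and its fix; `monitor`, its parameters and its return values are hypothetical and are not varnishstat's code.

```python
import os
import select


def monitor(fd, refresh, timeout=1.0, max_ticks=None):
    """Hypothetical stats-display loop (illustration only, not varnishstat).

    Redraws via refresh(), then waits up to `timeout` seconds for a key on
    `fd`. Returns why it stopped: "quit", "eof", "input error" or "max ticks".
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        refresh()
        ticks += 1
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            continue  # timeout with no key pressed: just redraw
        try:
            data = os.read(fd, 1)
        except OSError:
            return "input error"  # e.g. EIO once the terminal has been revoked
        if data == b"":
            # EOF: the input side is gone. A loop that ignored this and went
            # round again would find fd readable at once, every time, and spin.
            return "eof"
        if data == b"q":
            return "quit"
    return "max ticks"
```

The design point is the explicit `b""` check: without it, a dead input descriptor looks like "a key is waiting" forever. In curses the key-reading call `getch()` returns `ERR` both on a timeout and on failure, which is one reason such loops can miss the condition.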
Looking at the other balancers now, I see the varnishstat instance using a lot of CPU there too (only one out of 4 cores, though):
last pid: 77863; load averages: 1.40, 1.48, 1.47 up 105+00:24:26 14:56:40
166 processes: 2 running, 164 sleeping
CPU: 27.1% user, 0.0% nice, 4.2% system, 1.9% interrupt, 66.8% idle
Mem: 6430M Active, 550M Inact, 709M Wired, 189M Cache, 399M Buf, 32M Free
Swap: 4096M Total, 228M Used, 3868M Free, 5% Inuse

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
69587 root       1 112    0 95640K  1044K CPU3   3  19.1H 77.20% varnishstat
76211 haproxy    1   4    0 48928K 18944K kqread 1  16:34  3.17% haproxy
68762 www      116  44    0  8756M  6412M select 0   0:01  0.39% varnishd
31203 root       1  44    0   176M  5476K select 2 439:16  0.00% snmpd
69527 root       1   8    0 94312K 83384K nanslp 0  11:59  0.00% varnishncsa
37934 root       1   4    0 66244K  3164K kqread 0   8:46  0.00% squid
 1912 root       1  44    0 10484K   724K select 0   7:50  0.00% ntpd
 2036 root       1  44    0 85732K  3528K select 1   4:12  0.00% httpd
56664 root       1  44    0  5692K   616K select 2   0:51  0.00% syslogd
 2056 root       1   8    0  6748K   392K nanslp 2   0:33  0.00% cron
 2023 root       1   4    0  5808K   428K kqread 0   0:23  0.00% master
 2031 postfix    1   4    0  5808K   408K kqread 0   0:22  0.00% qmgr
76181 www        1   4    0 85732K  3732K kqread 3   0:01  0.00% httpd
76182 www        1  20    0 85732K  3716K lockf  3   0:01  0.00% httpd
76185 www        1  20    0 85732K  3696K lockf  2   0:01  0.00% httpd
76298 www        1  20    0 85732K  3868K lockf  3   0:01  0.00% httpd
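To check the remaining balancers for a stuck instance without reading each top screen by hand, the process rows could be filtered with a small script. This is a hypothetical helper, not part of Varnish or FreeBSD: the function name `runaway` and the default 50% WCPU threshold are assumptions, and it relies on the 12-column top layout shown in the listing above with a single-word COMMAND field.

```python
def runaway(top_lines, command, min_wcpu=50.0):
    """Hypothetical check: return (pid, wcpu) for rows of FreeBSD `top`
    output whose COMMAND equals `command` and whose WCPU is at least
    `min_wcpu` percent.

    Assumed row layout:
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    """
    hits = []
    for line in top_lines:
        fields = line.split()
        # Process rows have exactly 12 fields and start with a numeric PID;
        # this skips the summary lines and the column header.
        if len(fields) != 12 or not fields[0].isdigit():
            continue
        pid, wcpu, cmd = int(fields[0]), fields[10], fields[11]
        if cmd == command and wcpu.endswith("%"):
            percent = float(wcpu[:-1])
            if percent >= min_wcpu:
                hits.append((pid, percent))
    return hits
```

Fed the listing above with `command="varnishstat"`, it would return `[(69587, 77.2)]`; varnishd at 0.39% would not be flagged at the default threshold.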
So it seems that varnishstat, when left running for a long time, uses more and more resources and, in my case, even caused varnishd to fail somehow (it could be a coincidence, but I don't think so).
After I killed varnishstat, the load average dropped from 1.5 back to 0.2, which is about usual.
--
With kind regards,
Angelo Höngens
Systems Administrator
------------------------------------------
NetMatch
tourism internet software solutions
Ringbaan Oost 2b
5013 CA Tilburg
T: +31 (0)13 5811088
F: +31 (0)13 5821239
mailto:A.Hongens at netmatch.nl
http://www.netmatch.nl
------------------------------------------
More information about the varnish-misc mailing list