Chield dies
Eivind Bengtsson
ebe at dmi.dk
Wed May 12 12:25:40 CEST 2010
Hi guys I've just had an issue on one of our 4 varnishservers - that I
cannot explain.. The varnishd-child just dies.
I have a guess though - Might it have something to do with me (a check
on my behalf) telnetting to the serviceport to check the backend.health?
Is my perl-fu not strong enough?
After the child is dead, I do not see the alarm until 10 mins later ...
and startup varnishd - I have no varnishlog from the incident but as I
start up varnish everything works again...
Every 5 minutes my nagios does four checks on the varnishmachines, to
check health of the backend.
command[check_varnish_backends]=/usr/lib/nagios/plugins/check_varnish_backend_health
-all
command[check_varnish_backend_euros08]=/usr/lib/nagios/plugins/check_varnish_backend_health
-h euros08
command[check_varnish_backend_euros12]=/usr/lib/nagios/plugins/check_varnish_backend_health
-h euros12
command[check_varnish_backend_carbon]=/usr/lib/nagios/plugins/check_varnish_backend_health
-h carbon
I've inlined the .pl doing the nagioscheck and the syslog output from
varnishd.
Does anyone have a clue ? Sorry in advance for the rathe long mail ..
Regards
/Eivind
# varnishd -V
varnishd (varnish-2.0.6)
Copyright (c) 2006-2009 Linpro AS / Verdens Gang A
______________________________________
SYSLOG:
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet
127.0.0.1:46759 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet
127.0.0.1:46761 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet
127.0.0.1:46762 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet
127.0.0.1:46764 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
....
May 12 10:07:58 varnish1 varnishd[19448]: Child (19449) not responding
to ping, killing it.
May 12 10:08:00 varnish1 varnishd[19448]: Child (19449) not responding
to ping, killing it.
May 12 10:08:00 varnish1 varnishd[19448]: Child (19449) died signal=3
May 12 10:08:02 varnish1 varnishd[19448]: Child cleanup complete
May 12 10:08:02 varnish1 varnishd[19448]: child (18174) Started
May 12 10:08:04 varnish1 kernel: [6214551.451581] Inbound IN=eth0 OUT=
MAC=00:14:22:1e:ba:b8:00:0e:84:3b:ec:c0:08:00 SRC=66.151.125.8
DST=130.225.244.83 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=1802 PROTO=UDP
SPT=11057 DPT=33442 LEN=12
May 12 10:08:06 varnish1 kernel: [6214553.449867] Inbound IN=eth0 OUT=
MAC=00:14:22:1e:ba:b8:00:0e:84:3b:ec:c0:08:00 SRC=66.151.125.8
DST=130.225.244.83 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=1802 PROTO=UDP
SPT=11057 DPT=33442 LEN=12
May 12 10:08:07 varnish1 varnishd[19448]: Pushing vcls failed: CLI
communication error
May 12 10:08:07 varnish1 varnishd[19448]: Stopping Child
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Closed fds:
4 5 6 10 11 13 14
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Child starts
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said managed to
mmap 2147483648 bytes of 2147483648
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Ready
....
May 12 10:08:07 varnish1 varnishd[19448]: Pushing vcls failed: CLI
communication error
May 12 10:08:07 varnish1 varnishd[19448]: Stopping Child
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Closed fds:
4 5 6 10 11 13 14
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Child starts
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said managed to
mmap 2147483648 bytes of 2147483648
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Ready
....
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Probe("GET
/vandstand/servlet/ImageServlet HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Host:
carbon.dmi.dk
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Connection:
close
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said ", 3, 10)
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Probe("GET
/ HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Host: localhost
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Connection:
close
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said ", 3, 5)
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Probe("GET
/dmi/dmi.css HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Host:
euros08.dmi.dk
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Connection:
close
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said ", 2, 10)
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) ended
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said Probe("GET
/dmi/dmi.css HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said Host:
euros12.dmi.dk
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said Connection: close
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said ", 2, 10)
May 12 10:08:08 varnish1 varnishd[19448]: Child cleanup complete
....
May 12 10:21:42 varnish1 varnishd[19448]: Manager got SIGINT
May 12 10:21:43 varnish1 varnishd[19184]: child (19185) Started
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Closed fds:
4 5 6 10 11 13 14
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Child starts
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said managed to
mmap 2147483648 bytes of 2147483648
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Ready
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Probe("GET
/vandstand/servlet/ImageServlet HTTP/1.1
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Host:
carbon.dmi.dk
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Connection:
close
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said
...And so on....
_________________________________
# cat /usr/lib/nagios/plugins/check_varnish_backend_health
#!/usr/bin/perl
use strict;
use Net::Telnet ();
use Getopt::Long;
my $version = '';
my $help = '';
my $result = '';
my $status = '';
my $host = '';
my $all = '';
my $verbose = '';
my $handle = new Net::Telnet ( Port=> '6082', Timeout => 8, prompt =>
'/\.\n|\n\n/' );
## Print Help
sub usage {
my $msg = shift;
if (defined $msg) {
print "$msg\n";
}
print << "END";
Usage: check_varnish_backend_health option
Options:
--help
-h|--host hostname
--all
--verbose
--version
NB! Only one option at the time! :-)
END
exit(3);
}
#Get options from command line
$result = GetOptions(
"host|h=s" => \$host,
"all" => \$all,
"version" => \$version,
"verbose" => \$verbose,
"help" => \$help
);
if ($help) {usage();}
if ($version) {usage('Version is 0.9_EBE');}
if ($all) {if ($host){usage('Either --all _or_ --host hostname')};}
if ($verbose) {if ($host){usage('Either --verbose _or_ --host
hostname')};}
my $ip = "127.0.0.1";
$handle->open("$ip");
my @lines = $handle->cmd("debug\.health");
$handle->close;
my %backends_state_hash;
my @sick_backends;
foreach (@lines) {
if (/Backend/) {
my @line = split(' ',$_);
$status = pop(@line);
my $be = @line[1];
if ($verbose) {
print "Host: $be => $status\n";
}
if ($status eq 'Sick'){
push(@sick_backends,$be);
}
$backends_state_hash{$be} = $status;
}
}
if ($host) {
if ( $backends_state_hash{$host} eq 'Healthy'){
print "OK: Host $host is Healthy";
exit 0;
}
elsif ( $backends_state_hash{$host} eq 'Sick'){
print "Critical: Host $host is Sick";
exit 2;
}
else {
print "Warning: Host $host is not Healthy, nor is it Sick";
exit 1;
}
}
my $number_of_sick_backends=@sick_backends;
my $number_of_backends = keys %backends_state_hash;
if ( $number_of_backends == 0 ) {
print "Warning: No backends are probed";
exit 1;
}
if ( $number_of_sick_backends == 0 ) {
print "OK: All hosts are Healthy";
exit 0;
}
if ( $number_of_sick_backends == $number_of_backends ) {
print "Critical: All hosts are Sick - @sick_backends";
exit 2;
}
else {
print "Warning: Some hosts are sick: @sick_backends";
exit 1;
}
______________________________________
--
Eivind Bengtsson
Systemadministrator - Cand.merc.(dat)
Danmarks Meteorologiske Institut
Lyngbyvej 100
2100 København Ø
Direkte tlf. : 39157544
Email: ebe at dmi.dk
echo 'This is not a pipe.' | cat -> /dev/tty
More information about the varnish-misc
mailing list