web backend is sick
Tim Dunphy
bluethundr at gmail.com
Mon May 12 02:57:34 CEST 2014
I took a look at Apache on the host that Varnish reports as sick, and it seems to be running fine on the 'sick' host:
[root at beta:/var/www/jf-current] #apachectl -S
VirtualHost configuration:
wildcard NameVirtualHosts and _default_ servers:
*:443                  is a NameVirtualHost
         default server beta.mywebsite.com (/etc/httpd/conf.d/002_jf_beta_ssl.conf:78)
         port 443 namevhost beta.mywebsite.com (/etc/httpd/conf.d/002_jf_beta_ssl.conf:78)
*:80                   is a NameVirtualHost
         default server ref.mywebsite.com (/etc/httpd/conf.d/001_ref.mywebsite.com.conf:1)
         port 80 namevhost ref.mywebsite.com (/etc/httpd/conf.d/001_ref.mywebsite.com.conf:1)
         port 80 namevhost qa.mywebsite.com (/etc/httpd/conf.d/002_qa.mywebsite.com.conf:1)
         port 80 namevhost beta.mywebsite.com (/etc/httpd/conf.d/003_beta.mywebsite.com.conf:1)
         port 80 namevhost beta-test.mywebsite.com (/etc/httpd/conf.d/004_beta-test.mywebsite.com.conf:1)
         port 80 namevhost admin.mywebsite.com (/etc/httpd/conf.d/005_admin.mywebsite.com.conf:1)
         port 80 namevhost admin.mywebsite.com (/etc/httpd/conf.d/10_admin.mywebsite.com.conf:1)
         port 80 namevhost beta-test.mywebsite.com (/etc/httpd/conf.d/10_beta-test.mywebsite.com.conf:1)
         port 80 namevhost beta.mywebsite.com (/etc/httpd/conf.d/10_beta.mywebsite.com.conf:1)
         port 80 namevhost qa.mywebsite.com (/etc/httpd/conf.d/10_qa.mywebsite.com.conf:1)
         port 80 namevhost ref.mywebsite.com (/etc/httpd/conf.d/10_ref.mywebsite.com.conf:1)
Syntax OK
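For completeness, a quick way to confirm that Apache is actually listening on port 80 on that box would be something like this (assuming the ss tool from iproute2 is available there; netstat -tlnp shows the same information):

ss -ltnp | grep httpd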
My probe and host definitions in the varnish default.vcl look like this:
probe favicon {
    .url = "/favicon.ico";
    .timeout = 34ms;
    .interval = 1s;
    .window = 10;
    .threshold = 8;
}

backend web1 {
    .host = "10.10.1.94";
    .port = "80";
    .probe = favicon;
}

backend web2 {
    .host = "10.10.1.98";
    .port = "80";
    .probe = favicon;
}

director www random {
    { .backend = web1; .weight = 2; }
    { .backend = web2; .weight = 2; }
}
I was just experimenting with the load-balancing algorithms, but that probably doesn't have much bearing on this problem.
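For reference, the round-robin variant of the same director would look like this in the VCL 3.x director syntax (round-robin takes no weights, so the entries are just the backends):

director www round-robin {
    { .backend = web1; }
    { .backend = web2; }
}

Health probes are attached to the backends rather than the director, so the choice of director type shouldn't change which backend gets marked sick.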
From the varnish host I can do a wget for the probe file from the host that varnish is marking as 'sick':
[root at varnish1:/etc/varnish] #wget -O /dev/null -S
http://web2.mywebsite.com/favicon.ico
--2014-05-11 20:48:23-- http://web1.mywebsite.com/favicon.ico
Resolving web1.mywebsite.com... 10.10.1.98
Connecting to web1.mywebsite.com|10.10.1.98|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Mon, 12 May 2014 00:48:25 GMT
Server: Apache/2.2.23 (CentOS)
Last-Modified: Sun, 22 Dec 2013 00:53:19 GMT
ETag: "2a8003-47e-4ee14efeebdc0"
Accept-Ranges: bytes
Content-Length: 1150
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/plain; charset=UTF-8
Length: 1150 (1.1K) [text/plain]
Saving to: “/dev/null”
100%[================================================================>]
1,150 --.-K/s in 0s
2014-05-11 20:48:24 (144 MB/s) - “/dev/null” saved [1150/1150]
Here I'm attempting to approximate the calls that Varnish makes to determine whether a host is healthy.
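A slightly closer approximation, hitting the backend IP directly the way the probe does and printing the status code and total response time, would be something like this (assuming curl is available on the varnish box; older curl builds only take whole seconds for --max-time, which is far looser than the probe's 34ms timeout):

curl -sS -o /dev/null -w '%{http_code} %{time_total}s\n' --max-time 1 http://10.10.1.98/favicon.ico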
And yet when I have a look at debug.health from the command line, Varnish is still labeling it as 'sick':
[root at varnish1:~] #varnishadm -T 127.0.0.1:6082 debug.health -S
/etc/varnish/secret
*Backend web1 is Healthy*
Current states good: 10 threshold: 8 window: 10
Average responsetime of good probes: 0.001247
Oldest Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy
*Backend web2 is Sick*
Current states good: 0 threshold: 8 window: 10
Average responsetime of good probes: 0.000000
Oldest Newest
================================================================
---------------------------------------------------------------- Happy
So I'm really wondering if I'm missing something here that'll help me determine why Varnish thinks this host is sick! Also, I have no problem browsing the custom URLs that I set up on each host to indicate which host Apache is running on.
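One more thing I can try is watching the probe traffic itself. Assuming eth0 is the right interface on the varnish box, running something like this on varnish1 should show whether the GET /favicon.ico probes are actually going out to web2 and what, if anything, comes back:

tcpdump -nn -A -i eth0 'host 10.10.1.98 and port 80'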
Thanks
Tim
On Sat, May 10, 2014 at 12:58 AM, Tim Dunphy <bluethundr at gmail.com> wrote:
> Hey all,
>
> I have two web backends in my varnish config, and one node is reporting as
> healthy while the other is being reported as 'sick'.
>
> 10 Backend c 11 www web1
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001130 0.001067
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001231 0.001108
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001250 0.001143
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001127 0.001139
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001208 0.001157
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001562 0.001258
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001545 0.001330
> HTTP/1.1 200 OK
> 0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001363 0.001338
> HTTP/1.1 200 OK
> 11 BackendClose b web1
>
> [root at varnish1:/etc/varnish] #varnishlog | grep web2
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
> 0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>
> And I'm really at a loss to understand why. Both nodes should be
> completely identical, and the web roots on both are basically svn repos
> that are in sync.
>
> From web1:
>
> [root at beta:/var/www/jf-current] #svn info | grep -i revision
> Revision: 17
>
> To web2:
>
> [root at beta-new:/var/www/jf-current] #svn info | grep -i revision
> Revision: 17
>
> This is the part of my vcl file where I define the web backends:
>
> probe favicon {
>     .url = "/favicon.ico";
>     .timeout = 60ms;
>     .interval = 2s;
>     .window = 5;
>     .threshold = 3;
> }
>
> backend web1 {
>     .host = "xx.xx.xx.xx";
>     .port = "80";
>     .probe = favicon;
> }
>
> backend web2 {
>     .host = "xx.xx.xx.xx";
>     .port = "80";
>     .probe = favicon;
> }
>
> And the file that varnish is probing for is present on both:
>
> [root at beta:/var/www/jf-current] #ls -l /var/www/jf-current/favicon.ico
> -rwxrwxr-x 1 apache ftp 1150 Dec 22 00:53 /var/www/jf-current/favicon.ico
>
> I've also set up individual web URLs for each host that aren't cached in
> varnish so I can hit each one, and each site comes up OK. So I'm a little
> puzzled as to why the second web host is reporting 'sick' and what I can do
> to get it back into load balancing.
>
> Thanks for any help you can provide!
>
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B