web backend is sick

Tim Dunphy bluethundr at gmail.com
Tue May 13 03:56:18 CEST 2014


Hey Jason,

Really sorry, I have no idea how this happened, but it must have been some
very strange copy-paste error that caused what you saw before. In reality I
did a wget to web2 and got a response from web2.

I grabbed the source text that I created before, checked it for
inconsistencies, and this is what that post should have looked like.

Would you or someone on the list care to read over it and try to offer a
solution? I must admit I'm tapped for ideas here.

But before we revisit the following (corrected) info, I've provided a wget
to web1 (the good back end) and web2 (the bad back end) by IP.

Starting with the bad (sick) back end, which is web2, by IP:

[root at varnish1:~] #wget -O /dev/null -S http://10.10.1.98/favicon.ico
--2014-05-12 21:50:55--  http://166.78.8.98/favicon.ico
Connecting to 166.78.8.98:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Tue, 13 May 2014 01:50:56 GMT
  Server: Apache/2.2.23 (CentOS)
  Last-Modified: Sat, 10 May 2014 05:06:09 GMT
  ETag: "1c5461-47e-4f904ac13b240"
  Accept-Ranges: bytes
  Content-Length: 1150
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/plain; charset=UTF-8
Length: 1150 (1.1K) [text/plain]
Saving to: “/dev/null”

100%[=========================================================================================>]
1,150       --.-K/s   in 0s

2014-05-12 21:50:55 (79.8 MB/s) - “/dev/null” saved [1150/1150]

And now the same thing to the 'healthy' web node (web1) by IP:

[root at varnish1:~] #wget -O /dev/null -S http://162.243.109.94/favicon.ico
--2014-05-12 21:54:06--  http://162.243.109.94/favicon.ico
Connecting to 162.243.109.94:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Tue, 13 May 2014 01:54:05 GMT
  Server: Apache/2.2.15 (CentOS)
  Last-Modified: Wed, 05 Mar 2014 19:27:01 GMT
  ETag: "e123b-47e-4f3e1013feb40"
  Accept-Ranges: bytes
  Content-Length: 1150
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: image/vnd.microsoft.icon
Length: 1150 (1.1K) [image/vnd.microsoft.icon]
Saving to: “/dev/null”

100%[=========================================================================================>]
1,150       --.-K/s   in 0s

2014-05-12 21:54:06 (149 MB/s) - “/dev/null” saved [1150/1150]

And now here is the original (but corrected) version of my previous post! :)

Back End node is Sick

I am using Varnish to load balance a couple of web nodes with the random
load-balancing scheme.

I took a look at Apache on the host that Varnish reports as sick, and it
seems to be running fine there:

[root at beta:/var/www/jf-current] #apachectl -S
VirtualHost configuration:
wildcard NameVirtualHosts and _default_ servers:
*:443                  is a NameVirtualHost
         default server beta.mywebsite.com (/etc/httpd/conf.d/002_jf_beta_ssl.conf:78)
         port 443 namevhost beta.mywebsite.com (/etc/httpd/conf.d/002_jf_beta_ssl.conf:78)
*:80                   is a NameVirtualHost
         default server ref.mywebsite.com (/etc/httpd/conf.d/001_ref.mywebsite.com.conf:1)
         port 80 namevhost ref.mywebsite.com (/etc/httpd/conf.d/001_ref.mywebsite.com.conf:1)
         port 80 namevhost qa.mywebsite.com (/etc/httpd/conf.d/002_qa.mywebsite.com.conf:1)
         port 80 namevhost beta.mywebsite.com (/etc/httpd/conf.d/003_beta.mywebsite.com.conf:1)
         port 80 namevhost beta-test.mywebsite.com (/etc/httpd/conf.d/004_beta-test.mywebsite.com.conf:1)
         port 80 namevhost admin.mywebsite.com (/etc/httpd/conf.d/005_admin.mywebsite.com.conf:1)
         port 80 namevhost admin.mywebsite.com (/etc/httpd/conf.d/10_admin.mywebsite.com.conf:1)
         port 80 namevhost beta-test.mywebsite.com (/etc/httpd/conf.d/10_beta-test.mywebsite.com.conf:1)
         port 80 namevhost beta.mywebsite.com (/etc/httpd/conf.d/10_beta.mywebsite.com.conf:1)
         port 80 namevhost qa.mywebsite.com (/etc/httpd/conf.d/10_qa.mywebsite.com.conf:1)
         port 80 namevhost ref.mywebsite.com (/etc/httpd/conf.d/10_ref.mywebsite.com.conf:1)
Syntax OK

My probe and host definitions in the Varnish default.vcl look like this:

probe favicon {
  .url = "/favicon.ico";
  .timeout = 34ms;
  .interval = 1s;
  .window = 10;
  .threshold = 8;
}

backend web1 {
  .host = "10.10.1.94";
  .port = "80";
  .probe = favicon;
}

backend web2 {
  .host = "10.10.1.98";
  .port = "80";
  .probe = favicon;
}

director www random {
  { .backend = web1; .weight = 2; }
  { .backend = web2; .weight = 2; }
}
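
Since Jason asked which Host header Varnish probes with, one way to take that
variable out of the picture would be to spell the probe request out explicitly.
This is only a sketch (assuming Varnish 3.x probe syntax; the probe name and
hostname are illustrative), not something I'm running yet:

# Sketch: make the probe's request line and Host header explicit.
# If the Host header matters, each backend would need its own probe block.
probe favicon_explicit {
  .request =
    "GET /favicon.ico HTTP/1.1"
    "Host: web2.mywebsite.com"
    "Connection: close";
  .timeout = 34ms;
  .interval = 1s;
  .window = 10;
  .threshold = 8;
}

That way the health check and a manual wget are guaranteed to send the same
request line and Host header.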

From the Varnish host I can do a wget for the probe file from the host that
Varnish is marking as 'sick':

[root at varnish1:/etc/varnish] #wget -O /dev/null -S http://web2.mywebsite.com/favicon.ico
--2014-05-11 20:48:23--  http://web2.mywebsite.com/favicon.ico
Resolving web2.mywebsite.com... 10.10.1.98
Connecting to web2.mywebsite.com|10.10.1.98|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Mon, 12 May 2014 00:48:25 GMT
  Server: Apache/2.2.23 (CentOS)
  Last-Modified: Sun, 22 Dec 2013 00:53:19 GMT
  ETag: "2a8003-47e-4ee14efeebdc0"
  Accept-Ranges: bytes
  Content-Length: 1150
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/plain; charset=UTF-8
Length: 1150 (1.1K) [text/plain]
Saving to: “/dev/null”

100%[================================================================>]
1,150       --.-K/s   in 0s

2014-05-11 20:48:24 (144 MB/s) - “/dev/null” saved [1150/1150]

I'm attempting here to approximate the calls that Varnish is making to
determine whether a host is healthy.
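
Since the probe .timeout above is only 34ms, it's probably also worth timing
the response rather than just checking the status code. A rough sketch with
curl (the Host header value here is just my guess at what the probe might send):

# Time the favicon fetch on each backend and compare against the 34ms probe timeout.
for ip in 10.10.1.94 10.10.1.98; do
  curl -s -o /dev/null \
       -H "Host: $ip" \
       -w "$ip: HTTP %{http_code} in %{time_total}s\n" \
       "http://$ip/favicon.ico"
done

If web2 answers with a 200 but takes longer than about 34ms, the probe would
still count it as failed even though a plain wget looks fine.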

And yet when I have a look at debug.health from the command line, Varnish is
still labeling it as 'sick':

[root at varnish1:~] #varnishadm -T 127.0.0.1:6082 debug.health -S /etc/varnish/secret

*Backend web1 is Healthy*
Current states  good: 10 threshold:  8 window: 10
Average responsetime of good probes: 0.001247
Oldest                                                    Newest
================================================================
4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy

*Backend web2 is Sick*
Current states  good:  0 threshold:  8 window: 10
Average responsetime of good probes: 0.000000
Oldest                                                    Newest
================================================================
---------------------------------------------------------------- Happy

So I'm really wondering if I'm missing something here that would help me
determine why Varnish thinks this host is sick. Also, I have no problem
browsing the custom URLs that I set up on each host to indicate which host
Apache is running on.
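
If it would help, I can also watch for the probes on the backend itself to see
whether they're arriving at all. A rough sketch (assuming the default CentOS
Apache log location, and that eth0 is the right interface on web2):

# On web2: watch Apache's access log for the probe requests...
tail -f /var/log/httpd/access_log | grep favicon.ico

# ...or capture incoming port-80 traffic at the packet level.
tcpdump -nn -i eth0 port 80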

I would definitely appreciate any clues on how to handle this.

Thanks
Tim


On Mon, May 12, 2014 at 1:37 AM, Jason Woods <devel at jasonwoods.me.uk> wrote:

> Hi
>
> On 12 May 2014, at 01:57, Tim Dunphy <bluethundr at gmail.com> wrote:
>
> I took a look at apache on the host that Varnish reports as sick:
>
> And apache seems to be running fine on the 'sick' host:
>
> [root at beta:/var/www/jf-current] #apachectl -S
>
> VirtualHost configuration:
>
> wildcard NameVirtualHosts and _default_ servers:
>
> *:443                  is a NameVirtualHost
>
>          default server beta.mywebsite.com(/etc/httpd/conf.d/002_jf_beta_ssl.conf:78)
>
>          port 443 namevhost beta.mywebsite.com(/etc/httpd/conf.d/002_jf_beta_ssl.conf:78)
>
> *:80                   is a NameVirtualHost
>
>          default server ref.mywebsite.com(/etc/httpd/conf.d/001_ref.mywebsite.com.conf:1)
>
>          port 80 namevhost ref.mywebsite.com(/etc/httpd/conf.d/001_ref.mywebsite.com.conf:1)
>
>          port 80 namevhost qa.mywebsite.com(/etc/httpd/conf.d/002_qa.mywebsite.com.conf:1)
>
>          port 80 namevhost beta.mywebsite.com(/etc/httpd/conf.d/003_beta.mywebsite.com.conf:1)
>
>          port 80 namevhost beta-test.mywebsite.com(/etc/httpd/conf.d/004_beta-test.mywebsite.com.conf:1)
>
>          port 80 namevhost admin.mywebsite.com(/etc/httpd/conf.d/005_admin.mywebsite.com.conf:1)
>
>          port 80 namevhost admin.mywebsite.com(/etc/httpd/conf.d/10_admin.mywebsite.com.conf:1)
>
>          port 80 namevhost beta-test.mywebsite.com(/etc/httpd/conf.d/10_beta-test.mywebsite.com.conf:1)
>
>          port 80 namevhost beta.mywebsite.com(/etc/httpd/conf.d/10_beta.mywebsite.com.conf:1)
>
>          port 80 namevhost qa.mywebsite.com(/etc/httpd/conf.d/10_qa.mywebsite.com.conf:1)
>
>          port 80 namevhost ref.mywebsite.com(/etc/httpd/conf.d/10_ref.mywebsite.com.conf:1)
>
> Syntax OK
>
>
> My probe and host definitions in the varnish default.vcl looks like this:
>
>
> probe favicon {
>
>   .url = "/favicon.ico";
>
>   .timeout = 34ms;
>
>   .interval = 1s;
>
>   .window = 10;
>
>   .threshold = 8;
>
> }
>
>
> backend web1  {
>
>   .host = "10.10.1.94";
>
>   .port = "80";
>
>   .probe = favicon;
>
> }
>
>
> backend web2  {
>
>
>   .host = "10.10.1.98";
>
>   .port = "80";
>
>   .probe = favicon;
>
> }
>
> director www random {
>
>   { .backend = web1 ; .weight = 2;  }
>
>   { .backend = web2 ; .weight = 2;  }
>
>  }
> I was just experimenting the load balancing algorithms, but that probably
> doesn't have much bearing on this problem.
>
> From the varnish host I can do a wget for the probe file from the host
> that varnish is marking as 'sick'
>
>
> [root at varnish1:/etc/varnish] #wget -O /dev/null -S
> http://web2.mywebsite.com/favicon.ico
>
> --2014-05-11 20:48:23--  http://web1.mywebsite.com/favicon.ico
>
>
> You request web2 but it resolved web1? This may be related to the cause.
>
> Can you wget both web1 and web2? Using IP too? (If I'm honest I'm not sure
> what Host header Varnish will request with.)
>
> Jason
>
> Resolving web1.mywebsite.com... 10.10.1.98
>
> Connecting to web1.mywebsite.com|10.10.1.98|:80... connected.
>
> HTTP request sent, awaiting response...
>
>   HTTP/1.1 200 OK
>
>   Date: Mon, 12 May 2014 00:48:25 GMT
>
>   Server: Apache/2.2.23 (CentOS)
>
>   Last-Modified: Sun, 22 Dec 2013 00:53:19 GMT
>
>   ETag: "2a8003-47e-4ee14efeebdc0"
>
>   Accept-Ranges: bytes
>
>   Content-Length: 1150
>
>   Keep-Alive: timeout=5, max=100
>
>   Connection: Keep-Alive
>
>   Content-Type: text/plain; charset=UTF-8
>
> Length: 1150 (1.1K) [text/plain]
>
> Saving to: “/dev/null”
>
> 100%[================================================================>]
> 1,150       --.-K/s   in 0s
>
> 2014-05-11 20:48:24 (144 MB/s) - “/dev/null” saved [1150/1150]
>
>
> I'm attempting here to approximate the calls that varnish is making to
> determine if a host is healthy.
>
> And yet when I try having a look at debug.health from the command line
> Varnish is still labeling it as 'sick'
>
> [root at varnish1:~] #varnishadm -T 127.0.0.1:6082 debug.health -S
> /etc/varnish/secret
>
> *Backend web1 is Healthy*
>
> Current states  good: 10 threshold:  8 window: 10
>
> Average responsetime of good probes: 0.001247
>
> Oldest                                                    Newest
>
> ================================================================
>
> 4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
>
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
>
> RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR Good Recv
>
> HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH Happy
>
> *Backend web2 is Sick*
>
> Current states  good:  0 threshold:  8 window: 10
>
> Average responsetime of good probes: 0.000000
>
> Oldest                                                    Newest
>
> ================================================================
>
> ---------------------------------------------------------------- Happy
>
> So I'm really wondering if I'm missing something here that'll help me to
> determine why Varnish thinks this host is sick! Also I have no problem
> browsing the custom URLs that I setup on each host to indicate which host
> apache is running on.
>
>
> Thanks
>
> Tim
>
>
>
> On Sat, May 10, 2014 at 12:58 AM, Tim Dunphy <bluethundr at gmail.com> wrote:
>
>> Hey all,
>>
>>  I have two web backends in my varnish config. And one node is reporting
>> healthy and the other is being reported as 'sick'.
>>
>> 10 Backend      c 11 www web1
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001130 0.001067
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001231 0.001108
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001250 0.001143
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001127 0.001139
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001208 0.001157
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001562 0.001258
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001545 0.001330
>> HTTP/1.1 200 OK
>>     0 Backend_health - web1 Still healthy 4--X-RH 5 3 5 0.001363 0.001338
>> HTTP/1.1 200 OK
>>    11 BackendClose b web1
>>
>> [root at varnish1:/etc/varnish] #varnishlog | grep web2
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>     0 Backend_health - web2 Still sick 4--X--- 0 3 5 0.000000 0.000000
>>
>> And I'm really at a loss to understand why. Both nodes should be
>> completely identical. And the web roots on both are basically svn repos
>> that are in sync.
>>
>> From web1 :
>>
>> [root at beta:/var/www/jf-current] #svn info | grep -i revision
>> Revision: 17
>>
>> To web2:
>>
>> [root at beta-new:/var/www/jf-current] #svn info | grep -i revision
>> Revision: 17
>>
>> This is the part of my vcl file where I define the web back ends:
>>
>> probe favicon {
>>   .url = "/favicon.ico";
>>   .timeout = 60ms;
>>   .interval = 2s;
>>   .window = 5;
>>   .threshold = 3;
>> }
>>
>> backend web1  {
>>   .host = "xx.xx.xx.xx";
>>   .port = "80";
>>   .probe = favicon;
>> }
>>
>> backend web2  {
>>
>>   .host = "xx.xx.xx.xx";
>>   .port = "80";
>>   .probe = favicon;
>> }
>>
>> And the file that varnish is probing for is present on both:
>>
>> [root at beta:/var/www/jf-current] #ls -l /var/www/jf-current/favicon.ico
>> -rwxrwxr-x 1 apache ftp 1150 Dec 22 00:53 /var/www/jf-current/favicon.ico
>>
>> I've also setup individual web URLs for each host that isn't cached in
>> varnish so I can hit each one. And each site comes up ok. So I'm a little
>> puzzled as to why the second web host is reporting 'sick' and what I can do
>> to get it back into load balancing.
>>
>> Thanks for any help you can provide!
>>
>> Tim
>>
>>
>>
>>
>>
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>  _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
>


-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B