503 service unavailable error

Jason Price japrice at gmail.com
Thu Jul 9 15:01:17 CEST 2015


You're never specifying any auth in your probe:

  .probe = {
  .request =
   "GET /healthcheck.php HTTP/1.1"
   "Host: wiki.example.com"
   "Connection: close";

I don't know the proper syntax offhand, so you'll need to experiment
with curl, wireshark and varnish probes until you get it right.

It may be easier to test by hand with telnet:

telnet 10.10.10.26 80
GET /healthcheck.php HTTP/1.1
Host: wiki.example.com
Authorization: Basic ???????????????
Connection: close


The above should give you an auth failure response.  Twiddle with that
until you get a successfully authenticated response, then translate it
into the probe .request format.  The link you provided gives you
everything else you need.
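
For what it's worth, here is a rough, untested sketch of what that
translation might look like, assuming the backend accepts the same Basic
credentials from the probe as it does from curl. Varnish adds the line
ending after each string in .request, so the extra header is just one
more string. "SomeBase64Hash==" is a placeholder, not a real value:
substitute the base64 encoding of "user:password", which is what curl -v
prints on its Authorization line. Other probe settings are left out here
and would stay as you have them.

```vcl
backend web1 {
  .host = "10.10.10.25";
  .port = "80";
  .probe = {
    # One string per request line. The Authorization value below is a
    # placeholder for base64("user:password").
    .request =
      "GET /healthcheck.php HTTP/1.1"
      "Host: wiki.example.com"
      "Authorization: Basic SomeBase64Hash=="
      "Connection: close";
  }
}
```

The same .request would go into the web2 definition. If the probe
still fails, compare it against the telnet session that worked.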

-Jason

On Wed, Jul 8, 2015 at 11:19 PM, Tim Dunphy <bluethundr at gmail.com> wrote:
>> that interval and window on your web server is scary..... what you're
>> saying is 'check each web server every 10 minutes, and only fail it
>> after 3 failures'
>
>
> Hah!! Agreed. I was just trying to rule the connect timeouts out of the
> picture as to why the failures were happening!
> I plan to set them to more normal intervals once I'm finished testing and
> I've been able to get this to work.
>
>>
>>
>> next time you see the issue, look at:
>> varnishadm -n <varnish_name> debug.health
>
>
> Hmm you may have a point as to the back ends. Varnish is indeed seeing them
> as 'sick' when I encounter the 503 error:
>
>
> [root at varnish1:~] #varnishadm -n  varnish1   debug.health
> Backend web1 is Sick
> Current states  good:  0 threshold:  2 window:  3
> Average responsetime of good probes: 0.000000
> Oldest                                                    Newest
> ================================================================
> ------------------------------------------------------4444444444 Good IPv4
> ------------------------------------------------------XXXXXXXXXX Good Xmit
> ------------------------------------------------------RRRRRRRRRR Good Recv
> ----------------------------------------------------HH---------- Happy
> Backend web2 is Sick
> Current states  good:  0 threshold:  2 window:  3
> Average responsetime of good probes: 0.000000
> Oldest                                                    Newest
> ================================================================
> ------------------------------------------------------4444444444 Good IPv4
> ------------------------------------------------------XXXXXXXXXX Good Xmit
> ------------------------------------------------------RRRRRRRRRR Good Recv
> ----------------------------------------------------HH---------- Happy
>
>>
>>
>> I'd be willing to bet that varnish is just failing the backends.  Try
>> running the healthcheck manually from the varnish boxes:
>> curl -H "Host:wiki.example.com" -v "http://10.10.10.26/healthcheck.php"
>> And see if you're actually getting good healthchecks.  If you're not,
>> then you need to look at your backends (specifically healthcheck.php)
>
>
> But if I perform the curl you're suggesting, I am able to retrieve the
> healthcheck.php file!!
>
> #curl --user admin:somepass -H "Host:wiki.example.com" -v "http://10.10.10.25/healthcheck.php"
> * About to connect() to 52.5.117.61 port 80 (#0)
> *   Trying 52.5.117.61... connected
> * Connected to 52.5.117.61 (52.5.117.61) port 80 (#0)
> * Server auth using Basic with user 'admin'
> > GET /healthcheck.php HTTP/1.1
> > Authorization: Basic SomeBase64Hash==
> > User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> > Accept: */*
> > Host:wiki.example.com
> >
> < HTTP/1.1 200 OK
> < Date: Thu, 09 Jul 2015 02:10:35 GMT
> < Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 PHP/5.4.42 SVN/1.7.14 mod_wsgi/3.4 Python/2.7.5
> < X-Powered-By: PHP/5.4.42
> < Content-Length: 5
> < Content-Type: text/html; charset=UTF-8
> <
> good
> * Connection #0 to host 52.5.117.61 left intact
> * Closing connection #0
>
> But in the curl I just did I was specifying the user auth. Which got me to
> thinking, maybe I'm handling apache basic auth in the wrong way in my VCL
> file?
>
> To test this idea out, I commented out the basic auth lines in my apache
> config. Then cycled the services on both apache servers and both varnish
> servers.
>
> When I ran the test you gave me again, this is the result I got back:
>
> #varnishadm -n  varnish1   debug.health
> Backend web1 is Healthy
> Current states  good:  3 threshold:  2 window:  3
> Average responsetime of good probes: 0.032781
> Oldest                                                    Newest
> ================================================================
> ---------------------------------------------------------------4 Good IPv4
> ---------------------------------------------------------------X Good Xmit
> ---------------------------------------------------------------R Good Recv
> -------------------------------------------------------------HHH Happy
> Backend web2 is Healthy
> Current states  good:  3 threshold:  2 window:  3
> Average responsetime of good probes: 0.032889
> Oldest                                                    Newest
> ================================================================
> ---------------------------------------------------------------4 Good IPv4
> ---------------------------------------------------------------X Good Xmit
> ---------------------------------------------------------------R Good Recv
> -------------------------------------------------------------HHH Happy
>
> Everybody's happy again!!
>
> And I tried browsing around the wiki for quite a long time. And there were
> NO 503 errors the entire time I was using it. Which tells me that I am,
> indeed, not handling auth correctly in my VCL.
>
> The way I thought I solved the problem was by adding a .request to the web
> server definitions that specified the headers to do a GET on the health
> check:
>
> .request =
>    "GET /healthcheck.php HTTP/1.1"
>    "Host: wiki.example.com"
>    "Connection: close";
>
> The reason I thought this worked was because, after I'd restarted varnish
> with that change in place I was able to log into the wiki with basic auth in
> the web browser. And then I'd be able to use it for a while before the
> back-end would come up as 'sick' in varnish again which would cause the 503
> error.
>
> I then tried following this advice again, which I had also tried earlier
> without much luck:
>
> http://blog.tenya.me/blog/2011/12/14/varnish-http-authentication/
>
> Which tells you to add this section to your VCL file:
>
>  if (! req.http.Authorization ~ "Basic SomeBase64Hash==")
>       {
>        error 401 "Restricted";
>       }
>
> And then add this vcl_error subroutine:
>
> sub vcl_error {
>
>   if (obj.status == 401) {
>   set obj.http.Content-Type = "text/html; charset=utf-8";
>   set obj.http.WWW-Authenticate = "Basic realm=Secured";
>   synthetic {"
>
>    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
>
>     <HTML>
>     <HEAD>
>     <TITLE>Error</TITLE>
>     <META HTTP-EQUIV='Content-Type' CONTENT='text/html;'>
>     </HEAD>
>     <BODY><H1>401 Unauthorized (varnish)</H1></BODY>
>     </HTML>
>     "};
>      return (deliver);
>     }
> }
>
> And after restarting varnish again on both nodes, with authentication in
> place in the VHOST configs on the web servers I was able to log into the
> wiki site again and browse around for a while.
>
> But then after some browsing around the back ends would go sick again and
> you would see the 503:
>
> #varnishadm -n  varnish1   debug.health
> Backend web1 is Sick
> Current states  good:  1 threshold:  2 window:  3
> Average responsetime of good probes: 0.000000
> Oldest                                                    Newest
> ================================================================
> --------------------------------------------------------------44 Good IPv4
> --------------------------------------------------------------XX Good Xmit
> --------------------------------------------------------------RR Good Recv
> ------------------------------------------------------------HH-- Happy
> Backend web2 is Sick
> Current states  good:  1 threshold:  2 window:  3
> Average responsetime of good probes: 0.000000
> Oldest                                                    Newest
> ================================================================
> --------------------------------------------------------------44 Good IPv4
> --------------------------------------------------------------XX Good Xmit
> --------------------------------------------------------------RR Good Recv
> ------------------------------------------------------------HH-- Happy
>
> So SOMETHING must still be off with how I'm handling authentication in my
> VCL config. The next step I'm thinking of trying involves passing the
> authentication headers to the .request section of my web server definition.
> Although I'm not sure if it'll work. I'll let you guys know if it does.
>
> But I'd like to present the current state of my VCL again in case anyone has
> any insight or knowledge to share that may help.
>
> backend web1 {
>   .host = "10.10.10.25";
>   .port = "80";
>   .connect_timeout = 3600s;
>   .first_byte_timeout = 3600s;
>   .between_bytes_timeout = 3600s;
>   .max_connections = 70;
>   .probe = {
>   .request =
>    "GET /healthcheck.php HTTP/1.1"
>    "Host: wiki.example.com"
>    "Connection: close";
>    .interval = 10m;
>    .timeout = 60s;
>    .window = 3;
>    .threshold = 2;
>    }
> }
>
> backend web2 {
>   .host = "10.10.10.26";
>   .port = "80";
>   .connect_timeout = 3600s;
>   .first_byte_timeout = 3600s;
>   .between_bytes_timeout = 3600s;
>   .max_connections = 70;
>   .probe = {
>   .request =
>    "GET /healthcheck.php HTTP/1.1"
>    "Host: wiki.example.com"
>    "Connection: close";
>    .interval = 10m;
>    .timeout = 60s;
>    .window = 3;
>    .threshold = 2;
>    }
> }
>
> director www round-robin {
>   { .backend = web1;   }
>   { .backend = web2;  }
>  }
>
> sub vcl_recv {
>      if (! req.http.Authorization ~ "Basic Base64Hash==")
>       {
>        error 401 "Restricted";
>       }
>
>     if (req.url ~ "&action=submit($|/)") {
>         return (pass);
>     }
>
>     set req.backend = www;
>     return (lookup);
> }
>
> sub vcl_fetch {
>       set beresp.ttl = 3600s;
>       set beresp.grace = 4h;
>       return (deliver);
> }
>
> sub vcl_error {
>   if (obj.status == 401) {
>   set obj.http.Content-Type = "text/html; charset=utf-8";
>   set obj.http.WWW-Authenticate = "Basic realm=Secured";
>   synthetic {"
>
>    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
>
>     <HTML>
>     <HEAD>
>     <TITLE>Error</TITLE>
>     <META HTTP-EQUIV='Content-Type' CONTENT='text/html;'>
>     </HEAD>
>     <BODY><H1>401 Unauthorized (varnish)</H1></BODY>
>     </HTML>
>     "};
>      return (deliver);
>     }
> }
>
> sub vcl_deliver {
>      if (obj.hits> 0) {
>       set resp.http.X-Cache = "HIT";
>      } else {
>         set resp.http.X-Cache = "MISS";
>      }
>  }
>
> Once again I genuinely appreciate the help of this list, and hope I haven't
> worn out my welcome! ;)
>
> Thanks,
> Tim
>
>
> On Wed, Jul 8, 2015 at 9:31 PM, Jason Price <japrice at gmail.com> wrote:
>>
>> that interval and window on your web server is scary..... what you're
>> saying is 'check each web server every 10 minutes, and only fail it
>> after 3 failures'
>>
>> next time you see the issue, look at:
>>
>> varnishadm -n <varnish_name> debug.health
>>
>> I'd be willing to bet that varnish is just failing the backends.  Try
>> running the healthcheck manually from the varnish boxes:
>>
>> curl -H "Host:wiki.example.com" -v "http://10.10.10.26/healthcheck.php"
>>
>> And see if you're actually getting good healthchecks.  If you're not,
>> then you need to look at your backends (specifically healthcheck.php)
>>
>> On Wed, Jul 8, 2015 at 12:14 PM, Tim Dunphy <bluethundr at gmail.com> wrote:
>> > Hi guys,
>> >
>> >
>> > I'm having an issue where my varnish server will stop working after
>> > a while of browsing around the site I'm using it with and throw a
>> > 503 server unavailable error.
>> >
>> > In my varnish logs I'm getting a 'no backend connection' error:
>> >
>> >    10 FetchError   c no backend connection
>> >    10 VCL_call     c error deliver
>> >    10 VCL_call     c deliver deliver
>> >    10 TxProtocol   c HTTP/1.1
>> >    10 TxStatus     c 503
>> >    10 TxResponse   c Service Unavailable
>> >    10 TxHeader     c Server: Varnish
>> >
>> >
>> > And if I do a GET on the healthcheck from the command line on the
>> > varnish server, I get a 503 response from varnish:
>> >
>> > #GET http://wiki.example.com/healthcheck.php
>> >
>> > <?xml version="1.0" encoding="utf-8"?>
>> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>> >  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>> > <html>
>> >   <head>
>> >     <title>503 Service Unavailable</title>
>> >   </head>
>> >   <body>
>> >     <h1>Error 503 Service Unavailable</h1>
>> >     <p>Service Unavailable</p>
>> >     <h3>Guru Meditation:</h3>
>> >     <p>XID: 2107225059</p>
>> >     <hr>
>> >     <p>Varnish cache server</p>
>> >   </body>
>> > </html>
>> >
>> > But if I do another GET on the healthcheck file from the varnish
>> > server to another apache VHOST on the same server as the wiki site
>> > that responds to the IP of the web server instead of the IP for the
>> > varnish server, the GET works:
>> >
>> > #GET http://ops1.example.com/healthcheck.php
>> > good
>> >
>> >
>> > So I'm not sure why varnish is having trouble reaching the HC file.
>> > The web server is a little far from the varnish server. The varnish
>> > machines are in NYC and the web servers are in northern Virginia.
>> >
>> > So I tried setting the timeouts in the varnish config to a really
>> > high number. And that was working for a while. But today I noticed
>> > that it stopped working. I'll have to restart the varnish service
>> > and browse the site for a while. Then it'll stop working again and
>> > produce the 503 error. It's pretty annoying!
>> >
>> > I was wondering if there might be something in my VCL I could tweak
>> > to make this work? Or if the fact is that the web servers are simply
>> > too far from varnish for this to be practical.
>> >
>> > Here's my VCL file. It's pretty basic:
>> >
>> > backend web1 {
>> >   .host = "10.10.10.25";
>> >   .port = "80";
>> >   .connect_timeout = 1200s;
>> >   .first_byte_timeout = 1200s;
>> >   .between_bytes_timeout = 1200s;
>> >   .max_connections = 70;
>> >   .probe = {
>> >   .request =
>> >    "GET /healthcheck.php HTTP/1.1"
>> >    "Host: wiki.example.com"
>> >    "Connection: close";
>> >    .interval = 10m;
>> >    .timeout = 60s;
>> >    .window = 3;
>> >    .threshold = 2;
>> >    }
>> > }
>> >
>> > backend web2 {
>> >   .host = "10.10.10.26";
>> >   .port = "80";
>> >   .connect_timeout = 1200s;
>> >   .first_byte_timeout = 1200s;
>> >   .between_bytes_timeout = 1200s;
>> >   .max_connections = 70;
>> >   .probe = {
>> >   .request =
>> >    "GET /healthcheck.php HTTP/1.1"
>> >    "Host: wiki.example.com"
>> >    "Connection: close";
>> >    .interval = 10m;
>> >    .timeout = 60s;
>> >    .window = 3;
>> >    .threshold = 2;
>> >    }
>> > }
>> >
>> > director www round-robin {
>> >   { .backend = web1;   }
>> >   { .backend = web2;  }
>> >  }
>> >
>> > sub vcl_recv {
>> >
>> >     if (req.url ~ "&action=submit($|/)") {
>> >         return (pass);
>> >     }
>> >
>> >     set req.backend = www;
>> >     return (lookup);
>> > }
>> >
>> > sub vcl_fetch {
>> >       set beresp.ttl = 3600s;
>> >       set beresp.grace = 4h;
>> >       return (deliver);
>> > }
>> >
>> >
>> > sub vcl_deliver {
>> >      if (obj.hits> 0) {
>> >       set resp.http.X-Cache = "HIT";
>> >      } else {
>> >         set resp.http.X-Cache = "MISS";
>> >      }
>> >  }
>> >
>> > Thanks,
>> > Tim
>> >
>> >
>> >
>> > --
>> > GPG me!!
>> >
>> > gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>> >
>> >
>> > _______________________________________________
>> > varnish-misc mailing list
>> > varnish-misc at varnish-cache.org
>> > https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
>
>
>


