<div dir="ltr">Hey Jason,<div><br></div><div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote"><span style="font-size:12.8px">You're never specifying any auth in your probe:</span><br style="font-size:12.8px"><span style="font-size:12.8px"></span><br><span style="font-size:12.8px"> .probe = {</span><br><span style="font-size:12.8px"> .request =</span><br><span style="font-size:12.8px"> "GET /healthcheck.php HTTP/1.1"</span><br><span style="font-size:12.8px"> "Host: <a href="http://wiki.example.com/" rel="noreferrer" target="_blank">wiki.example.com</a>"</span><br><span style="font-size:12.8px"> "Connection: close";</span></blockquote><div><br></div><div>Yeah, understood. When I mailed yesterday, adding auth to the probe was something I was planning to do, not something I had already done. Sorry, I wasn't very clear about that. <br><br></div><div>At any rate, I was able to get the Basic Auth header into my .probe .request, and the good news is that it seems to have worked!!<br><br></div><div>This was the change that I made:<br><br> .request =<br> "GET /healthcheck.php HTTP/1.1"<br> "Host: <a href="http://wiki.jokefire.com">wiki.jokefire.com</a>"<br> "Authorization: Basic myBase64Hash=="<br> "Connection: close";<br></div><br><br></div><div>After I made that change and cycled varnish, I literally NEVER got the 503 error again. There's just an occasional 504 that goes away on a page reload, nothing serious, and even that can probably be eliminated with some VCL tweaking. <br><br></div><div>After that success I made some modifications to the VCL to make it work a little better with MediaWiki. Here's the current state of my VCL for anyone who's interested. 
<br><br><br>backend web1 {<br> .host = "10.10.10.25";<br> .port = "80";<br> .connect_timeout = 3600s;<br> .first_byte_timeout = 3600s;<br> .between_bytes_timeout = 3600s;<br> .max_connections = 70;<br> .probe = {<br> .request =<br> "GET /healthcheck.php HTTP/1.1"<br> "Host: <a href="http://wiki.example.com">wiki.example.com</a>"<br> "Authorization: Basic Base64Hash=="<br> "Connection: close";<br> .interval = 10m;<br> .timeout = 60s;<br> .window = 3;<br> .threshold = 2;<br> }<br>}<br><br>backend web2 {<br> .host = "10.10.10.26";<br> .port = "80";<br> .connect_timeout = 3600s;<br> .first_byte_timeout = 3600s;<br> .between_bytes_timeout = 3600s;<br> .max_connections = 70;<br> .probe = {<br> .request =<br> "GET /healthcheck.php HTTP/1.1"<br> "Host: <a href="http://wiki.example.com">wiki.example.com</a>"<br> "Authorization: Basic Base64Hash=="<br> "Connection: close";<br> .interval = 10m;<br> .timeout = 60s;<br> .window = 3;<br> .threshold = 2;<br> }<br>}<br><br><br>director www round-robin {<br> { .backend = web1; }<br> { .backend = web2; }<br> }<br><br># access control list for "purge": open to only localhost and other local nodes<br>acl purge {<br> "127.0.0.1";<br>}<br><br>sub vcl_recv {<br><br><br> set req.http.host = regsub(req.http.host, "^www\.wiki\.example\.com$","<a href="http://wiki.example.com">wiki.example.com</a>");<br><br> # Serve objects up to 2 minutes past their expiry if the backend<br> # is slow to respond.<br> set req.grace = 120s;<br><br> if (! req.http.Authorization ~ "Basic myBase64Hash==")<br> {<br> error 401 "Restricted";<br> }<br><br> if (req.url ~ "&action=submit($|/)") {<br> return (pass);<br> }<br><br> if (req.restarts == 0) {<br> if (req.http.x-forwarded-for) {<br> set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;<br> } else {<br> set req.http.X-Forwarded-For = client.ip;<br> }<br> }<br><br> set req.backend = www;<br><br> # This uses the ACL action called "purge". 
Basically if a request to<br> # PURGE the cache comes from anywhere other than localhost, ignore it.<br> if (req.request == "PURGE")<br> {if (!client.ip ~ purge)<br> {error 405 "Not allowed.";}<br> return(lookup);}<br><br> if (req.request != "GET" && req.request != "HEAD" &&<br> req.request != "PUT" && req.request != "POST" &&<br> req.request != "TRACE" && req.request != "OPTIONS" &&<br> req.request != "DELETE")<br> {return(pipe);} /* Non-RFC2616 or CONNECT which is weird. */<br><br><br> # Pass anything other than GET and HEAD directly.<br> if (req.request != "GET" && req.request != "HEAD")<br> {return(pass);} /* We only deal with GET and HEAD by default */<br><br> # Pass requests from logged-in users directly.<br> if (req.http.Authorization || req.http.Cookie)<br> {return(pass);} /* Not cacheable by default */<br><br> # Pass any requests with the "If-None-Match" header directly.<br> if (req.http.If-None-Match)<br> {return(pass);}<br><br> # normalize Accept-Encoding to reduce vary<br> if (req.http.Accept-Encoding) {<br> if (req.http.User-Agent ~ "MSIE 6") {<br> unset req.http.Accept-Encoding;<br> } elsif (req.http.Accept-Encoding ~ "gzip") {<br> set req.http.Accept-Encoding = "gzip";<br> } elsif (req.http.Accept-Encoding ~ "deflate") {<br> set req.http.Accept-Encoding = "deflate";<br> } else {<br> unset req.http.Accept-Encoding;<br> }<br> }<br><br> return (lookup);<br>}<br><br>sub vcl_pipe {<br> # Note that only the first request to the backend will have<br> # X-Forwarded-For set. 
If you use X-Forwarded-For and want to<br> # have it set for all requests, make sure to have:<br> # set req.http.connection = "close";<br><br> # This is otherwise not necessary if you do not do any request rewriting.<br> set req.http.connection = "close";<br>}<br><br># Called if the cache has a copy of the page.<br>sub vcl_hit {<br> if (req.request == "PURGE")<br> {ban_url(req.url);<br> error 200 "Purged";}<br><br> if (!obj.ttl > 0s)<br> {return(pass);}<br>}<br><br><br># Called if the cache does not have a copy of the page.<br>sub vcl_miss {<br> if (req.request == "PURGE")<br> {error 200 "Not in cache";}<br>}<br><br># Called after a document has been successfully retrieved from the backend.<br>sub vcl_fetch {<br> # set minimum timeouts to auto-discard stored objects<br> # set beresp.prefetch = -30s;<br> set beresp.grace = 120s;<br><br> if (beresp.ttl < 48h) {<br> set beresp.ttl = 48h;}<br><br> if (!beresp.ttl > 0s)<br> {return(hit_for_pass);}<br><br> if (beresp.http.Set-Cookie)<br> {return(hit_for_pass);}<br> #if (beresp.http.Cache-Control ~ "(private|no-cache|no-store)")<br> # {return(hit_for_pass);}<br> if (req.http.Authorization && !beresp.http.Cache-Control ~ "public")<br> {return(hit_for_pass);}<br><br>}<br><br>sub vcl_error {<br><br> if (obj.status == 401) {<br> set obj.http.Content-Type = "text/html; charset=utf-8";<br> set obj.http.WWW-Authenticate = "Basic realm=Secured";<br> synthetic {"<br><br> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "<a href="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd</a>"><br><br> <HTML><br> <HEAD><br> <TITLE>Error</TITLE><br> <META HTTP-EQUIV='Content-Type' CONTENT='text/html;'><br> </HEAD><br> <BODY><H1>401 Unauthorized (varnish)</H1></BODY><br> </HTML><br> "};<br> return (deliver);<br> }<br>}<br><br>sub vcl_deliver {<br> if (obj.hits> 0) {<br> set resp.http.X-Cache = "HIT";<br> } else {<br> set resp.http.X-Cache = "MISS";<br> }<br> 
}<br><br></div><div>Now, all that's left to do is to set those completely insane timeouts I've been using to try and troubleshoot the problem to something a little more reasonable. <br><br></div><div>Thanks for all the help!<br><br></div><div>Tim<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jul 9, 2015 at 9:01 AM, Jason Price <span dir="ltr"><<a href="mailto:japrice@gmail.com" target="_blank">japrice@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">You're never specifying any auth in your probe:<br>
<span class=""><br>
.probe = {<br>
.request =<br>
"GET /healthcheck.php HTTP/1.1"<br>
"Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>"<br>
"Connection: close";<br>
<br>
</span>I don't know the proper way to specify it, but you'll need to play<br>
around with curl, wireshark and varnish probes until you get it right.<br>
<br>
May be easier to test with telnet invocations:<br>
<br>
telnet 10.10.10.26 80<br>
<span class="">GET /healthcheck.php HTTP/1.1<br>
Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a><br>
</span>Authorization: Basic ???????????????<br>
Connection: close<br>
<br>
<br>
The above should give you an auth failure request. Twiddle with that<br>
until you get a successful authentication request, then translate it<br>
into the probe .request format. The link you provided gives you<br>
everything else you need.<br>
<span class="HOEnZb"><font color="#888888"><br>
-Jason<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Wed, Jul 8, 2015 at 11:19 PM, Tim Dunphy <<a href="mailto:bluethundr@gmail.com">bluethundr@gmail.com</a>> wrote:<br>
>> that interval and window on your web server is scary..... what you're<br>
>> saying is 'check each web server every 10 minutes, and only fail it<br>
>> after 3 failures'<br>
><br>
><br>
> Hah!! Agreed. I was just trying to rule the connect timeouts out of the<br>
> picture as to why the failures were happening!<br>
> I plan to set them to more normal intervals once I'm finished testing and<br>
> I've been able to get this to work.<br>
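><br>
> Something along these lines is what I have in mind. The numbers are guesses<br>
> on my part and not tested values, so treat this as a sketch only:<br>
><br>
> .probe = {<br>
> .request =<br>
> "GET /healthcheck.php HTTP/1.1"<br>
> "Host: wiki.example.com"<br>
> "Connection: close";<br>
> # assumed value: poll each backend every 5 seconds<br>
> .interval = 5s;<br>
> # assumed value: count a probe as failed if there is no answer within 2 seconds<br>
> .timeout = 2s;<br>
> # look at the last 5 probes; at least 3 of them must be good<br>
> # for the backend to be considered healthy<br>
> .window = 5;<br>
> .threshold = 3;<br>
> }<br>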
><br>
>><br>
>><br>
>> next time you see the issue, look at:<br>
>> varnishadm -n <varnish_name> debug.health<br>
><br>
><br>
> Hmm you may have a point as to the back ends. Varnish is indeed seeing them<br>
> as 'sick' when I encounter the 503 error:<br>
><br>
><br>
> [root@varnish1:~] #varnishadm -n varnish1 debug.health<br>
> Backend web1 is Sick<br>
> Current states good: 0 threshold: 2 window: 3<br>
> Average responsetime of good probes: 0.000000<br>
> Oldest Newest<br>
> ================================================================<br>
> ------------------------------------------------------4444444444 Good IPv4<br>
> ------------------------------------------------------XXXXXXXXXX Good Xmit<br>
> ------------------------------------------------------RRRRRRRRRR Good Recv<br>
> ----------------------------------------------------HH---------- Happy<br>
> Backend web2 is Sick<br>
> Current states good: 0 threshold: 2 window: 3<br>
> Average responsetime of good probes: 0.000000<br>
> Oldest Newest<br>
> ================================================================<br>
> ------------------------------------------------------4444444444 Good IPv4<br>
> ------------------------------------------------------XXXXXXXXXX Good Xmit<br>
> ------------------------------------------------------RRRRRRRRRR Good Recv<br>
> ----------------------------------------------------HH---------- Happy<br>
><br>
>><br>
>><br>
>> I'd be willing to bet that varnish is just failing the backends. Try<br>
>> running the healthcheck manually from the varnish boxes:<br>
>> curl -H "Host:<a href="http://kiki.example.com" rel="noreferrer" target="_blank">kiki.example.com</a>" -v "<a href="http://10.10.10.26/healthcheck.php" rel="noreferrer" target="_blank">http://10.10.10.26/healthcheck.php</a>"<br>
>> And see if you're actually getting good healthchecks. If you're not,<br>
>> then you need to look at your backends (specifically healthcheck.php)<br>
><br>
><br>
> But if I perform the curl you're suggesting, I am able to retrieve the<br>
> healthcheck.php file!!<br>
><br>
> #curl --user admin:somepass -H "Host:<a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>" -v<br>
> "<a href="http://10.10.10.25/healthcheck.php" rel="noreferrer" target="_blank">http://10.10.10.25/healthcheck.php</a>"<br>
> * About to connect() to 52.5.117.61 port 80 (#0)<br>
> * Trying 52.5.117.61... connected<br>
> * Connected to 52.5.117.61 (52.5.117.61) port 80 (#0)<br>
> * Server auth using Basic with user 'admin'<br>
>> GET /healthcheck.php HTTP/1.1<br>
>> Authorization: Basic SomeBase64Hash==<br>
>> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7<br>
>> NSS/3.14.0.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2<br>
>> Accept: */*<br>
>> Host:<a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a><br>
>><br>
> < HTTP/1.1 200 OK<br>
> < Date: Thu, 09 Jul 2015 02:10:35 GMT<br>
> < Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9<br>
> PHP/5.4.42 SVN/1.7.14 mod_wsgi/3.4 Python/2.7.5<br>
> < X-Powered-By: PHP/5.4.42<br>
> < Content-Length: 5<br>
> < Content-Type: text/html; charset=UTF-8<br>
> <<br>
> good<br>
> * Connection #0 to host 52.5.117.61 left intact<br>
> * Closing connection #0<br>
><br>
> But in the curl I just did, I was specifying the user auth. Which got me<br>
> thinking: maybe I'm handling Apache basic auth the wrong way in my VCL<br>
> file?<br>
><br>
> To test this idea out, I commented out the basic auth lines in my apache<br>
> config. Then cycled the services on both apache servers and both varnish<br>
> servers.<br>
><br>
> When I ran the test you gave me again, this is the result I got back:<br>
><br>
> #varnishadm -n varnish1 debug.health<br>
> Backend web1 is Healthy<br>
> Current states good: 3 threshold: 2 window: 3<br>
> Average responsetime of good probes: 0.032781<br>
> Oldest Newest<br>
> ================================================================<br>
> ---------------------------------------------------------------4 Good IPv4<br>
> ---------------------------------------------------------------X Good Xmit<br>
> ---------------------------------------------------------------R Good Recv<br>
> -------------------------------------------------------------HHH Happy<br>
> Backend web2 is Healthy<br>
> Current states good: 3 threshold: 2 window: 3<br>
> Average responsetime of good probes: 0.032889<br>
> Oldest Newest<br>
> ================================================================<br>
> ---------------------------------------------------------------4 Good IPv4<br>
> ---------------------------------------------------------------X Good Xmit<br>
> ---------------------------------------------------------------R Good Recv<br>
> -------------------------------------------------------------HHH Happy<br>
><br>
> Everybody's happy again!!<br>
><br>
> And I tried browsing around the wiki for quite a long time. And there were<br>
> NO 503 errors the entire time I was using it. Which tells me that I am,<br>
> indeed, not handling auth correctly in my VCL.<br>
><br>
> The way I thought I solved the problem was by adding a .request to the web<br>
> server definitions that specified the headers to do a GET on the health<br>
> check:<br>
><br>
> .request =<br>
> "GET /healthcheck.php HTTP/1.1"<br>
> "Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>"<br>
> "Connection: close";<br>
><br>
> The reason I thought this worked was because, after I'd restarted varnish<br>
> with that change in place I was able to log into the wiki with basic auth in<br>
> the web browser. And then I'd be able to use it for a while before the<br>
> back-end would come up as 'sick' in varnish again which would cause the 503<br>
> error.<br>
><br>
> I then tried following this advice again, which I had also tried earlier<br>
> without much luck:<br>
><br>
> <a href="http://blog.tenya.me/blog/2011/12/14/varnish-http-authentication/" rel="noreferrer" target="_blank">http://blog.tenya.me/blog/2011/12/14/varnish-http-authentication/</a><br>
><br>
> Which tells you to add this section to your VCL file:<br>
><br>
> if (! req.http.Authorization ~ "Basic SomeBase64Hash==")<br>
> {<br>
> error 401 "Restricted";<br>
> }<br>
><br>
> And then add this vcl_error subroutine:<br>
><br>
> sub vcl_error {<br>
><br>
> if (obj.status == 401) {<br>
> set obj.http.Content-Type = "text/html; charset=utf-8";<br>
> set obj.http.WWW-Authenticate = "Basic realm=Secured";<br>
> synthetic {"<br>
><br>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"<br>
> "<a href="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd" rel="noreferrer" target="_blank">http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd</a>"><br>
><br>
> <HTML><br>
> <HEAD><br>
> <TITLE>Error</TITLE><br>
> <META HTTP-EQUIV='Content-Type' CONTENT='text/html;'><br>
> </HEAD><br>
> <BODY><H1>401 Unauthorized (varnish)</H1></BODY><br>
> </HTML><br>
> "};<br>
> return (deliver);<br>
> }<br>
> }<br>
><br>
> And after restarting varnish again on both nodes, with authentication in<br>
> place in the VHOST configs on the web servers I was able to log into the<br>
> wiki site again and browse around for a while.<br>
><br>
> But then after some browsing around the back ends would go sick again and<br>
> you would see the 503:<br>
><br>
> #varnishadm -n varnish1 debug.health<br>
> Backend web1 is Sick<br>
> Current states good: 1 threshold: 2 window: 3<br>
> Average responsetime of good probes: 0.000000<br>
> Oldest Newest<br>
> ================================================================<br>
> --------------------------------------------------------------44 Good IPv4<br>
> --------------------------------------------------------------XX Good Xmit<br>
> --------------------------------------------------------------RR Good Recv<br>
> ------------------------------------------------------------HH-- Happy<br>
> Backend web2 is Sick<br>
> Current states good: 1 threshold: 2 window: 3<br>
> Average responsetime of good probes: 0.000000<br>
> Oldest Newest<br>
> ================================================================<br>
> --------------------------------------------------------------44 Good IPv4<br>
> --------------------------------------------------------------XX Good Xmit<br>
> --------------------------------------------------------------RR Good Recv<br>
> ------------------------------------------------------------HH-- Happy<br>
><br>
> So SOMETHING must still be off with how I'm handling authentication in my<br>
> VCL config. The next step I'm thinking of trying involves passing the<br>
> authentication headers to the .request section of my web server definition.<br>
> Although I'm not sure if it'll work. I'll let you guys know if it does.<br>
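><br>
> Roughly what I'm picturing, untested, and assuming the credentials can simply<br>
> go in as one more header line in the probe (the Base64 value below is a<br>
> placeholder for the encoded user:password pair, the same string curl shows<br>
> in its Authorization line above):<br>
><br>
> # assumption: the probe sends the same Authorization header curl does with --user<br>
> .probe = {<br>
> .request =<br>
> "GET /healthcheck.php HTTP/1.1"<br>
> "Host: wiki.example.com"<br>
> "Authorization: Basic SomeBase64Hash=="<br>
> "Connection: close";<br>
> .interval = 10m;<br>
> .timeout = 60s;<br>
> .window = 3;<br>
> .threshold = 2;<br>
> }<br>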
><br>
> But I'd like to present the current state of my VCL again in case anyone has<br>
> any insight or knowledge to share that may help.<br>
><br>
> backend web1 {<br>
> .host = "10.10.10.25";<br>
> .port = "80";<br>
> .connect_timeout = 3600s;<br>
> .first_byte_timeout = 3600s;<br>
> .between_bytes_timeout = 3600s;<br>
> .max_connections = 70;<br>
> .probe = {<br>
> .request =<br>
> "GET /healthcheck.php HTTP/1.1"<br>
> "Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>"<br>
> "Connection: close";<br>
> .interval = 10m;<br>
> .timeout = 60s;<br>
> .window = 3;<br>
> .threshold = 2;<br>
> }<br>
> }<br>
><br>
> backend web2 {<br>
> .host = "10.10.10.26";<br>
> .port = "80";<br>
> .connect_timeout = 3600s;<br>
> .first_byte_timeout = 3600s;<br>
> .between_bytes_timeout = 3600s;<br>
> .max_connections = 70;<br>
> .probe = {<br>
> .request =<br>
> "GET /healthcheck.php HTTP/1.1"<br>
> "Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>"<br>
> "Connection: close";<br>
> .interval = 10m;<br>
> .timeout = 60s;<br>
> .window = 3;<br>
> .threshold = 2;<br>
> }<br>
> }<br>
><br>
> director www round-robin {<br>
> { .backend = web1; }<br>
> { .backend = web2; }<br>
> }<br>
><br>
> sub vcl_recv {<br>
> if (! req.http.Authorization ~ "Basic Base64Hash==")<br>
> {<br>
> error 401 "Restricted";<br>
> }<br>
><br>
> if (req.url ~ "&action=submit($|/)") {<br>
> return (pass);<br>
> }<br>
><br>
> set req.backend = www;<br>
> return (lookup);<br>
> }<br>
><br>
> sub vcl_fetch {<br>
> set beresp.ttl = 3600s;<br>
> set beresp.grace = 4h;<br>
> return (deliver);<br>
> }<br>
><br>
> sub vcl_error {<br>
> if (obj.status == 401) {<br>
> set obj.http.Content-Type = "text/html; charset=utf-8";<br>
> set obj.http.WWW-Authenticate = "Basic realm=Secured";<br>
> synthetic {"<br>
><br>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"<br>
> "<a href="http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd" rel="noreferrer" target="_blank">http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd</a>"><br>
><br>
> <HTML><br>
> <HEAD><br>
> <TITLE>Error</TITLE><br>
> <META HTTP-EQUIV='Content-Type' CONTENT='text/html;'><br>
> </HEAD><br>
> <BODY><H1>401 Unauthorized (varnish)</H1></BODY><br>
> </HTML><br>
> "};<br>
> return (deliver);<br>
> }<br>
> }<br>
><br>
> sub vcl_deliver {<br>
> if (obj.hits> 0) {<br>
> set resp.http.X-Cache = "HIT";<br>
> } else {<br>
> set resp.http.X-Cache = "MISS";<br>
> }<br>
> }<br>
><br>
> Once again I genuinely appreciate the help of this list, and hope I haven't<br>
> worn out my welcome! ;)<br>
><br>
> Thanks,<br>
> Tim<br>
><br>
><br>
> On Wed, Jul 8, 2015 at 9:31 PM, Jason Price <<a href="mailto:japrice@gmail.com">japrice@gmail.com</a>> wrote:<br>
>><br>
>> that interval and window on your web server is scary..... what you're<br>
>> saying is 'check each web server every 10 minutes, and only fail it<br>
>> after 3 failures'<br>
>><br>
>> next time you see the issue, look at:<br>
>><br>
>> varnishadm -n <varnish_name> debug.health<br>
>><br>
>> I'd be willing to bet that varnish is just failing the backends. Try<br>
>> running the healthcheck manually from the varnish boxes:<br>
>><br>
>> curl -H "Host:<a href="http://kiki.example.com" rel="noreferrer" target="_blank">kiki.example.com</a>" -v "<a href="http://10.10.10.26/healthcheck.php" rel="noreferrer" target="_blank">http://10.10.10.26/healthcheck.php</a>"<br>
>><br>
>> And see if you're actually getting good healthchecks. If you're not,<br>
>> then you need to look at your backends (specifically healthcheck.php)<br>
>><br>
>> On Wed, Jul 8, 2015 at 12:14 PM, Tim Dunphy <<a href="mailto:bluethundr@gmail.com">bluethundr@gmail.com</a>> wrote:<br>
>> > Hi guys,<br>
>> ><br>
>> ><br>
>> > I'm having an issue where my varnish server will stop working after a<br>
>> > while<br>
>> > of browsing around the site I'm using it with and throw a 503 server<br>
>> > unavailable error.<br>
>> ><br>
>> > In my varnish logs I'm getting a 'no backend connection error':<br>
>> ><br>
>> > 10 FetchError c no backend connection<br>
>> > 10 VCL_call c error deliver<br>
>> > 10 VCL_call c deliver deliver<br>
>> > 10 TxProtocol c HTTP/1.1<br>
>> > 10 TxStatus c 503<br>
>> > 10 TxResponse c Service Unavailable<br>
>> > 10 TxHeader c Server: Varnish<br>
>> ><br>
>> ><br>
>> > And if I do a GET on the healthcheck from the command line on the<br>
>> > varnish<br>
>> > server, I get a 503 response from varnish:<br>
>> ><br>
>> > #GET <a href="http://wiki.example.com/healthcheck.php" rel="noreferrer" target="_blank">http://wiki.example.com/healthcheck.php</a><br>
>> ><br>
>> > <?xml version="1.0" encoding="utf-8"?><br>
>> > <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"<br>
>> > "<a href="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" rel="noreferrer" target="_blank">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</a>"><br>
>> > <html><br>
>> > <head><br>
>> > <title>503 Service Unavailable</title><br>
>> > </head><br>
>> > <body><br>
>> > <h1>Error 503 Service Unavailable</h1><br>
>> > <p>Service Unavailable</p><br>
>> > <h3>Guru Meditation:</h3><br>
>> > <p>XID: 2107225059</p><br>
>> > <hr><br>
>> > <p>Varnish cache server</p><br>
>> > </body><br>
>> > </html><br>
>> ><br>
>> > But if I do another GET on the healthcheck file from the varnish server<br>
>> > to<br>
>> > another apache VHOST on the same server as the wiki site that responds<br>
>> > to<br>
>> > the IP of the web server instead of the IP for the varnish server, the<br>
>> > GET<br>
>> > works:<br>
>> ><br>
>> > #GET <a href="http://ops1.example.com/healthcheck.php" rel="noreferrer" target="_blank">http://ops1.example.com/healthcheck.php</a><br>
>> > good<br>
>> ><br>
>> ><br>
>> > So I'm not sure why varnish is having trouble reaching the HC file. The<br>
>> > web<br>
>> > server is a little far from the varnish server. The varnish machines are<br>
>> > in<br>
>> > NYC and the web servers are in northern Virginia.<br>
>> ><br>
>> > So I tried setting the timeouts in the varnish config to a really high<br>
>> > number. And that was working for a while. But today I noticed that it<br>
>> > stopped working. I'll have to restart the varnish service and browse the<br>
>> > site for a while. Then it'll stop working again and produce the 503<br>
>> > error.<br>
>> > It's pretty annoying!<br>
>> ><br>
>> > I was wondering if there might be something in my VCL I could tweak to<br>
>> > make<br>
>> > this work? Or if the fact is that the web servers are simply too far<br>
>> > from<br>
>> > varnish for this to be practical.<br>
>> ><br>
>> > Here's my VCL file. It's pretty basic:<br>
>> ><br>
>> > backend web1 {<br>
>> > .host = "10.10.10.25";<br>
>> > .port = "80";<br>
>> > .connect_timeout = 1200s;<br>
>> > .first_byte_timeout = 1200s;<br>
>> > .between_bytes_timeout = 1200s;<br>
>> > .max_connections = 70;<br>
>> > .probe = {<br>
>> > .request =<br>
>> > "GET /healthcheck.php HTTP/1.1"<br>
>> > "Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>"<br>
>> > "Connection: close";<br>
>> > .interval = 10m;<br>
>> > .timeout = 60s;<br>
>> > .window = 3;<br>
>> > .threshold = 2;<br>
>> > }<br>
>> > }<br>
>> ><br>
>> > backend web2 {<br>
>> > .host = "10.10.10.26";<br>
>> > .port = "80";<br>
>> > .connect_timeout = 1200s;<br>
>> > .first_byte_timeout = 1200s;<br>
>> > .between_bytes_timeout = 1200s;<br>
>> > .max_connections = 70;<br>
>> > .probe = {<br>
>> > .request =<br>
>> > "GET /healthcheck.php HTTP/1.1"<br>
>> > "Host: <a href="http://wiki.example.com" rel="noreferrer" target="_blank">wiki.example.com</a>"<br>
>> > "Connection: close";<br>
>> > .interval = 10m;<br>
>> > .timeout = 60s;<br>
>> > .window = 3;<br>
>> > .threshold = 2;<br>
>> > }<br>
>> > }<br>
>> ><br>
>> > director www round-robin {<br>
>> > { .backend = web1; }<br>
>> > { .backend = web2; }<br>
>> > }<br>
>> ><br>
>> > sub vcl_recv {<br>
>> ><br>
>> > if (req.url ~ "&action=submit($|/)") {<br>
>> > return (pass);<br>
>> > }<br>
>> ><br>
>> > set req.backend = www;<br>
>> > return (lookup);<br>
>> > }<br>
>> ><br>
>> > sub vcl_fetch {<br>
>> > set beresp.ttl = 3600s;<br>
>> > set beresp.grace = 4h;<br>
>> > return (deliver);<br>
>> > }<br>
>> ><br>
>> ><br>
>> > sub vcl_deliver {<br>
>> > if (obj.hits> 0) {<br>
>> > set resp.http.X-Cache = "HIT";<br>
>> > } else {<br>
>> > set resp.http.X-Cache = "MISS";<br>
>> > }<br>
>> > }<br>
>> ><br>
>> > Thanks,<br>
>> > Tim<br>
>> ><br>
>> ><br>
>> ><br>
>> > --<br>
>> > GPG me!!<br>
>> ><br>
>> > gpg --keyserver <a href="http://pool.sks-keyservers.net" rel="noreferrer" target="_blank">pool.sks-keyservers.net</a> --recv-keys F186197B<br>
>> ><br>
>> ><br>
>> > _______________________________________________<br>
>> > varnish-misc mailing list<br>
>> > <a href="mailto:varnish-misc@varnish-cache.org">varnish-misc@varnish-cache.org</a><br>
>> > <a href="https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc" rel="noreferrer" target="_blank">https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc</a><br>
><br>
><br>
><br>
><br>
><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature">GPG me!!<br><br>gpg --keyserver <a href="http://pool.sks-keyservers.net" target="_blank">pool.sks-keyservers.net</a> --recv-keys F186197B<br><br></div>
</div>