site goes down if same one of two varnish nodes stopped

Jason Heffner jdh132 at psu.edu
Mon May 12 02:08:36 CEST 2014


It sounds like you are using a tcp monitor on your F5 from reading over at first glance. If varnish goes down, but the system stays up, your monitor wouldn’t remove the node from the pool and keeps sending connections to that node. You want to use a custom monitor like in the attached image in combination with this in your vcl. You can test this by stopping one of your varnish nodes and seeing if it is marked down in the pool.

  // add ping url to test Varnish status
  if (req.request == "GET" && req.url ~ "/varnish-ping") {
  error 200 "OK";
  }



Jason

p: (814) 865-1840, c: (814) 777-7665
Systems Administrator
Teaching and Learning with Technology, Information Technology Services
The Pennsylvania State University

On May 11, 2014, at 7:15 PM, Tim Dunphy <bluethundr at gmail.com> wrote:

> Hey guys,
> 
>  One more interesting thing about my situation. Is that if I do a varnishstat command on both node A (which seems to control the site) and node B (which does not seem to), I get further evidence that both nodes are supporting the site. 
> 
> 
> 
> 0+02:10:12                                                       uszmpwsls014la
> 
> Hitrate ratio:        4        4        4
> 
> Hitrate avg:     0.9977   0.9977   0.9977
> 
> 
> 
>         3139         1.00         0.40 Client connections accepted
> 
>         3149         1.00         0.40 Client requests received
> 
>         3120         1.00         0.40 Cache hits
> 
>           29         0.00         0.00 Cache misses
> 
>           25         0.00         0.00 Backend conn. success
> 
>            4         0.00         0.00 Backend conn. reuses
> 
>           20         0.00         0.00 Backend conn. was closed
> 
>           26         0.00         0.00 Backend conn. recycles
> 
>           29         0.00         0.00 Fetch with Length
> 
>           16          .            .   N struct sess_mem
> 
>           26          .            .   N struct object
> 
>           36          .            .   N struct objectcore
> 
>           25          .            .   N struct objecthead
> 
>            2          .            .   N struct vbe_conn
> 
>          500          .            .   N worker threads
> 
>          500         0.00         0.06 N worker threads created
> 
>            3          .            .   N backends
> 
>         1563          .            .   N LRU moved objects
> 
>         3128         1.00         0.40 Objects sent with write
> 
>         3139         1.00         0.40 Total Sessions
> 
> 
> 
> 0+03:04:56                                                       uszmpwsls014lb
> 
> Hitrate ratio:       10       21       21
> 
> Hitrate avg:     0.9999   0.9998   0.9998
> 
> 
> 
>         4440         2.00         0.40 Client connections accepted
> 
>         4440         2.00         0.40 Client requests received
> 
>         4421         2.00         0.40 Cache hits
> 
>           19         0.00         0.00 Cache misses
> 
>           19         0.00         0.00 Backend conn. success
> 
>           16         0.00         0.00 Backend conn. was closed
> 
>           19         0.00         0.00 Backend conn. recycles
> 
>           19         0.00         0.00 Fetch with Length
> 
>           10          .            .   N struct sess_mem
> 
>           19          .            .   N struct object
> 
>           29          .            .   N struct objectcore
> 
>           11          .            .   N struct objecthead
> 
>            3          .            .   N struct vbe_conn
> 
>          500          .            .   N worker threads
> 
>          500         0.00         0.05 N worker threads created
> 
>            3          .            .   N backends
> 
>         2209          .            .   N LRU moved objects
> 
>         4440         2.00         0.40 Objects sent with write
> 
>         4440         2.00         0.40 Total Sessions
> 
>         4440         2.00         0.40 Total Requests
> 
> So why the entire site goes down when I bring down node A but leave up node B I am still a little puzzled by. Unless the explanation may be that the F5 is NOT balancing the two varnish nodes in quite the way I appear to think. But if that is the case, then why do we see almost identical stats coming out of both hosts?
> 
> 
> 
> Thanks
> 
> Tim
> 
> 
> 
> On Sun, May 11, 2014 at 6:20 PM, Tim Dunphy <bluethundr at gmail.com> wrote:
> hey all.. 
> 
> I have two varnish nodes being balanced by an F5 load balancer both were installed in the same exact manner with yum installing local rpms of varnish 2.1.5 (the requested version of the client).
> Both share the exact same default.vcl file.  But if you take node a down with node b running the whole site goes down if you take node b down with node a running the site stays up. I need to determine why node b isn't supporting the site. Each varnish node needs to be balancing 3 web servers and it looks like the a node does. Since the site goes down when you take down node a and leave node b running 
> I had a look at varnishlog for both and both nodes appear to be getting hit.
> 
> Node A:
> 
> 3 VCL_return   c deliver
> 
>     3 TxProtocol   c HTTP/1.1
> 
>     3 TxStatus     c 200
> 
>     3 TxResponse   c OK
> 
>     3 TxHeader     c Server: Apache
> 
>     3 TxHeader     c X-Powered-By: PHP/5.2.8
> 
>     3 TxHeader     c Content-Type: text/html
> 
>     3 TxHeader     c Cache-Control: max-age = 600
> 
>     3 TxHeader     c Content-Length: 4
> 
>     3 TxHeader     c Date: Sun, 11 May 2014 22:11:02 GMT
> 
>     3 TxHeader     c X-Varnish: 1578371599 1578371564
> 
>     3 TxHeader     c Age: 86
> 
>     3 TxHeader     c Via: 1.1 varnish
> 
>     3 TxHeader     c Connection: close
> 
>     3 TxHeader     c Varnish-X-Cache: HIT
> 
>     3 TxHeader     c Varnish-X-Cache-Hits: 35
> 
>     3 Length       c 4
> 
>     3 ReqEnd       c 1578371599 1399846262.156239033 1399846262.156332970 0.000054121 0.000056028 0.000037909
> 
> 
> 
> Node B:
> 
> 9 VCL_return   c deliver
> 
>     9 TxProtocol   c HTTP/1.1
> 
>     9 TxStatus     c 200
> 
>     9 TxResponse   c OK
> 
>     9 TxHeader     c Server: Apache
> 
>     9 TxHeader     c X-Powered-By: PHP/5.2.17
> 
>     9 TxHeader     c Content-Type: text/html
> 
>     9 TxHeader     c Cache-Control: max-age = 600
> 
>     9 TxHeader     c Content-Length: 4
> 
>     9 TxHeader     c Date: Sun, 11 May 2014 22:11:33 GMT
> 
>     9 TxHeader     c X-Varnish: 1525629213 1525629076
> 
>     9 TxHeader     c Age: 341
> 
>     9 TxHeader     c Via: 1.1 varnish
> 
>     9 TxHeader     c Connection: close
> 
>     9 TxHeader     c Varnish-X-Cache: HIT
> 
>     9 TxHeader     c Varnish-X-Cache-Hits: 137
> 
>     9 Length       c 4
> 
>     9 ReqEnd       c 1525629213 1399846293.098695993 1399846293.098922968 0.000057936 0.000181913 0.000045061
> 
> So I'm not sure why this is the case.
> 
> Here’s the VCL file that I’m using in case this might shed any clues. I apologize that I’m still to much of a newb to ferret out the most relevant parts. But I hope that the context may yield some clues. 
> backend web1 {
> 
>     .host = "10.10.1.104";
> 
>     .port = "80";
> 
>     .connect_timeout = 45s;
> 
>     .first_byte_timeout = 45s;
> 
>     .between_bytes_timeout = 45s;
> 
>     .max_connections = 70;
> 
>     .probe = {
> 
>         .url = "/healthcheck.php";
> 
>         .timeout = 5s;
> 
>         .interval = 30s;
> 
>         .window = 10;
> 
>         .threshold = 1;
> 
>     }
> 
> }
> 
> backend web2 {
> 
>     .host = "10.10.1.105";
> 
>     .port = "80";
> 
>     .connect_timeout = 45s;
> 
>     .first_byte_timeout = 45s;
> 
>     .between_bytes_timeout = 45s;
> 
>     .max_connections = 70;
> 
>     .probe = {
> 
>         .url = "/healthcheck.php";
> 
>         .timeout = 5s;
> 
>         .interval = 30s;
> 
>         .window = 10;
> 
>         .threshold = 1;
> 
>     }
> 
> }
> 
> backend web3 {
> 
>     .host = "10.10.1.106";
> 
>     .port = "80";
> 
>     .connect_timeout = 45s;
> 
>     .first_byte_timeout = 45s;
> 
>     .between_bytes_timeout = 45s;
> 
>     .max_connections = 70;
> 
>     .probe = {
> 
>         .url = "/healthcheck.php";
> 
>         .timeout = 5s;
> 
>         .interval = 30s;
> 
>         .window = 10;
> 
>         .threshold = 1;
> 
>     }
> 
> }
> 
> acl purge {
> 
>     "localhost";
> 
>     "127.0.0.1";
> 
>     "10.10.1.102";
> 
>     "10.10.1.103";
> 
> }
> 
> director www round-robin {
> 
>     { .backend = web1; }
> 
>     { .backend = web2; }
> 
>     { .backend = web3; }
> 
> 
> 
> }
> 
> sub vcl_recv {
> 
>     set req.backend = www;
> 
>     set req.grace = 6h;
> 
>     if (!req.backend.healthy) {
> 
>         set req.grace = 24h;
> 
>     }
> 
>     set req.http.X-Forwarded-For = req.http.X-Forwarded-For ", " client.ip;
> 
>     if (req.http.host ~ "^origin\.test(.+\.|)mywebsite\.com$") {
> 
>       return (pass);
> 
>     }
> 
>     if (req.http.host ~ ".*\.mywebsite.com|mywebsite.com") {
> 
>         /* allow (origin.)stage.m.mywebsite.com to be a separate host */
> 
>         if (req.http.host != "stage.m.mywebsite.com") {
> 
>             set req.http.host = "stage.mywebsite.com";
> 
>         }
> 
>     } else {
> 
>         return (pass);
> 
>     }
> 
>     if (req.request == "PURGE") {
> 
>         if (!client.ip ~ purge) {
> 
>             error 405 "Not allowed.";
> 
>         }
> 
>         return (lookup);
> 
>     }
> 
>     if (req.request != "GET" &&
> 
>         req.request != "HEAD" &&
> 
>         req.request != "PUT" &&
> 
>         req.request != "POST" &&
> 
>         req.request != "TRACE" &&
> 
>         req.request != "OPTIONS" &&
> 
>         req.request != "DELETE") {
> 
>             return (pipe);
> 
>     }
> 
>     if (req.request != "GET" && req.request != "HEAD") {
> 
>         return (pass);
> 
>     }
> 
>     if (req.url ~ "sites/all/modules/custom/bravo_ad/ads.html\?.*") {
> 
>       set req.url = "/sites/all/modules/custom/bravo_ad/ads.html";
> 
>     }
> 
>     if (req.url ~ "eyeblaster/addineyeV2.html\?.*") {
> 
>         set req.url = "/eyeblaster/addineyeV2.html";
> 
>     }
> 
>     if (req.url ~ "ahah_helper\.php|bravo_points\.php|install\.php|update\.php|cron\.php|/json(:?\?.*)?$") {
> 
>         return (pass);
> 
>     }
> 
>     if (req.http.Authorization) {
> 
>         return (pass);
> 
>     }
> 
>     if (req.url ~ "login" || req.url ~ "logout") {
> 
>         return (pass);
> 
>     }
> 
>     if (req.url ~ "^/admin/" || req.url ~ "^/node/add/") {
> 
>         return (pass);
> 
>     }
> 
>     if (req.http.Cache-Control ~ "no-cache") {
> 
>         // return (pass);
> 
>     }
> 
>     if (req.http.Cookie ~ "(VARNISH|DRUPAL_UID|LOGGED_IN|SESS|_twitter_sess)") {
> 
>         set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
> 
>         set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
> 
>     } else {
> 
>         unset req.http.Cookie;
> 
>     }
> 
>     /* removed varnish cache backend logic */
> 
>     if (req.restarts == 0) {
> 
>         set req.backend = www;
> 
>     } elsif (req.restarts >= 2) {
> 
>         return (pass);
> 
>     }
> 
>     if (req.restarts >= 2) {
> 
>         return (pass);
> 
>     }
> 
>     if (req.url ~ "\.(ico|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|ICO|JPG|JPEG|PNG|GIF|GZ|TGZ|BZ2|TBZ|MP3|OOG|SWF)") {
> 
>         unset req.http.Accept-Encoding;
> 
>     }
> 
>     if (req.url ~ "^/(sites/all/modules/mywebsite_admanager/includes/ads.php|doubleclick/DARTIframe.html)(\?.*|)$") {
> 
>         set req.url = regsub(req.url, "\?.*$", "");
> 
>     }
> 
>     if (req.http.Accept-Encoding ~ "gzip") {
> 
>         set req.http.Accept-Encoding = "gzip";
> 
>     } elsif (req.http.Accept-Encoding ~ "deflate") {
> 
>         set req.http.Accept-Encoding = "deflate";
> 
>     } else {
> 
>         unset req.http.Accept-Encoding;
> 
>     }
> 
>     return (lookup);
> 
> }
> 
> sub vcl_pipe {
> 
>     set bereq.http.connection = "close";
> 
>     return (pipe);
> 
> }
> 
> sub vcl_pass {
> 
>     return (pass);
> 
> }
> 
> sub vcl_hash {
> 
>     set req.hash += req.url;
> 
>     set req.hash += req.http.host;
> 
>     if (req.http.Cookie ~ "VARNISH|DRUPAL_UID|LOGGED_IN") {
> 
>         set req.hash += req.http.Cookie;
> 
>     }
> 
>     return (hash);
> 
> }
> 
> sub vcl_hit {
> 
>     if (req.request == "PURGE") {
> 
>         set obj.ttl = 0s;
> 
>         error 200 "Purged.";
> 
>     }
> 
> }
> 
> sub vcl_fetch {
> 
>     if (beresp.status == 500) {
> 
>         set req.http.X-Varnish-Error = "1";
> 
>         restart;
> 
>     }
> 
>     set beresp.grace = 6h;
> 
>     # Set a short circuit cache lifetime for resp codes above 302
> 
>     if (beresp.status > 302) {
> 
>     set beresp.ttl = 60s;
> 
>     set beresp.http.Cache-Control = "max-age = 60";
> 
>     }
> 
>     if (beresp.http.Edge-control ~ "no-store") {
> 
>         set beresp.http.storage = "1";
> 
>         set beresp.cacheable = false;
> 
>         return (pass);
> 
>     }
> 
>     if (beresp.status >= 300 || !beresp.cacheable) {
> 
>         set beresp.http.Varnish-X-Cacheable = "Not Cacheable";
> 
>         set beresp.http.storage = "1";
> 
>         return (pass);
> 
>     }
> 
>     if (beresp.http.Set-Cookie) {
> 
>         return (pass);
> 
>     }
> 
>     if (beresp.cacheable) {
> 
>         unset beresp.http.expires;
> 
>         set beresp.ttl = 600s;
> 
>         set beresp.http.Cache-Control = "max-age = 600";
> 
>         if (req.url ~ "\.(ico|jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|swf|ICO|JPG|JPEG|PNG|GIF|GZ|TGZ|BZ2|TBZ|MP3|OOG|SWF)") {
> 
>             set beresp.ttl = 43829m;
> 
>             set beresp.http.Cache-Control = "max-age = 1000000";
> 
>         }
> 
>     }
> 
>     return (deliver);
> 
> }
> 
> 
> 
> sub vcl_deliver {
> 
>     if (obj.hits > 0) {
> 
>         set resp.http.Varnish-X-Cache = "HIT";
> 
>         set resp.http.Varnish-X-Cache-Hits = obj.hits;
> 
>     } else {
> 
>         set resp.http.Varnish-X-Cache = "MISS";
> 
>     }
> 
>     return (deliver);
> 
> }
> 
> sub vcl_error {
> 
>     if (req.restarts == 0) {
> 
>         return (restart);
> 
>     }
> 
>     if (req.http.X-Varnish-Error != "1") {
> 
>         set req.http.X-Varnish-Error = "1";
> 
>         return (restart);
> 
>     }
> 
> }
> 
>  The only part that I omitted was the one pointing to the error page. Can anyone offer any advice on how to troubleshoot this?
> 
> I'm enclosing the full VCL in case that extra info is helpful. I didn't omit much tho.
> 
> Thank you!
> 
> Tim
> 
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> 
> 
> 
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
> 
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20140511/4b8a212d/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: varnish-ping.png
Type: image/png
Size: 128851 bytes
Desc: not available
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20140511/4b8a212d/attachment-0001.png>


More information about the varnish-misc mailing list