varnish using dropped backend connections?

Nathan Warren nwarren at nazarene.org
Tue Jun 18 17:22:10 CEST 2013


Thank you to all who at least put thought to my inquiry. Here is what I was able to discover:

1 ) The ASA firewall kills idle sessions after 1 hour
2) Varnish doesn't set the keepalive flag for backend connections
3) Once the tcp session is dropped by the firewall, varnish doesn't detect this, and keeps trying to use this backend connection until it times out
4) With multiple established (and failed) tcp sessions, the fetch from backend will lag severely waiting for timeouts until it either hits a usable tcp session or creates a new one
4) Health probes aren't able to detect this, since they create and close a tcp session for each probe

I ended up having to put apache in-between varnish and my backend since Apache does have a keepalive option in ProxyPass.

So my question is, should Varnish have an option to set the keepalive flag on backend connections? Should I make a feature request? This seems relatively simple.

Nate

-----Original Message-----
From: varnish-misc-bounces+nwarren=nazarene.org at varnish-cache.org [mailto:varnish-misc-bounces+nwarren=nazarene.org at varnish-cache.org] On Behalf Of Nathan Warren
Sent: Thursday, June 13, 2013 12:56 PM
To: varnish-misc at varnish-cache.org
Subject: varnish using dropped backend connections?

Good day,

I have a varnish cache serving requests to two play framework instances with a cisco asa firewall in-between. Everything works great at first - cache is working, acceptable cache hit ratio, fast, etc. But then after about an hour (without hits to the service), when the cache misses and pulls from the backend, there is no reply - the pound server times out while varnish hangs. Eventually varnish gives a 503 meditation, but pound doesn't wait that long to time out. Packet traces on the varnish box show packets being sent from varnish, but packet traces never show it reaching the backend server. The health checks keep polling successfully throughout.

My suspicion was that varnish was creating a keepalive session to the play server that was persistent. This seems to be the case with tcp sessions visible (via netstat) on both sides when no traffic is passing through. It seems as though the firewall was dropping the keepalive session, but varnish kept using it as if it were there. Then it kept re-using it over and over. After refreshing the page a few times (in error), the page will start working.

I found that cisco asa firewalls drop idle sessions after one hour, and that my version of linux has a 7200 time to wait before interrogating a possibly failed tcp session. I dropped this to 1800 via /proc/sys/net/ipv4/tcp_keepalive_time and restarted varnish. After waiting for an hour, the problem continues as before.

I would appreciate any insight you can provide, even if it's a problem with my config that doesn't seem to be related to this problem.

Thanks,
Nate

####  default.vcl  ####
backend backend11 {
        .host = "10.1.1.20";
        .port = "9010";
        .probe = {
                .url = "/sys-status";
                .interval = 10s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
        }
}

backend backend12 {
        .host = "10.1.1.21";
        .port = "9010";
        .probe = {
                .url = "/sys-status";
                .interval = 10s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
        }
}

director backend1 random {
        .retries = 5;
        {
                .backend = backend11;
                .weight = 3;
        }
        {
                .backend = backend12;
                .weight = 3;
        }
}

backend backend21 {
        .host = "10.1.1.20";
        .port = "9003";
        .probe = {
                .url = "/ping";
                .interval = 10s;
                .timeout = 2s;
                .window = 5;
                .threshold = 3;
        }
}

backend backend22 {
        .host = "10.1.1.21";
        .port = "9003";
        .probe = {
                .url = "/ping";
                .interval = 10s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
        }
}

director backend2 random {
        {
                .backend = backend11;
                .weight = 3;
        }
        {
                .backend = backend12;
                .weight = 3;
        }
}

sub vcl_recv {

        #serve stale data is ok
        set req.grace = 55m;

        #we don't care about any cookies
        unset req.http.Cookie;

        # secure redirects
        if (req.http.host ~ "^server1.nazarene.org$"
                        && req.http.X-Forwarded-Proto !~ "(?i)https" ) {
                set req.http.x-Redir-Url = "https://" + req.http.host + req.url;
                error 750 req.http.x-Redir-Url;
        }

        if (req.url ~ "(noaccess|notallowed)" ) {
                error 404 "Not Found";
        }

        # send requests to server1 to go to backend1
        if (req.http.host ~ "^server1.domain.local$") {
                set req.backend = backend1;

                #if the request is going to /foo/p/blah/blah then we can't cache, so don't try
                if (req.url ~ "^/foo/p/.*") {
                        return (pipe);
                }

                return (lookup);
        }

        # send requests to server2.domain.local
        if ( req.http.host ~ "^server2.domain.local$" ) {
                # but only for certain urls
                if ( req.url ~ "^/foo/(bar|foobar)" ) {
                        set req.backend = backend2;
                        return (lookup);
                }
        }

        # If you're not served at this point, I don't have anything for you
        error 404 "Not Found";
}

sub vcl_fetch {
        set beresp.grace = 60m;

        #strip cookies from backend so varnish will cache
        unset beresp.http.Set-Cookie;

        # Properly handle different encoding types
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unkown algorithm
            remove req.http.Accept-Encoding;
        }
    }

        if (beresp.http.content-type ~ "text") {
          set beresp.do_gzip = true;
        }
}

sub vcl_error {
        #redirect the client to the specified location
        if (obj.status == 750) {
                set obj.http.Location = obj.response;
                set obj.status = 302;
                return (deliver);
        }
}

#####  END default.vcl  ######

user at host:~$ sudo varnishstat -1 -n varnish
client_conn                 21         0.00 Client connections accepted
client_drop                  0         0.00 Connection dropped, no sess/wrk
client_req                 146         0.02 Client requests received
cache_hit                  111         0.01 Cache hits
cache_hitpass                0         0.00 Cache hits for pass
cache_miss                  35         0.00 Cache misses
backend_conn                10         0.00 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                 0         0.00 Backend conn. too many
backend_fail                 0         0.00 Backend conn. failures
backend_reuse               27         0.00 Backend conn. reuses
backend_toolate              0         0.00 Backend conn. was closed
backend_recycle             30         0.00 Backend conn. recycles
backend_retry                5         0.00 Backend conn. retry
fetch_head                   0         0.00 Fetch head
fetch_length                30         0.00 Fetch with Length
fetch_chunked                0         0.00 Fetch chunked
fetch_eof                    0         0.00 Fetch EOF
fetch_bad                    0         0.00 Fetch had bad headers
fetch_close                  0         0.00 Fetch wanted close
fetch_oldhttp                0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                   0         0.00 Fetch zero len
fetch_failed                 0         0.00 Fetch failed
fetch_1xx                    0         0.00 Fetch no body (1xx)
fetch_204                    0         0.00 Fetch no body (204)
fetch_304                    0         0.00 Fetch no body (304)
n_sess_mem                  14          .   N struct sess_mem
n_sess                       0          .   N struct sess
n_object                    16          .   N struct object
n_vampireobject              0          .   N unresurrected objects
n_objectcore                18          .   N struct objectcore
n_objecthead                21          .   N struct objecthead
n_waitinglist                7          .   N struct waitinglist
n_vbc                        3          .   N struct vbc
n_wrk                      800          .   N worker threads
n_wrk_create               800         0.09 N worker threads created
n_wrk_failed                 0         0.00 N worker threads not created
n_wrk_max                    0         0.00 N worker threads limited
n_wrk_lqueue                 0         0.00 work request queue length
n_wrk_queued                 0         0.00 N queued work requests
n_wrk_drop                   0         0.00 N dropped work requests
n_backend                    4          .   N backends
n_expired                   14          .   N expired objects
n_lru_nuked                  0          .   N LRU nuked objects
n_lru_moved                 39          .   N LRU moved objects
losthdr                      0         0.00 HTTP header overflows
n_objsendfile                0         0.00 Objects sent with sendfile
n_objwrite                  25         0.00 Objects sent with write
n_objoverflow                0         0.00 Objects overflowing workspace
s_sess                      21         0.00 Total Sessions
s_req                      146         0.02 Total Requests
s_pipe                       0         0.00 Total pipe
s_pass                       0         0.00 Total pass
s_fetch                     30         0.00 Total fetch
s_hdrbytes               45932         5.26 Total header bytes
s_bodybytes              66053         7.57 Total body bytes
sess_closed                  6         0.00 Session Closed
sess_pipeline                0         0.00 Session Pipeline
sess_readahead               0         0.00 Session Read Ahead
sess_linger                141         0.02 Session Linger
sess_herd                   90         0.01 Session herd
shm_records              17801         2.04 SHM records
shm_writes               10492         1.20 SHM writes
shm_flushes                  0         0.00 SHM flushes due to overflow
shm_cont                     0         0.00 SHM MTX contention
shm_cycles                   0         0.00 SHM cycles through buffer
sms_nreq                     5         0.00 SMS allocator requests
sms_nobj                     0          .   SMS outstanding allocations
sms_nbytes                   0          .   SMS outstanding bytes
sms_balloc                2090          .   SMS bytes allocated
sms_bfree                 2090          .   SMS bytes freed
backend_req                 37         0.00 Backend requests made
n_vcl                        1         0.00 N vcl total
n_vcl_avail                  1         0.00 N vcl available
n_vcl_discard                0         0.00 N vcl discarded
n_ban                        1          .   N total active bans
n_ban_add                    1         0.00 N new bans added
n_ban_retire                 0         0.00 N old bans deleted
n_ban_obj_test               0         0.00 N objects tested
n_ban_re_test                0         0.00 N regexps tested against
n_ban_dups                   0         0.00 N duplicate bans removed
hcb_nolock                 146         0.02 HCB Lookups without lock
hcb_lock                    18         0.00 HCB Lookups with lock
hcb_insert                  18         0.00 HCB Inserts
esi_errors                   0         0.00 ESI parse errors (unlock)
esi_warnings                 0         0.00 ESI parse warnings (unlock)
accept_fail                  0         0.00 Accept failures
client_drop_late             0         0.00 Connection dropped late
uptime                    8729         1.00 Client uptime
dir_dns_lookups              0         0.00 DNS director lookups
dir_dns_failed               0         0.00 DNS director failed lookups
dir_dns_hit                  0         0.00 DNS director cached lookups hit
dir_dns_cache_full           0         0.00 DNS director full dnscache
vmods                        0          .   Loaded VMODs
n_gzip                       9         0.00 Gzip operations
n_gunzip                     0         0.00 Gunzip operations
LCK.sms.creat                1         0.00 Created locks
LCK.sms.destroy              0         0.00 Destroyed locks
LCK.sms.locks               15         0.00 Lock Operations
LCK.sms.colls                0         0.00 Collisions
LCK.smp.creat                0         0.00 Created locks
LCK.smp.destroy              0         0.00 Destroyed locks
LCK.smp.locks                0         0.00 Lock Operations
LCK.smp.colls                0         0.00 Collisions
LCK.sma.creat                2         0.00 Created locks
LCK.sma.destroy              0         0.00 Destroyed locks
LCK.sma.locks              103         0.01 Lock Operations
LCK.sma.colls                0         0.00 Collisions
LCK.smf.creat                0         0.00 Created locks
LCK.smf.destroy              0         0.00 Destroyed locks
LCK.smf.locks                0         0.00 Lock Operations
LCK.smf.colls                0         0.00 Collisions
LCK.hsl.creat                0         0.00 Created locks
LCK.hsl.destroy              0         0.00 Destroyed locks
LCK.hsl.locks                0         0.00 Lock Operations
LCK.hsl.colls                0         0.00 Collisions
LCK.hcb.creat                1         0.00 Created locks
LCK.hcb.destroy              0         0.00 Destroyed locks
LCK.hcb.locks               71         0.01 Lock Operations
LCK.hcb.colls                0         0.00 Collisions
LCK.hcl.creat                0         0.00 Created locks
LCK.hcl.destroy              0         0.00 Destroyed locks
LCK.hcl.locks                0         0.00 Lock Operations
LCK.hcl.colls                0         0.00 Collisions
LCK.vcl.creat                1         0.00 Created locks
LCK.vcl.destroy              0         0.00 Destroyed locks
LCK.vcl.locks                9         0.00 Lock Operations
LCK.vcl.colls                0         0.00 Collisions
LCK.stat.creat               1         0.00 Created locks
LCK.stat.destroy             0         0.00 Destroyed locks
LCK.stat.locks              14         0.00 Lock Operations
LCK.stat.colls               0         0.00 Collisions
LCK.sessmem.creat            1         0.00 Created locks
LCK.sessmem.destroy            0         0.00 Destroyed locks
LCK.sessmem.locks             37         0.00 Lock Operations
LCK.sessmem.colls              0         0.00 Collisions
LCK.wstat.creat                1         0.00 Created locks
LCK.wstat.destroy              0         0.00 Destroyed locks
LCK.wstat.locks            26237         3.01 Lock Operations
LCK.wstat.colls                0         0.00 Collisions
LCK.herder.creat               1         0.00 Created locks
LCK.herder.destroy             0         0.00 Destroyed locks
LCK.herder.locks               1         0.00 Lock Operations
LCK.herder.colls               0         0.00 Collisions
LCK.wq.creat                   2         0.00 Created locks
LCK.wq.destroy                 0         0.00 Destroyed locks
LCK.wq.locks               18452         2.11 Lock Operations
LCK.wq.colls                   0         0.00 Collisions
LCK.objhdr.creat              25         0.00 Created locks
LCK.objhdr.destroy             4         0.00 Destroyed locks
LCK.objhdr.locks             624         0.07 Lock Operations
LCK.objhdr.colls               0         0.00 Collisions
LCK.exp.creat                  1         0.00 Created locks
LCK.exp.destroy                0         0.00 Destroyed locks
LCK.exp.locks               8773         1.01 Lock Operations
LCK.exp.colls                  0         0.00 Collisions
LCK.lru.creat                  2         0.00 Created locks
LCK.lru.destroy                0         0.00 Destroyed locks
LCK.lru.locks                 30         0.00 Lock Operations
LCK.lru.colls                  0         0.00 Collisions
LCK.cli.creat                  1         0.00 Created locks
LCK.cli.destroy                0         0.00 Destroyed locks
LCK.cli.locks               2922         0.33 Lock Operations
LCK.cli.colls                  0         0.00 Collisions
LCK.ban.creat                  1         0.00 Created locks
LCK.ban.destroy                0         0.00 Destroyed locks
LCK.ban.locks               8774         1.01 Lock Operations
LCK.ban.colls                  0         0.00 Collisions
LCK.vbp.creat                  1         0.00 Created locks
LCK.vbp.destroy                0         0.00 Destroyed locks
LCK.vbp.locks               3488         0.40 Lock Operations
LCK.vbp.colls                  0         0.00 Collisions
LCK.vbe.creat                  1         0.00 Created locks
LCK.vbe.destroy                0         0.00 Destroyed locks
LCK.vbe.locks                 17         0.00 Lock Operations
LCK.vbe.colls                  0         0.00 Collisions
LCK.backend.creat              4         0.00 Created locks
LCK.backend.destroy            0         0.00 Destroyed locks
LCK.backend.locks            185         0.02 Lock Operations
LCK.backend.colls              0         0.00 Collisions
SMA.s0.c_req                  62         0.01 Allocator requests
SMA.s0.c_fail                  0         0.00 Allocator failures
SMA.s0.c_bytes           1798968       206.09 Bytes allocated
SMA.s0.c_freed           1548284       177.37 Bytes freed
SMA.s0.g_alloc                30          .   Allocations outstanding
SMA.s0.g_bytes            250684          .   Bytes outstanding
SMA.s0.g_space         268184772          .   Bytes available
SMA.Transient.c_req            0         0.00 Allocator requests
SMA.Transient.c_fail           0         0.00 Allocator failures
SMA.Transient.c_bytes            0         0.00 Bytes allocated
SMA.Transient.c_freed            0         0.00 Bytes freed
SMA.Transient.g_alloc            0          .   Allocations outstanding
SMA.Transient.g_bytes            0          .   Bytes outstanding
SMA.Transient.g_space            0          .   Bytes available
VBE.backend11(10.1.1.20,,9010).vcls            1          .   VCL references
VBE.backend11(10.1.1.20,,9010).happy4611687117937967103          .   Happy health probes
VBE.backend12(10.1.1.21,,9010).vcls            1          .   VCL references
VBE.backend12(10.1.1.21,,9010).happy4611687117937967103          .   Happy health probes
VBE.backend21(10.1.1.20,,9003).vcls             1          .   VCL references
VBE.backend21(10.1.1.20,,9003).happy 18446744073709551615          .   Happy health probes
VBE.backend22(10.1.1.21,,9003).vcls             1          .   VCL references
VBE.backend22(10.1.1.21,,9003).happy 18446744073709551615          .   Happy health probes

_______________________________________________
varnish-misc mailing list
varnish-misc at varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



More information about the varnish-misc mailing list