Varnish attempting to serve disconnected clients

Lee Hambley lee.hambley at gmail.com
Thu Jul 28 12:04:15 UTC 2022


Dear List,

I'm debugging a problem with our Varnish, and our QA folks found an
interesting case.

We originally followed this breadcrumb trail while investigating why our
Varnish instances hold an enormous number of open connections in CLOSE_WAIT
status; these don't look like healthy connection reuse, but rather like dead
connections that should be closed out. Nonetheless, we're focused on a
slightly more specific sub-issue of that problem right now:

We have noticed that if a request is eligible to be coalesced, but the
impatient client disconnects before the request is served, Varnish
continues trying to serve that request by going to the backend even after
the client has disconnected.

I suspect in our case, we can disable request coalescing, but I didn't want
to miss an opportunity to report a possible bug, or learn something about a
corner of Varnish I don't know well... here's our setup:

- With Varnish 6.6.1
- Given a toy Python script backend which answers `200 OK` to the health
check, but answers every other request after 10s with an `HTTP 500`
error [source linked below]
- Given the following config [below]
  - started with `/usr/local/sbin/varnishd -n /usr/local/var/varnish -F -f $PWD/foo.vcl -a test=:21601,HTTP`
- When running `for i in $(seq 10); do curl -m 1 localhost:21601/ & done`
(the ampersand, backgrounding each curl, is important)
- Varnish makes 1 request to the backend, coalescing the others
- Clients all disconnect thanks to `curl -m 1` (`--max-time`) (or `0.1`,
no difference, naturally)
- The first request completes with an `HTTP 500`; Varnish then continues to
retry the queued requests for the already-disconnected clients (server logs,
with health checks omitted, below)

In the real set-up, Varnish only works through the queue one request at a
time, serving the next one every 10 seconds. I didn't take the time to
reproduce that here, since I believe the bug is simply that Varnish
continues to do work for disconnected clients. I guess in my toy set-up we
benefit from hit-for-pass, and in our real-world set-up that isn't true.
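
For clarity, this is roughly what I mean by leaning on hit-for-pass in the
toy: a sketch only, with an illustrative status check and TTL rather than
anything we actually run.

sub vcl_backend_response {
  # Sketch only: when the backend errors out, create a short-lived
  # hit-for-miss object so queued and newly arriving clients are not
  # serialised behind repeated slow failures. The ">= 500" check and
  # the 10s TTL are illustrative values, not our real configuration.
  if (beresp.status >= 500) {
    set beresp.uncacheable = true;
    set beresp.ttl = 10s;
    return (deliver);
  }
}

If I read the built-in VCL correctly it already marks such error responses
uncacheable, so this may well be redundant; I mention it only to make the
toy/real-world difference concrete.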

I can somehow imagine this as a feature (populate the cache even though the
client went away), but queueing hundreds or thousands of requests and
nibbling away at them one by one, long after the clients have hung up, is
causing resource exhaustion for us. We can tune the configs significantly
now that we know the issue, but we'd love some opinionated feedback on what
an idiomatic approach to this would be.
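
Concretely, by "disable request coalescing" I have in mind something like
the sketch below, where the `^/slow` match is just a placeholder for
whichever of our endpoints turn out to be slow and effectively uncacheable:

sub vcl_recv {
  # Sketch only: bypass the cache, and with it the waiting list /
  # request coalescing, for endpoints we know to be slow and
  # effectively uncacheable. "^/slow" is a placeholder pattern.
  if (req.url ~ "^/slow") {
    return (pass);
  }
  return (hash);
}

That would avoid the pile-up at the cost of sending every such request
straight to the backend, which is exactly why I'd value opinions on whether
there's a more idiomatic middle ground.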

- Are we doing something wrong?
- Should Varnish still go to the backend to serve clients that have already
disconnected?
- Is this a bug? Should I report it somewhere more formally and attach the
repro case a bit more diligently?

Warm regards everyone, thanks for Varnish, an active list, and active
support on StackOverflow and such.

[config]
vcl 4.0;

backend foo {
  .host = "127.0.0.1";
  .port = "3100";
  .probe = {
    .url = "/healthz";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
  }
}

sub vcl_recv {
  return (hash);
}

[toy python server logs]
python3 ./server.py
socket binded to port 3100
2022-07-28 13:45:05.589997 socket is listening
2022-07-28 13:45:17.336371 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 32771\r\n\r\n'
^^ first request
2022-07-28 13:45:27.425837 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 163841\r\n\r\n'
2022-07-28 13:45:27.437121 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 229377\r\n\r\n'
2022-07-28 13:45:27.437913 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 131075\r\n\r\n'
2022-07-28 13:45:27.437999 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 196609\r\n\r\n'
2022-07-28 13:45:27.438363 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 98311\r\n\r\n'
2022-07-28 13:45:27.437782 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 65545\r\n\r\n'
2022-07-28 13:45:27.438033 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 327681\r\n\r\n'
2022-07-28 13:45:27.439453 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 294913\r\n\r\n'
2022-07-28 13:45:27.438401 from client: b'GET / HTTP/1.1\r\nHost: localhost:21601\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\nX-Forwarded-For: ::1\r\nX-Varnish: 262145\r\n\r\n'
^^ other requests

[toy python server code]
https://gist.github.com/leehambley/fa634f91936b1d30422e3af96ba2eec5

Lee Hambley
http://lee.hambley.name/
+49 (0) 170 298 5667