[PATCH] Instruct the kernel to reset the connection for SC_RX_TIMEOUT, and others

Nils Goroll slink at schokola.de
Sun Mar 1 17:30:20 CET 2015


On 01/03/15 10:05, Poul-Henning Kamp wrote:
>> >Closing TCP connections with SO_LINGER unset [...]
> I presume you mean "set" ?
> 
> The problem with !SO_LINGER (we tried it some years back) is that
> all queued data is discarded the moment you call close(2)

I fail to find a mistake in the commit message of the patch I proposed: my
understanding is that we get these semantics with linger on and a linger
timeout of 0. From Linux net/ipv4/tcp.c:

        } else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
                /* Check zero linger _after_ checking for unread data. */
                sk->sk_prot->disconnect(sk, 0);
                NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPABORTONDATA);

Whether or not data gets discarded depends on whether the linger timeout is set
and reached before all outstanding data is sent.
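A minimal sketch of the zero-linger semantics described above, assuming Linux
and a loopback TCP connection. The helper name peer_view_of_close() is
hypothetical and not part of Varnish or the proposed patch; it only uses the
standard socket calls and reports what the peer's read(2) observes after the
accepted side is closed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from Varnish): accept one loopback connection,
 * close the accepted side, and report what the peer's read(2) observes.
 * Returns 0 if the peer saw EOF (FIN), the errno of the failed read
 * (e.g. ECONNRESET after an RST), or -1 if the socket setup failed.
 */
static int
peer_view_of_close(int abortive)
{
	struct linger lin = { .l_onoff = 1, .l_linger = 0 };
	struct sockaddr_in sa;
	socklen_t sl = sizeof sa;
	char buf[1];
	ssize_t n;
	int lsock = -1, csock = -1, asock = -1, seen = -1;

	memset(&sa, 0, sizeof sa);
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = 0;		/* let the kernel pick a free port */

	lsock = socket(AF_INET, SOCK_STREAM, 0);
	csock = socket(AF_INET, SOCK_STREAM, 0);
	if (lsock < 0 || csock < 0)
		goto out;
	if (bind(lsock, (struct sockaddr *)&sa, sizeof sa) != 0 ||
	    listen(lsock, 1) != 0 ||
	    getsockname(lsock, (struct sockaddr *)&sa, &sl) != 0)
		goto out;
	if (connect(csock, (struct sockaddr *)&sa, sizeof sa) != 0)
		goto out;
	asock = accept(lsock, NULL, NULL);
	if (asock < 0)
		goto out;

	/* Linger on with a zero timeout: close(2) aborts the connection. */
	if (abortive &&
	    setsockopt(asock, SOL_SOCKET, SO_LINGER, &lin, sizeof lin) != 0)
		goto out;
	close(asock);
	asock = -1;

	/* Blocks until the FIN or RST from the closed side arrives. */
	n = read(csock, buf, sizeof buf);
	assert(n <= 0);			/* nothing was ever written to the peer */
	seen = (n < 0) ? errno : 0;
out:
	if (asock >= 0)
		close(asock);
	if (csock >= 0)
		close(csock);
	if (lsock >= 0)
		close(lsock);
	return (seen);
}
```

On Linux I would expect peer_view_of_close(1) to yield ECONNRESET and
peer_view_of_close(0) to yield 0, i.e. EOF after a FIN.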


I think what we want for Varnish is:

- attempt an orderly shutdown (3-way FIN) for connections where we have
  actually sent data
- reset (RST) the connection as quickly and efficiently as possible otherwise.

In particular, a 1-way FIN only closes our write direction and we need a
cooperating client to complete the duplex close, while sending an RST implies a
full close.

In the patch, I suggest limiting RST to client read timeouts and other fatal
conditions. We might want to consider using it for more scenarios should this
turn out to be a good idea. If we wanted to be on the safe side, we could also
limit the change to RX_TIMEOUT for the time being (which would also be
sufficient for the particular real-world case I am working on).
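The policy outlined above could be sketched roughly as follows. This is an
illustration only, not the code of the patch: the enum names, the
bytes_sent parameter and choose_close_mode() are all hypothetical, and only
the receive timeout is a close reason actually named in this thread.

```c
#include <stddef.h>

/* Hypothetical close reasons; only the RX timeout is named in the thread. */
enum close_reason {
	REASON_RX_TIMEOUT,	/* client did not send a request in time */
	REASON_FATAL,		/* hypothetical: some other fatal condition */
	REASON_NORMAL		/* hypothetical: regular close */
};

enum close_mode {
	CLOSE_ORDERLY,		/* plain close(2), FIN exchange */
	CLOSE_RESET		/* SO_LINGER on, timeout 0, then close(2): RST */
};

/*
 * Pick how to close a client connection. Once data has been sent, always
 * attempt an orderly shutdown so that queued data is not discarded.
 * Otherwise reset only for timeouts and fatal conditions, and stay
 * conservative for everything else.
 */
static enum close_mode
choose_close_mode(enum close_reason why, size_t bytes_sent)
{
	if (bytes_sent > 0)
		return (CLOSE_ORDERLY);
	switch (why) {
	case REASON_RX_TIMEOUT:
	case REASON_FATAL:
		return (CLOSE_RESET);
	default:
		return (CLOSE_ORDERLY);
	}
}
```

Restricting the change to RX_TIMEOUT alone would amount to dropping the
REASON_FATAL case from the switch.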

On 01/03/15 10:07, Poul-Henning Kamp wrote:
> real-life testing that will be necessary for any change to this aspect.

I'd volunteer to test this on production systems.

Some tcpdumps with Linux 3.13.1 below.

=== current varnish origin master waiter timeout (5s default) ===

16:47:40.982533 IP6 ::1.50785 > ::1.8080: Flags [S], seq 1964376644, win 43690,
options [mss 65476,sackOK,TS val 576054 ecr 0,nop,wscale 7], length 0
16:47:40.982541 IP6 ::1.8080 > ::1.50785: Flags [R.], seq 0, ack 1964376645, win
0, length 0
16:47:40.982590 IP 127.0.0.1.54336 > 127.0.0.1.8080: Flags [S], seq 1857531233,
win 43690, options [mss 65495,sackOK,TS val 576054 ecr 0,nop,wscale 7], length 0
16:47:40.982600 IP 127.0.0.1.8080 > 127.0.0.1.54336: Flags [S.], seq 2353711226,
ack 1857531234, win 43690, options [mss 65495,sackOK,TS val 576054 ecr
576054,nop,wscale 7], length 0
16:47:40.982617 IP 127.0.0.1.54336 > 127.0.0.1.8080: Flags [.], ack 1, win 342,
options [nop,nop,TS val 576054 ecr 576054], length 0
16:47:46.407450 IP 127.0.0.1.8080 > 127.0.0.1.54336: Flags [F.], seq 1, ack 1,
win 342, options [nop,nop,TS val 577410 ecr 576054], length 0
16:47:46.407517 IP 127.0.0.1.54336 > 127.0.0.1.8080: Flags [F.], seq 1, ack 2,
win 342, options [nop,nop,TS val 577410 ecr 577410], length 0
16:47:46.407537 IP 127.0.0.1.8080 > 127.0.0.1.54336: Flags [.], ack 2, win 342,
options [nop,nop,TS val 577410 ecr 577410], length 0

=== current varnish origin master request read timeout (2s default) ===

Note: I suggested increasing this timeout to 7 seconds.

16:48:44.162945 IP6 ::1.50794 > ::1.8080: Flags [S], seq 3924075046, win 43690,
options [mss 65476,sackOK,TS val 591849 ecr 0,nop,wscale 7], length 0
16:48:44.162954 IP6 ::1.8080 > ::1.50794: Flags [R.], seq 0, ack 3924075047, win
0, length 0
16:48:44.163001 IP 127.0.0.1.54345 > 127.0.0.1.8080: Flags [S], seq 3268495574,
win 43690, options [mss 65495,sackOK,TS val 591849 ecr 0,nop,wscale 7], length 0
16:48:44.163011 IP 127.0.0.1.8080 > 127.0.0.1.54345: Flags [S.], seq 1673884791,
ack 3268495575, win 43690, options [mss 65495,sackOK,TS val 591849 ecr
591849,nop,wscale 7], length 0
16:48:44.163019 IP 127.0.0.1.54345 > 127.0.0.1.8080: Flags [.], ack 1, win 342,
options [nop,nop,TS val 591849 ecr 591849], length 0
16:48:46.035540 IP 127.0.0.1.54345 > 127.0.0.1.8080: Flags [P.], seq 1:8, ack 1,
win 342, options [nop,nop,TS val 592317 ecr 591849], length 7
16:48:46.035566 IP 127.0.0.1.8080 > 127.0.0.1.54345: Flags [.], ack 8, win 342,
options [nop,nop,TS val 592317 ecr 592317], length 0
16:48:48.037686 IP 127.0.0.1.8080 > 127.0.0.1.54345: Flags [F.], seq 1, ack 8,
win 342, options [nop,nop,TS val 592817 ecr 592317], length 0
16:48:48.037751 IP 127.0.0.1.54345 > 127.0.0.1.8080: Flags [F.], seq 8, ack 2,
win 342, options [nop,nop,TS val 592817 ecr 592817], length 0
16:48:48.037773 IP 127.0.0.1.8080 > 127.0.0.1.54345: Flags [.], ack 9, win 342,
options [nop,nop,TS val 592817 ecr 592817], length 0

=== with SO_LINGER proposed patch - waiter timeout (5s default) ===

16:51:59.473915 IP6 ::1.50797 > ::1.8080: Flags [S], seq 3546360798, win 43690,
options [mss 65476,sackOK,TS val 640676 ecr 0,nop,wscale 7], length 0
16:51:59.473936 IP6 ::1.8080 > ::1.50797: Flags [R.], seq 0, ack 3546360799, win
0, length 0
16:51:59.474059 IP 127.0.0.1.54348 > 127.0.0.1.8080: Flags [S], seq 2664129049,
win 43690, options [mss 65495,sackOK,TS val 640677 ecr 0,nop,wscale 7], length 0
16:51:59.474073 IP 127.0.0.1.8080 > 127.0.0.1.54348: Flags [S.], seq 3919584645,
ack 2664129050, win 43690, options [mss 65495,sackOK,TS val 640677 ecr
640677,nop,wscale 7], length 0
16:51:59.474090 IP 127.0.0.1.54348 > 127.0.0.1.8080: Flags [.], ack 1, win 342,
options [nop,nop,TS val 640677 ecr 640677], length 0
16:52:04.689346 IP 127.0.0.1.8080 > 127.0.0.1.54348: Flags [R.], seq 1, ack 1,
win 342, options [nop,nop,TS val 641980 ecr 640677], length 0


=== with SO_LINGER proposed patch - request read timeout (2s default) ===

16:52:11.788384 IP6 ::1.50799 > ::1.8080: Flags [S], seq 1598190622, win 43690,
options [mss 65476,sackOK,TS val 643755 ecr 0,nop,wscale 7], length 0
16:52:11.788401 IP6 ::1.8080 > ::1.50799: Flags [R.], seq 0, ack 1598190623, win
0, length 0
16:52:11.788474 IP 127.0.0.1.54350 > 127.0.0.1.8080: Flags [S], seq 510724766,
win 43690, options [mss 65495,sackOK,TS val 643755 ecr 0,nop,wscale 7], length 0
16:52:11.788494 IP 127.0.0.1.8080 > 127.0.0.1.54350: Flags [S.], seq 1351958833,
ack 510724767, win 43690, options [mss 65495,sackOK,TS val 643755 ecr
643755,nop,wscale 7], length 0
16:52:11.788513 IP 127.0.0.1.54350 > 127.0.0.1.8080: Flags [.], ack 1, win 342,
options [nop,nop,TS val 643755 ecr 643755], length 0
16:52:14.144393 IP 127.0.0.1.54350 > 127.0.0.1.8080: Flags [P.], seq 1:7, ack 1,
win 342, options [nop,nop,TS val 644344 ecr 643755], length 6
16:52:14.144424 IP 127.0.0.1.8080 > 127.0.0.1.54350: Flags [.], ack 7, win 342,
options [nop,nop,TS val 644344 ecr 644344], length 0
16:52:16.146596 IP 127.0.0.1.8080 > 127.0.0.1.54350: Flags [R.], seq 1, ack 7,
win 342, options [nop,nop,TS val 644845 ecr 644344], length 0


