please test timeout_req 2 vs 7 seconds

Nils Goroll slink at schokola.de
Mon Mar 16 20:03:43 CET 2015


Master now has the MAIN.sc_rx_timeout statistic (and more).

I'd very much appreciate production system tests (preferably with a high number
of sessions and ideally a good percentage of malicious requests) with
timeout_req set to 2 seconds (default) vs 7 seconds.

What we are looking for are statistically relevant differences in the
MAIN.sc_rx_timeout vs. MAIN.sess_conn ratio and no statistically relevant
differences in the other MAIN.sc* vs. MAIN.sess_conn ratios.
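
One way to judge whether a difference in such a ratio is statistically relevant
is a two-proportion z-test. The sketch below is only an illustration: the
function name and the counter values are made up, and it assumes sessions can be
treated as independent trials, which may not hold on a real system.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference between x1/n1 and x2/n2.

    x = events (e.g. MAIN.sc_rx_timeout), n = trials (e.g. MAIN.sess_conn).
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are all-events or no-events: no measurable difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: rx timeouts and accepted sessions for the two settings.
z, p = two_proportion_z(2500, 100000,   # timeout_req = 2
                        1800, 100000)   # timeout_req = 7
print(f"z = {z:.2f}, p = {p:.3g}")
```

A small p-value for the sc_rx_timeout ratio together with large p-values for the
other sc* ratios would match the outcome described above.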

Please report varnishstat -1 outputs.
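
For anyone who wants to look at the ratios locally first, here is a minimal
sketch that computes them from saved varnishstat -1 output. It assumes each line
has the layout "name value rate description"; the helper names and the sample
values are made up, and the counter names other than MAIN.sess_conn and
MAIN.sc_rx_timeout are only illustrative.

```python
def parse_varnishstat(text):
    """Parse `varnishstat -1` text into {counter_name: int_value}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: name, value, per-second rate, description...
        if len(fields) < 2:
            continue
        try:
            counters[fields[0]] = int(fields[1])
        except ValueError:
            continue  # skip lines that do not carry an integer value
    return counters


def close_ratios(counters, base="MAIN.sess_conn"):
    """Return each MAIN.sc_* counter divided by the base counter."""
    total = counters.get(base, 0)
    if total == 0:
        return {}
    return {
        name: value / total
        for name, value in sorted(counters.items())
        if name.startswith("MAIN.sc_")
    }


# Made-up sample, not real measurements.
SAMPLE = """\
MAIN.sess_conn            100000         12.50 Sessions accepted
MAIN.sc_rem_close          60000          7.50 Session OK  REM_CLOSE
MAIN.sc_rx_timeout          2500          0.31 Session Err RX_TIMEOUT
"""

for name, ratio in close_ratios(parse_varnishstat(SAMPLE)).items():
    print(f"{name:24s} {ratio:.4f}")
```

Running this against one capture per timeout_req setting gives the per-setting
ratios to compare.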

Thank you!

On 16/03/15 13:43, Nils Goroll wrote:
> Here's the consensus from the bugwash on irc:
> 
> - we'll add stats counters for SESS_CLOSE
> - then we'll check on larger production systems which impact increasing
> timeout_req to 7 seconds has by
>   - comparing the RX_TIMEOUT to sess_conn
>   - comparing tcp counters
> 
> scn wants to provide advice on which tcp stats to watch
> 
> Nils
> 
> On 15/03/15 18:36, Nils Goroll wrote:
>> Hi,
>>
>> some time has passed since my initial email regarding this suggestion and it
>> still holds.
>>
>> Unless there is a strong argument against it, I think we really should increase
>> the default timeout_req to 7 seconds. I think the argumentation for this value
>> is sound and I haven't found any reasons against it.
>>
>> Please keep this suggestion separate from the suggestion to re-introduce
>> SO_LINGER. I still need to do production system tests with it.
>>
>> Nils
>>
>> On 26/02/15 11:27, Nils Goroll wrote:
>>> This tcpdump output illustrates an issue we seem to have with default Linux tcp
>>> timeouts and the default timeout_req of 2 seconds:
>>>
>>> 16:47:44.542049 IP client.49550 > varnish.80: Flags [S], seq 29295818, win 4380, options [mss 1460,sackOK,eol], length 0
>>> 16:47:44.542080 IP varnish.80 > client.49550: Flags [S.], seq 3652568857, ack 29295819, win 29200, options [mss 1460,nop,nop,sackOK], length 0
>>> 16:47:44.542250 IP client.49550 > varnish.80: Flags [.], ack 1, win 4380, length 0
>>> 16:47:46.080501 IP client.49550 > varnish.80: Flags [P.], seq 1:1453, ack 1, win 4380, length 1452
>>> 16:47:46.080528 IP varnish.80 > client.49550: Flags [.], ack 1453, win 31944, length 0
>>> 16:47:48.082783 IP varnish.80 > client.49550: Flags [F.], seq 1, ack 1453, win 31944, length 0
>>> 16:47:48.083070 IP client.49550 > varnish.80: Flags [.], ack 2, win 4380, length 0
>>> 16:47:48.350763 IP client.49550 > varnish.80: Flags [P.], seq 1453:2905, ack 2, win 4380, length 1452
>>> 16:47:48.350792 IP varnish.80 > client.49550: Flags [R], seq 3652568859, win 0, length 0
>>>
>>> The packet at 16:47:46.080501 contains the first part of a request up to the
>>> start of a very long cookie line.
>>>
>>> At 16:47:48 varnish closes after reaching timeout_req of 2s. Then, the client
>>> immediately acks.
>>>
>>> My understanding is that the varnish->client ack 1453 got lost and the client
>>> did not get around to retransmitting seq 1:1453 before we timed out.
>>>
>>>
>>> The most helpful online reference regarding recommended initial tcp
>>> retransmission timeouts I have found so far is
>>> http://tools.ietf.org/html/rfc6298#ref-PA00
>>>
>>> In summary, an initial timeout (RTO) of 1s is now recommended, but the former 3s
>>> RTO remains valid. So, for any client following the former 3s recommendation, we
>>> currently don't even tolerate a single packet retransmission after the 3-way
>>> handshake is complete. For clients following the new 1s recommended RTO, timing
>>> is also really tight: it seems unlikely that we tolerate retransmission of two
>>> packets.
>>>
>>> Based on this, I'd suggest raising the default timeout_req to 7 seconds to
>>> allow for two retransmissions at RTO=3.
>>>
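
The retransmission arithmetic can be sketched as follows. This is only an
illustration under stated assumptions: the function names are made up, the clock
is taken to start when the lost segment was first sent, and round-trip time is
ignored. How many retransmissions fit depends on the backoff the client applies:
RFC 6298 doubles the RTO after each retransmission (backoff=2.0), while a
constant timer corresponds to backoff=1.0.

```python
def retransmit_times(initial_rto, count, backoff=2.0):
    """Seconds after the original send at which each retransmission fires."""
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(count):
        elapsed += rto        # wait for the current timer to expire
        times.append(elapsed)
        rto *= backoff        # back off the timer for the next attempt
    return times


def retransmits_within(initial_rto, timeout, backoff=2.0, limit=16):
    """Number of retransmissions fired strictly before `timeout` seconds."""
    schedule = retransmit_times(initial_rto, limit, backoff)
    return sum(1 for t in schedule if t < timeout)


for rto in (1.0, 3.0):
    for timeout_req in (2.0, 7.0):
        for backoff in (1.0, 2.0):
            n = retransmits_within(rto, timeout_req, backoff)
            print(f"RTO={rto:.0f}s timeout_req={timeout_req:.0f}s "
                  f"backoff={backoff:.0f}: {n} retransmission(s)")
```
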
>>> This seems to be particularly relevant with the growing popularity of mobile
>>> clients.
>>>
>>> The risk is increased resource usage for malicious requests. To address it, I'd
>>> suggest documenting that lowering timeout_req can be an option to mitigate
>>> certain DoS (slowloris) attacks.
>>>
>>>
>>> Nils
>>>
>>>
>>> _______________________________________________
>>> varnish-dev mailing list
>>> varnish-dev at varnish-cache.org
>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev
>>>
>>
> 


