Re: Random “http first read error: EOF” errors

Hazar Güney hazarguney at gmail.com
Mon Apr 3 10:37:12 CEST 2017


I cannot cut the LB out due to network design. There is no log on the LB
also. Problematic requests are not large (js/css files), I suspect that
this is a miscommunication issue between Varnish and Apache but I am not
able to detect the reason yet.

On Mon, Apr 3, 2017 at 10:32 AM, Guillaume Quintard <
guillaume at varnish-software.com> wrote:

> Large requests/responses are dropped by the LB, maybe?
>
> --
> Guillaume Quintard
>
> On Mon, Apr 3, 2017 at 9:12 AM, Andrei <lagged at gmail.com> wrote:
>
>> So the Varnish backend requests go through a load balancer before
>> reaching Apache? What about those logs? What if you cut that LB out, and
>> just use directors to LB in Varnish directly?
>>
>> On Sun, Apr 2, 2017 at 12:39 PM, Hazar Güney <hazarguney at gmail.com>
>> wrote:
>>
>>> Btw, I need to also note that traffic is routed to Varnish from load
>>> balancer:
>>>
>>> LB -> Varnish -> LB -> Backend pool
>>>
>>> Time does not matter.. It occurs during both peak and regular hours. Even
>>> during peak hours we do not reach the "local ports" limit. Unfortunately
>>> there is no any clue in the logs. There is no evidence that Apache
>>> restarts on the backend pool during occurence of the issue.
>>>
>>> On Sat, Apr 1, 2017 at 9:44 PM, Andrei <lagged at gmail.com> wrote:
>>>
>>>> If it's during peak hours are you sure there aren't any rate limits
>>>> being reached? Perhaps net.ipv4.ip_local_port_range might need a bump?
>>>> Are Apache or syslog logging anything around those times? No silly periodic
>>>> (Apache) graceful restarts? Just a few thoughts :)
>>>>
>>>> On Sat, Apr 1, 2017 at 2:14 PM, Hazar Güney <hazarguney at gmail.com>
>>>> wrote:
>>>>
>>>>> We see this error a few times in a day on a highly busy production
>>>>> environment. Unfortunately there is too much traffic on the server to keep
>>>>> tcpdump/ngrep running and we cannot re-produce it on test environment :(
>>>>>
>>>>> I have started tcpdump on a test environment of another implementation
>>>>> and will let you as soon as the issue gets triggerred again.
>>>>>
>>>>> On Fri, Mar 31, 2017 at 4:17 PM, Andrei <lagged at gmail.com> wrote:
>>>>>
>>>>>> Can you provide a tcpdump/ngrep of the requests between
>>>>>> Client/Varnish/Apache along with the varnishlog entry to see if that
>>>>>> uncovers anything?
>>>>>>
>>>>>> On Fri, Mar 31, 2017 at 7:25 AM, Hazar Güney <hazarguney at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Any idea?
>>>>>>>
>>>>>>> On Thu, Mar 30, 2017 at 3:41 PM, Hazar Güney <hazarguney at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It did not work either:
>>>>>>>>
>>>>>>>>     *   << BeReq    >> 127418176
>>>>>>>>     -   Begin          bereq 127418175 fetch
>>>>>>>>     -   Timestamp      Start: 1490877149.450124 0.000000 0.000000
>>>>>>>>     -   BereqMethod    GET
>>>>>>>>     -   BereqURL       XXXX
>>>>>>>>     -   BereqProtocol  HTTP/1.1
>>>>>>>>     -   BereqHeader    Accept: text/css,*/*;q=0.1
>>>>>>>>     -   BereqHeader    User-Agent: Mozilla/5.0 (iPhone; CPU iPhone
>>>>>>>> OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko)
>>>>>>>> Version/10.0 Mobile/14C92 Safari/602.1
>>>>>>>>     -   BereqHeader    Accept-Language: tr-tr
>>>>>>>>     -   BereqHeader    Referer: XXXX
>>>>>>>>     -   BereqHeader    Host: XXXX
>>>>>>>>     -   BereqHeader    RIP: XXXX
>>>>>>>>     -   BereqHeader    X-Forwarded-For: XXXX
>>>>>>>>     -   BereqHeader    Accept-Encoding: gzip
>>>>>>>>     -   BereqHeader    X-Varnish: 127418176
>>>>>>>>     -   VCL_call       BACKEND_FETCH
>>>>>>>>     -   BereqHeader    connection: Close
>>>>>>>>     -   VCL_return     fetch
>>>>>>>>     -   BackendOpen    25 reload_2017-03-30T14:53:46.st2
>>>>>>>> 10.35.78.11 80 172.17.0.2 59152
>>>>>>>>     -   BackendStart   10.35.78.11 80
>>>>>>>>     -   Timestamp      Bereq: 1490877149.450594 0.000470 0.000470
>>>>>>>>     -   FetchError     http first read error: EOF
>>>>>>>>     -   BackendClose   25 reload_2017-03-30T14:53:46.st2
>>>>>>>>     -   Timestamp      Beresp: 1490877149.451184 0.001060 0.000590
>>>>>>>>     -   Timestamp      Error: 1490877149.451189 0.001065 0.000005
>>>>>>>>     -   BerespProtocol HTTP/1.1
>>>>>>>>     -   BerespStatus   503
>>>>>>>>     -   BerespReason   Service Unavailable
>>>>>>>>     -   BerespReason   Backend fetch failed
>>>>>>>>     -   BerespHeader   Date: Thu, 30 Mar 2017 12:32:29 GMT
>>>>>>>>     -   BerespHeader   Server: Varnish
>>>>>>>>     -   VCL_call       BACKEND_ERROR
>>>>>>>>     -   BereqHeader    X-Varnish-Backend-5xx: 1
>>>>>>>>     -   VCL_return     retry
>>>>>>>>     -   Timestamp      Retry: 1490877149.451205 0.001081 0.000016
>>>>>>>>     -   Link           bereq 127298071 retry
>>>>>>>>     -   End
>>>>>>>>
>>>>>>>> On Thu, Mar 30, 2017 at 2:34 PM, Guillaume Quintard <
>>>>>>>> guillaume at varnish-software.com> wrote:
>>>>>>>>
>>>>>>>>> It does, I'm suspecting that the connection reuse is creating some
>>>>>>>>> issues, probably because Apache is doing some non-standard stuff (protip:
>>>>>>>>> always blame Apache).
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Guillaume Quintard
>>>>>>>>>
>>>>>>>>> On Thu, Mar 30, 2017 at 1:17 PM, Hazar Güney <hazarguney at gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> "Connection: close" supersedes keep-alive behavior, is that
>>>>>>>>>> correct?
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 30, 2017 at 2:08 PM, Guillaume Quintard <
>>>>>>>>>> guillaume at varnish-software.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Can you try something: add 'set bereq.http.connection = "Close";
>>>>>>>>>>> ' at the beginning of vcl_backend_fetch and see if that helps?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Guillaume Quintard
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 30, 2017 at 1:04 PM, Hazar Güney <
>>>>>>>>>>> hazarguney at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> MaxKeepAliveRequests 20
>>>>>>>>>>>> KeepAliveTimeout 2
>>>>>>>>>>>>
>>>>>>>>>>>> Version is "4.1.3 revision 5e3b6d2". We have also seen "straight
>>>>>>>>>>>> insufficient bytes" error with POST requests to a specific php script
>>>>>>>>>>>> hosted by another backend and fixed it by using "pipe" instead of "pass"
>>>>>>>>>>>> but this specific backend gives "http first read error: EOF" error. Another
>>>>>>>>>>>> example from today:
>>>>>>>>>>>>
>>>>>>>>>>>> *   << BeReq    >> 126635444
>>>>>>>>>>>> -   Begin          bereq 126635443 fetch
>>>>>>>>>>>> -   Timestamp      Start: 1490870598.921499 0.000000 0.000000
>>>>>>>>>>>> -   BereqMethod    GET
>>>>>>>>>>>> -   BereqURL       XXXX
>>>>>>>>>>>> -   BereqProtocol  HTTP/1.1
>>>>>>>>>>>> -   BereqHeader    Host: XXXX
>>>>>>>>>>>> -   BereqHeader    User-Agent: Mozilla/5.0 (Windows NT 10.0;
>>>>>>>>>>>> Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87
>>>>>>>>>>>> Safari/537.36
>>>>>>>>>>>> -   BereqHeader    Accept: image/webp,image/*,*/*;q=0.8
>>>>>>>>>>>> -   BereqHeader    Referer: XXXX
>>>>>>>>>>>> -   BereqHeader    Accept-Language:
>>>>>>>>>>>> tr-TR,tr;q=0.8,en-US;q=0.6,en;q=0.4
>>>>>>>>>>>> -   BereqHeader    RIP: XXXX
>>>>>>>>>>>> -   BereqHeader    X-Forwarded-For: XXXX
>>>>>>>>>>>> -   BereqHeader    Accept-Encoding: gzip
>>>>>>>>>>>> -   BereqHeader    X-Varnish: 126635444
>>>>>>>>>>>> -   VCL_call       BACKEND_FETCH
>>>>>>>>>>>> -   VCL_return     fetch
>>>>>>>>>>>> -   BackendOpen    35 reload_2017-03-20T11:32:44.st2
>>>>>>>>>>>> 10.35.78.11 80 172.17.0.2 48896
>>>>>>>>>>>> -   BackendStart   10.35.78.11 80
>>>>>>>>>>>> -   Timestamp      Bereq: 1490870598.922050 0.000552 0.000552
>>>>>>>>>>>> *-   FetchError     http first read error: EOF*
>>>>>>>>>>>> -   BackendClose   35 reload_2017-03-20T11:32:44.st2
>>>>>>>>>>>> -   Timestamp      Beresp: 1490870598.922622 0.001124 0.000572
>>>>>>>>>>>> -   Timestamp      Error: 1490870598.922627 0.001129 0.000005
>>>>>>>>>>>> -   BerespProtocol HTTP/1.1
>>>>>>>>>>>> -   BerespStatus   503
>>>>>>>>>>>> -   BerespReason   Service Unavailable
>>>>>>>>>>>> -   BerespReason   Backend fetch failed
>>>>>>>>>>>> -   BerespHeader   Date: Thu, 30 Mar 2017 10:43:18 GMT
>>>>>>>>>>>> -   BerespHeader   Server: Varnish
>>>>>>>>>>>> -   VCL_call       BACKEND_ERROR
>>>>>>>>>>>> -   BereqHeader    X-Varnish-Backend-5xx: 1
>>>>>>>>>>>> -   VCL_return     retry
>>>>>>>>>>>> -   Timestamp      Retry: 1490870598.922657 0.001159 0.000030
>>>>>>>>>>>> -   Link           bereq 126832283 retry
>>>>>>>>>>>> -   End
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 29, 2017 at 12:03 PM, Mattias Geniar <
>>>>>>>>>>>> mattias at nucleus.be> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> > Backend is Apache.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In older Varnish versions, you could sometimes see a similar
>>>>>>>>>>>>> error;
>>>>>>>>>>>>>
>>>>>>>>>>>>> >   11 FetchError   c straight insufficient bytes
>>>>>>>>>>>>>
>>>>>>>>>>>>> The error message you’re seeing might be related, as it
>>>>>>>>>>>>> mentions the EOF.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This happens when the backend sends a Content-Length header
>>>>>>>>>>>>> that doesn’t match the _actual_ content length it’s sending. In Apache,
>>>>>>>>>>>>> this was commonly caused by a mod_deflate misconfiguration.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For testing, could you try disabling Gzip either in your
>>>>>>>>>>>>> backend or strip the Accept-Encoding header in Varnish to force a plain
>>>>>>>>>>>>> text response?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mattias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> varnish-misc mailing list
>>>>>>> varnish-misc at varnish-cache.org
>>>>>>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20170403/9364bbd6/attachment-0001.html>


More information about the varnish-misc mailing list