Modifying req.url and restart() produces ncsa to log wrong URL resource

Dridi Boukelmoune dridi at varni.sh
Thu Oct 1 13:13:42 UTC 2020


On Tue, Sep 29, 2020 at 8:10 PM Marian Velez <marianmvelez at gmail.com> wrote:
>
> Hiya all!
>
> I've been trying to implement an abtest engine on my varnish. The problem I've come across is related to the manipulation of the original req.url property.
>
> What I'm experiencing is that when setting a new "req.url" value and after forcing a restart(), the URL that varnishncsa uses to log is just the one it has set when it re-enters recv flow, but not the one it uses from the backend or the last req.URL object either (which will even have much more sense since).
>
> I've made a sample snippet of what I observe just for the sake of reproducing it and to take away all the abtest logic that may be just confusing.
>
> vcl 4.1;
> backend default {
>  .host = "127.0.0.1";
>  .port = "8000";
> }
>
> sub vcl_recv {
>   if (req.restarts == 0) {
>     set req.http.X-Original-URL = req.url;
>     set req.url = "/this_url_does_not_exist";
>     return(restart);
>   }
>   else {
>     #restore original URL
>     set req.url = req.http.X-Original-URL;
>   }
> }
>
> That vcl just sets an alternative req.url and triggers a flow restart.
> varnishncsa logs this entry
>
> ::1 - - [29/Sep/2020:16:49:08 -0300] "GET http://localhost:6081/this_url_does_not_exist HTTP/1.1" 200 12 "-" "curl/7.72.0"
>
>
> Of course that URL does not exist, but since its using the req.URL object from which varnish recv flow started, it just keeps that for logging, and not any later req.URL set.
>
> Here attached is the varnishlog output from a fresh start:
>
> *   << Request  >> 2
> -   Begin          req 1 rxreq
> -   Timestamp      Start: 1601409664.461391 0.000000 0.000000
> -   Timestamp      Req: 1601409664.461391 0.000000 0.000000
> -   VCL_use        boot
> -   ReqStart       ::1 55180 a0
> -   ReqMethod      GET
> -   ReqURL         /this_object_exists
> -   ReqProtocol    HTTP/1.1
> -   ReqHeader      Host: localhost:6081
> -   ReqHeader      User-Agent: curl/7.72.0
> -   ReqHeader      Accept: */*
> -   ReqHeader      X-Forwarded-For: ::1
> -   VCL_call       RECV
> -   ReqHeader      X-Original-URL: /this_object_exists
> -   ReqURL         /this_url_does_not_exist
> -   VCL_return     restart
> -   VCL_call       HASH
> -   VCL_return     lookup
> -   Timestamp      Restart: 1601409664.461429 0.000037 0.000037
> -   Link           req 3 restart
> -   End

The transaction above does not reply to the client, it's not picked up
by varnishncsa.

> *   << BeReq    >> 4
> -   Begin          bereq 3 fetch
> -   VCL_use        boot
> -   Timestamp      Start: 1601409664.461553 0.000000 0.000000
> -   BereqMethod    GET
> -   BereqURL       /this_object_exists
> -   BereqProtocol  HTTP/1.1
> -   BereqHeader    User-Agent: curl/7.72.0
> -   BereqHeader    Accept: */*
> -   BereqHeader    X-Forwarded-For: ::1
> -   BereqHeader    host: localhost:6081
> -   BereqHeader    Accept-Encoding: gzip
> -   BereqHeader    X-Varnish: 4
> -   VCL_call       BACKEND_FETCH
> -   VCL_return     fetch
> -   BackendOpen    26 default 127.0.0.1 8000 127.0.0.1 59480 connect
> -   Timestamp      Bereq: 1601409664.462196 0.000642 0.000642
> -   Timestamp      Beresp: 1601409664.463115 0.001561 0.000918
> -   BerespProtocol HTTP/1.0
> -   BerespStatus   200
> -   BerespReason   OK
> -   BerespHeader   Server: SimpleHTTP/0.6 Python/3.8.6
> -   BerespHeader   Date: Tue, 29 Sep 2020 20:01:04 GMT
> -   BerespHeader   Content-type: application/octet-stream
> -   BerespHeader   Content-Length: 12
> -   BerespHeader   Last-Modified: Tue, 29 Sep 2020 19:36:31 GMT
> -   TTL            RFC 120 10 0 1601409664 1601409664 1601409664 0 0 cacheable
> -   VCL_call       BACKEND_RESPONSE
> -   VCL_return     deliver
> -   Filters
> -   Storage        malloc s0
> -   Fetch_Body     3 length stream
> -   BackendClose   26 default close
> -   Timestamp      BerespBody: 1601409664.463236 0.001682 0.000120
> -   Length         12
> -   BereqAcct      155 0 155 199 12 211
> -   End
>
> *   << Request  >> 3
> -   Begin          req 2 restart
> -   Timestamp      Start: 1601409664.461429 0.000037 0.000000
> -   ReqStart       ::1 55180 a0
> -   ReqMethod      GET
> -   ReqURL         /this_url_does_not_exist
> -   ReqProtocol    HTTP/1.1
> -   ReqHeader      Host: localhost:6081
> -   ReqHeader      User-Agent: curl/7.72.0
> -   ReqHeader      Accept: */*
> -   ReqHeader      X-Forwarded-For: ::1
> -   ReqHeader      X-Original-URL: /this_object_exists

This transaction replies to the client but starts with the wrong URL,
but we can find the correct one in the header above.

> -   VCL_call       RECV
> -   ReqURL         /this_object_exists
> -   ReqUnset       X-Original-URL: /this_object_exists
> -   ReqUnset       Host: localhost:6081
> -   ReqHeader      host: localhost:6081
> -   VCL_return     hash
> -   VCL_call       HASH
> -   VCL_return     lookup
> -   VCL_call       MISS
> -   VCL_return     fetch
> -   Link           bereq 4 fetch
> -   Timestamp      Fetch: 1601409664.463212 0.001821 0.001783
> -   RespProtocol   HTTP/1.0
> -   RespStatus     200
> -   RespReason     OK
> -   RespHeader     Server: SimpleHTTP/0.6 Python/3.8.6
> -   RespHeader     Date: Tue, 29 Sep 2020 20:01:04 GMT
> -   RespHeader     Content-type: application/octet-stream
> -   RespHeader     Content-Length: 12
> -   RespHeader     Last-Modified: Tue, 29 Sep 2020 19:36:31 GMT
> -   RespProtocol   HTTP/1.1
> -   RespHeader     X-Varnish: 3
> -   RespHeader     Age: 0
> -   RespHeader     Via: 1.1 varnish (Varnish/6.5)
> -   VCL_call       DELIVER
> -   VCL_return     deliver
> -   Timestamp      Process: 1601409664.463225 0.001833 0.000012
> -   Filters
> -   RespHeader     Accept-Ranges: bytes
> -   RespHeader     Connection: keep-alive
> -   Timestamp      Resp: 1601409664.463286 0.001894 0.000061
> -   ReqAcct        96 0 96 299 12 311
> -   End
>
> *   << Session  >> 1
> -   Begin          sess 0 HTTP/1
> -   SessOpen       ::1 55180 a0 ::1 6081 1601409664.461310 18
> -   Link           req 2 rxreq
> -   SessClose      REM_CLOSE 0.002
> -   End
>
> As you can see from the FETCH block, what it effectively fetches from backend is not what varnishncsa logs.
> I think it has something to do with the order in which the restart affects the flow, since its not replacing the req.url object, just appending a new one. BACKEND FETCH uses the right URL object since it uses the last one, but varnishncsa logs the first it picked up from that restart.
>
> I've tried setting a blank req.url or unsetting in an attempt to avoid having several req.URL objects introducing problems, but it seems that property is readonly since I get errors like "Variable cannot be unset" or "Setting req.url to empty sting - FAILED".
>
> Unfortunately I have no other way to implement my abtest engine than by using the restart() order from varnish, but this logging issue is extremely affecting what I can see to debug my stuff.
>
> Any clue of what trick I could use to workaround this?

You can grab the right URL from the header, but you need to always set
that header, even when you don't use the restart feature. Simply
replace the URL pattern from the default format with the header
instead:

> varnishncsa -F '%h %l %u %t "%m http://%{Host}i%{X-Original-URL}i%q %H" %s %b "%{Referer}i" "%{User-agent}i"'

Based on the varnishncsa manual [1] it should be something like that.

Dridi

[1] https://varnish-cache.org/docs/6.5/reference/varnishncsa.html


More information about the varnish-misc mailing list