Modifying req.url and restart() produces ncsa to log wrong URL resource
Dridi Boukelmoune
dridi at varni.sh
Thu Oct 1 13:13:42 UTC 2020
On Tue, Sep 29, 2020 at 8:10 PM Marian Velez <marianmvelez at gmail.com> wrote:
>
> Hiya all!
>
> I've been trying to implement an abtest engine on my varnish. The problem I've come across is related to the manipulation of the original req.url property.
>
> What I'm experiencing is that when setting a new "req.url" value and after forcing a restart(), the URL that varnishncsa uses to log is just the one it has set when it re-enters recv flow, but not the one it uses from the backend or the last req.URL object either (which will even have much more sense since).
>
> I've made a sample snippet of what I observe just for the sake of reproducing it and to take away all the abtest logic that may be just confusing.
>
> vcl 4.1;
> backend default {
> .host = "127.0.0.1";
> .port = "8000";
> }
>
> sub vcl_recv {
> if (req.restarts == 0) {
> set req.http.X-Original-URL = req.url;
> set req.url = "/this_url_does_not_exist";
> return(restart);
> }
> else {
> #restore original URL
> set req.url = req.http.X-Original-URL;
> }
> }
>
> That vcl just sets an alternative req.url and triggers a flow restart.
> varnishncsa logs this entry
>
> ::1 - - [29/Sep/2020:16:49:08 -0300] "GET http://localhost:6081/this_url_does_not_exist HTTP/1.1" 200 12 "-" "curl/7.72.0"
>
>
> Of course that URL does not exist, but since its using the req.URL object from which varnish recv flow started, it just keeps that for logging, and not any later req.URL set.
>
> Here attached is the varnishlog output from a fresh start:
>
> * << Request >> 2
> - Begin req 1 rxreq
> - Timestamp Start: 1601409664.461391 0.000000 0.000000
> - Timestamp Req: 1601409664.461391 0.000000 0.000000
> - VCL_use boot
> - ReqStart ::1 55180 a0
> - ReqMethod GET
> - ReqURL /this_object_exists
> - ReqProtocol HTTP/1.1
> - ReqHeader Host: localhost:6081
> - ReqHeader User-Agent: curl/7.72.0
> - ReqHeader Accept: */*
> - ReqHeader X-Forwarded-For: ::1
> - VCL_call RECV
> - ReqHeader X-Original-URL: /this_object_exists
> - ReqURL /this_url_does_not_exist
> - VCL_return restart
> - VCL_call HASH
> - VCL_return lookup
> - Timestamp Restart: 1601409664.461429 0.000037 0.000037
> - Link req 3 restart
> - End
The transaction above does not reply to the client, it's not picked up
by varnishncsa.
> * << BeReq >> 4
> - Begin bereq 3 fetch
> - VCL_use boot
> - Timestamp Start: 1601409664.461553 0.000000 0.000000
> - BereqMethod GET
> - BereqURL /this_object_exists
> - BereqProtocol HTTP/1.1
> - BereqHeader User-Agent: curl/7.72.0
> - BereqHeader Accept: */*
> - BereqHeader X-Forwarded-For: ::1
> - BereqHeader host: localhost:6081
> - BereqHeader Accept-Encoding: gzip
> - BereqHeader X-Varnish: 4
> - VCL_call BACKEND_FETCH
> - VCL_return fetch
> - BackendOpen 26 default 127.0.0.1 8000 127.0.0.1 59480 connect
> - Timestamp Bereq: 1601409664.462196 0.000642 0.000642
> - Timestamp Beresp: 1601409664.463115 0.001561 0.000918
> - BerespProtocol HTTP/1.0
> - BerespStatus 200
> - BerespReason OK
> - BerespHeader Server: SimpleHTTP/0.6 Python/3.8.6
> - BerespHeader Date: Tue, 29 Sep 2020 20:01:04 GMT
> - BerespHeader Content-type: application/octet-stream
> - BerespHeader Content-Length: 12
> - BerespHeader Last-Modified: Tue, 29 Sep 2020 19:36:31 GMT
> - TTL RFC 120 10 0 1601409664 1601409664 1601409664 0 0 cacheable
> - VCL_call BACKEND_RESPONSE
> - VCL_return deliver
> - Filters
> - Storage malloc s0
> - Fetch_Body 3 length stream
> - BackendClose 26 default close
> - Timestamp BerespBody: 1601409664.463236 0.001682 0.000120
> - Length 12
> - BereqAcct 155 0 155 199 12 211
> - End
>
> * << Request >> 3
> - Begin req 2 restart
> - Timestamp Start: 1601409664.461429 0.000037 0.000000
> - ReqStart ::1 55180 a0
> - ReqMethod GET
> - ReqURL /this_url_does_not_exist
> - ReqProtocol HTTP/1.1
> - ReqHeader Host: localhost:6081
> - ReqHeader User-Agent: curl/7.72.0
> - ReqHeader Accept: */*
> - ReqHeader X-Forwarded-For: ::1
> - ReqHeader X-Original-URL: /this_object_exists
This transaction replies to the client but starts with the wrong URL,
but we can find the correct one in the header above.
> - VCL_call RECV
> - ReqURL /this_object_exists
> - ReqUnset X-Original-URL: /this_object_exists
> - ReqUnset Host: localhost:6081
> - ReqHeader host: localhost:6081
> - VCL_return hash
> - VCL_call HASH
> - VCL_return lookup
> - VCL_call MISS
> - VCL_return fetch
> - Link bereq 4 fetch
> - Timestamp Fetch: 1601409664.463212 0.001821 0.001783
> - RespProtocol HTTP/1.0
> - RespStatus 200
> - RespReason OK
> - RespHeader Server: SimpleHTTP/0.6 Python/3.8.6
> - RespHeader Date: Tue, 29 Sep 2020 20:01:04 GMT
> - RespHeader Content-type: application/octet-stream
> - RespHeader Content-Length: 12
> - RespHeader Last-Modified: Tue, 29 Sep 2020 19:36:31 GMT
> - RespProtocol HTTP/1.1
> - RespHeader X-Varnish: 3
> - RespHeader Age: 0
> - RespHeader Via: 1.1 varnish (Varnish/6.5)
> - VCL_call DELIVER
> - VCL_return deliver
> - Timestamp Process: 1601409664.463225 0.001833 0.000012
> - Filters
> - RespHeader Accept-Ranges: bytes
> - RespHeader Connection: keep-alive
> - Timestamp Resp: 1601409664.463286 0.001894 0.000061
> - ReqAcct 96 0 96 299 12 311
> - End
>
> * << Session >> 1
> - Begin sess 0 HTTP/1
> - SessOpen ::1 55180 a0 ::1 6081 1601409664.461310 18
> - Link req 2 rxreq
> - SessClose REM_CLOSE 0.002
> - End
>
> As you can see from the FETCH block, what it effectively fetches from backend is not what varnishncsa logs.
> I think it has something to do with the order in which the restart affects the flow, since its not replacing the req.url object, just appending a new one. BACKEND FETCH uses the right URL object since it uses the last one, but varnishncsa logs the first it picked up from that restart.
>
> I've tried setting a blank req.url or unsetting in an attempt to avoid having several req.URL objects introducing problems, but it seems that property is readonly since I get errors like "Variable cannot be unset" or "Setting req.url to empty sting - FAILED".
>
> Unfortunately I have no other way to implement my abtest engine than by using the restart() order from varnish, but this logging issue is extremely affecting what I can see to debug my stuff.
>
> Any clue of what trick I could use to workaround this?
You can grab the right URL from the header, but you need to always set
that header, even when you don't use the restart feature. Simply
replace the URL pattern from the default format with the header
instead:
> varnishncsa -F '%h %l %u %t "%m http://%{Host}i%{X-Original-URL}i%q %H" %s %b "%{Referer}i" "%{User-agent}i"'
Based on the varnishncsa manual [1] it should be something like that.
Dridi
[1] https://varnish-cache.org/docs/6.5/reference/varnishncsa.html
More information about the varnish-misc
mailing list