Unexplained Cache MISSes

Guillaume Quintard guillaume at varnish-software.com
Fri Jun 23 11:09:50 CEST 2017


Hum, could you toy with ttl/grace/keep periods? Like having only a one week
TTL but no grace/keep, then a one week grace but no TTL/keep?
The period when the purge occurs may be important...

-- 
Guillaume Quintard

On Fri, Jun 16, 2017 at 9:09 PM, Nigel Peck <np.lists at sharphosting.uk>
wrote:

>
> Here's an interesting thing about this. When I refreshed the cache just
> now (PURGE) for 204 URLs, 78 of them were a HIT instead of a MISS. All had
> been in the cache for 9 hours at least. (a re-issued GET request received a
> MISS for all 78)
>
> When I immediately issued a PURGE again a few seconds later for all 204
> URLs, every one of them was a MISS and purged successfully. I did it again
> a few seconds after that, and again all good. Same again a few minutes
> after that. No HITs.
>
> So this seems to be in some way related to how long the objects have been
> in the cache.
>
> Nigel
>
>
> On 16/06/2017 13:27, Nigel Peck wrote:
>
>>
>> Sorry for the delay on working on this. I've read your email a few times
>> now and am still confused! I need to read the man pages suggested but
>> haven't got to it yet. Will let you know when I make some progress on it.
>>
>> I'm fixing the issue in the interim here by issuing another GET request
>> in my cache refresh scripts for any PURGE requests that come back with a
>> HIT.
>>
>> Nigel
>>
>> On 02/06/2017 18:08, Dridi Boukelmoune wrote:
>>
>>> Amazingly enough I never looked at the logs of a purge, maybe ExpKill
>>>>> could give us a VXID to then check against the hit. If only
>>>>> SomeoneElse(tm)
>>>>> could spare me the time and look at it themselves and tell us (wink
>>>>> wink=).
>>>>>
>>>>
>>>>
>>>> I'm very happy to help in any way I can. Please let me know anything I
>>>> can
>>>> do or information I can provide. I'm no C programmer (web
>>>> developer/server
>>>> admin), so can't help out with coding/patching/debugging[3], but
>>>> anything
>>>> else I can do, please let me know what you need.
>>>>
>>>
>>> Well, luckily I didn't write any C code to find out what purge logs
>>> look like. I'm certainly not going to debug code I'm not familiar with ;)
>>>
>>> I wrote a dummy test case instead:
>>>
>>>      varnishtest "purge logs"
>>>
>>>      server s1 {
>>>          rxreq
>>>          expect req.url == "/to-be-purged"
>>>          txresp
>>>      } -start
>>>
>>>      varnish v1 -vcl+backend {
>>>          sub vcl_recv {
>>>              if (req.method == "PURGE") {
>>>                  return (purge);
>>>              }
>>>          }
>>>      } -start
>>>
>>>      client c1 {
>>>          txreq -url "/to-be-purged"
>>>          rxresp
>>>
>>>          txreq -req PURGE -url "/to-be-purged"
>>>          rxresp
>>>
>>>          txreq -req PURGE -url "/unknown"
>>>          rxresp
>>>      } -run
>>>
>>> And then looked at the logs manually:
>>>
>>>      varnishtest test.vtc -v | grep vsl | less
>>>
>>> Here's a sample:
>>>
>>>      [...]
>>>      **** v1    0.4 vsl|       1002 VCL_return      b deliver
>>>      **** v1    0.4 vsl|       1002 Storage         b malloc s0
>>>      [...]
>>>      **** v1    0.4 vsl|          0 ExpKill         - EXP_When
>>> p=0x7f420b027000 e=1496443420.703764200 f=0xe
>>>      **** v1    0.4 vsl|          0 ExpKill         - EXP_expire
>>> p=0x7f420b027000 e=-0.000092268 f=0x0
>>>      **** v1    0.4 vsl|          0 ExpKill         - EXP_Expired x=1002
>>> t=-0
>>>      [...]
>>>      **** v1    0.4 vsl|       1003 ReqMethod       c PURGE
>>>      **** v1    0.4 vsl|       1003 ReqURL          c /to-be-purged
>>>      [...]
>>>      **** v1    0.4 vsl|       1003 VCL_return      c purge
>>>      **** v1    0.4 vsl|       1003 VCL_call        c HASH
>>>      **** v1    0.4 vsl|       1003 VCL_return      c lookup
>>>      **** v1    0.4 vsl|       1003 VCL_call        c PURGE
>>>      **** v1    0.4 vsl|       1003 VCL_return      c synth
>>>      [...]
>>>      **** v1    0.4 vsl|       1004 ReqMethod       c PURGE
>>>      **** v1    0.4 vsl|       1004 ReqURL          c /unknown
>>>      [...]
>>>      **** v1    0.4 vsl|       1004 VCL_return      c purge
>>>      **** v1    0.4 vsl|       1004 VCL_call        c HASH
>>>      **** v1    0.4 vsl|       1004 VCL_return      c lookup
>>>      **** v1    0.4 vsl|       1004 VCL_call        c PURGE
>>>      **** v1    0.4 vsl|       1004 VCL_return      c synth
>>>      [...]
>>>
>>> The interesting transaction id (VXID) is 1002.
>>>
>>> So 1) purge-related logs will only show up with raw grouping in
>>> varnishlog (which I find unfortunate but I should have remembered the
>>> expiry thread would have been involved) and 2) we don't see in a
>>> transaction log how many objects were actually purged (moved to the
>>> expiry inbox).
>>>
>>> The ExpKill records appear before because transactions commit their
>>> logs when they finish by default.
>>>
>>> Would a cleanly installed server and absolute minimum VCL to reproduce
>>>> this
>>>> be useful? You would be welcome to have access to that server, if
>>>> useful,
>>>> once I've got it set up and producing the same problem.
>>>>
>>>
>>> Not yet, at this point we know that we were looking at an incomplete
>>> picture so what you need to do is capture raw logs and we will be able
>>> to get both a VXID and a timestamp from the ExpKill records (although
>>> the timestamp for EXP_expire puzzles me).
>>>
>>> See man varnishlog to see how to write (-w) and then read (-r) logs
>>> to/from a file. When you notice the alleged bug, note the transaction
>>> id and write the current logs (with the -d option) so that you can
>>> pick up all the interesting bits at rest (instead of doing it on live
>>> traffic).
>>>
>>> I can say that in my case there is definitely no Age header coming from
>>>> the
>>>> back-end. Also as shown in the example I sent it is the 7th HIT on that
>>>> object.
>>>>
>>>
>>> Yes, smells like a bug. But before capturing logs, make sure to remove
>>> Hash records from the vsl_mask (man varnishd) so we can confirm what's
>>> being purged too.
>>>
>>> I have a theory, a long shot that will only prove how unfamiliar I am
>>> with this part of Varnish. Since the purge moves the object to the
>>> expiry inbox, it could be that under load the restart may happen
>>> before the expiry thread marks it as expired, thus creating a race
>>> with the next lookup.
>>>
>>> Cheers,
>>> Dridi
>>>
>>>
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc at varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20170623/10f17d80/attachment.html>


More information about the varnish-misc mailing list