Apparent bug in exp_prefetch()

Adrian Otto aotto at mosso.com
Sat Jan 26 10:11:34 CET 2008


Hello,

I'm new to varnish but I think I've found a bug and I'm seeking help  
to confirm it. This is varnish 1.1.2 on RHEL4. I'm trying to use this  
in my VCL:

sub vcl_timeout {
	fetch;
}

This is in contrast to the default setting:

sub vcl_timeout {
	discard;
}

According to the man page, this is supposed to fetch a fresh copy of  
the document as it's about to expire:

      vcl_timeout
              Called by the reaper thread shortly before a cached  
document reaches its expiry time.

              The vcl_timeout subroutine may terminate with one of  
the following keywords:

              fetch   Request a fresh copy of the object from the  
backend.

              discard
                      Discard the object.

What I want is a way to always have a fresh copy of a given document  
cached, so that no client connections actually have to fall through  
to "pass" to fetch the document from the backend.

Now, I looked into the source, and tracked that capability back to  
the file cache_expire.c and function exp_prefetch. This is where I  
found it using GDB:

(gdb) info threads
   6 Thread -1217016912 (LWP 17397)  0x002777a2 in _dl_sysinfo_int80  
() from /lib/ld-linux.so.2
   5 Thread -1227506768 (LWP 17398)  0x002777a2 in _dl_sysinfo_int80  
() from /lib/ld-linux.so.2
   4 Thread -1237996624 (LWP 17399)  0x002777a2 in _dl_sysinfo_int80  
() from /lib/ld-linux.so.2
   3 Thread 1972149168 (LWP 17400)  0x002777a2 in _dl_sysinfo_int80  
() from /lib/ld-linux.so.2
   2 Thread 1961659312 (LWP 17401)  0x002777a2 in _dl_sysinfo_int80  
() from /lib/ld-linux.so.2
   1 Thread -1208621376 (LWP 17396)  0x002777a2 in _dl_sysinfo_int80  
() from /lib/ld-linux.so.2
(gdb) thread 5
[Switching to thread 5 (Thread -1227506768 (LWP 17398))]#0   
0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x002777a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0031bce6 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x0031baac in sleep () from /lib/tls/libc.so.6
#3  0x080516d2 in exp_prefetch (arg=0x0) at cache_expire.c:178
#4  0x004f33b8 in start_thread () from /lib/tls/libpthread.so.0
#5  0x0035a2fe in clone () from /lib/tls/libc.so.6

It's always running in thread 5. Looking carefully at the source, I  
only see how an object could be removed from the heap. I don't see  
how it could actually get replaced. It only seems to know how to  
expire something. It seems that line 183 runs, and takes it out of  
the binheap, and then the lock is released. After that happens, it  
follows what seemed to be a rather confusing sequence of code under  
VCL_timeout_method(sp). I have not yet learned how the VCL actually  
controls the internals of varnish, so I'm not pretending to fathom  
it. I think what might work nicely is a code sequence like this:

A) Find an object that's expired.
B) Fetch a new copy if it was modified since the last fetch, and  
verify it.
C) Lock the binheap.
D) Drop the old object, and swap in the new one.
E) Unlock the binheap.

It might also be nice if the 'varnishlog' process could have  
visibility into the progress of the above code sequence as well. Am I  
missing something, or is there a bug here that makes the 'fetch'  
method in 'vcl_timeout' impossible to use?

Here are steps to reproduce the problem:

1) Redefine your vcl_timeout like this:

sub vcl_timeout {
	fetch;
}

2) Start varnish, and varnishlog.

3) Browse a page that has a short TTL, like 120 seconds.

4) Wait for the TTL to expire.

At this point I expect to see the varnishlog output indicate that  
it's fetching a fresh copy, but I don't.

5) Browse to the same page again, and you'll notice that it re- 
fetches the document from the origin at the time your browser  
requests the file via 'miss', not before the miss occured.

I look forward to any guidance you can provide.

Thanks,

Adrian



More information about the varnish-dev mailing list