Varnishd stops sending logs to VSM after a while

Cédric Jeanneret varnish at tengu.ch
Thu Mar 13 10:25:52 CET 2014


Hello,

On 03/13/2014 10:14 AM, Dridi Boukelmoune wrote:
> Hi,
> 
> On Thu, Mar 13, 2014 at 9:30 AM, Cédric Jeanneret <varnish at tengu.ch> wrote:
>> Hello,
>>
>> We have a small problem with 5 of our varnishes (on a total of 6…):
>> it seems every morning, varnishd stops sending stuff to the Shared
>> Memory[1], meaning we don't have any logs.
>>
>> The only thing I found in order to get logs back is to restart varnish,
>> but of course this isn't the best way to solve the problem…
>>
>> Here are some information:
>>
>> Version: varnishd (varnish-3.0.2 revision 55e70a4)
>> ** OS: Debian Squeeze
>> ** System memory: 7468Mo
>> ** CPU: dual E5645 @ 2.40GHz (Note: for those who knows about Amazon AWS
>> instances, it's an m1.large, instance-store AMI.)
>> ** Daemon options:
>> DAEMON_OPTS="-n <instance-name> \
>>              -u varnish -g varnish \
>>              -a :80 \
>>              -T localhost:6082 \
>>              -s malloc,5G \
>>              -f /etc/varnish/<configuration-file>.vcl \
>>              -S /path/to/secret \
>>              -p shm_reclen=65535 \
>> "
>>
>> ** Tasks around the time the logs stop: logrotate for varnishncsa logs,
>> with a varnishncsa restart. This shouldn't break varnishd log system,
>> and it worked fine for months…
>>
>> We didn't detect any problem with memory nor disk I/O during the outage.
>> This morning, it was the third time in a row we detected this issue.
>>
>> For what we know, neither the VCL nor daemon options were changed just
>> before the problem appears (well, VCL was changed, some backend
>> "routing" was updated, but nothing out of the ordinary stuff we do for
>> months now).
>>
>> Symptoms:
>> - /var/lib/varnish/<instance-name>/_.vsm isn't updated
>> - running varnishncsa or varnishlog from the shell doesn't show any log
>> entries
>> - restarting the varnishd service bring logs up again (we can see the
>> flow if we keep the varnishncsa up)
>>
>> An "lsof -p <varnish-pid>" shows this line:
>> varnishd 17948 varnish  DEL    REG      202,1              264837
>> /var/lib/varnish/<instance-name>/_.vsm
> 
> I suspect something "unrelated" deleted the log file, but varnishd is
> probably still logging as usual. Because it still holds a file
> descriptor of the log file.
> 
> It's the log-reading tools on the other hand that won't see the file
> on the filesystem once it's deleted (unless they already hold a fd).
> That's why it feels like varnishd is not logging anymore.
> 
> Dridi

Hmm, _.vsm is shown when we perform a simple "ls" in the directory…
Else, I would get some errors from either varnishncsa or varnishlog when
I start them (I tried that this morning, just to see what would happen)…

More over, I saw a small garbage being pushed to the VSM, some
unreadable stuff ending with a FREE. It wasn't a "normal" line as we can
see in the VSM, it was shorter (like 3-4 garbage,then FREE)…
At the same time, nothing appeared in varnishncsa nor log.

In order to debug that, I also run a curl on localhost, one request per
second hitting the varnish (with correct headers and so on so that it
falls on some existing URL). Result:
- content is served
- no log in varnishncsa
- no log in varnishlog
- nothing in VSM file

Cheers,

C.

> 
>> I'm not very comfortable with the "DEL" FD: when I do the same command
>> once logs flow, I get:
>> varnishd 22603 varnish  mem       REG      202,1 84934656     264058
>> /var/lib/varnish/<instance-name>/_.vsm
>>
>> It seems "something" is degrading the shared memory…
>>
>> Any help would be welcome, I'm a bit stuck with the investigations right
>> now :(.
>>
>> Cheers,
>>
>> C.
>>
>>
>> [1] https://www.varnish-cache.org/docs/trunk/reference/vsm.html
>>
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



More information about the varnish-misc mailing list