Varnish restarting sporadically... losing entire cache...

Kristian Lyngstøl kristian at varnish-software.com
Fri Jun 25 04:35:23 CEST 2010


If it's not virtual memory, you will have to turn on core dumps to figure
it out. That involves setting ulimit -c unlimited in the startup script
(or running it manually in the shell you start varnish from). You also
likely want to set /proc/sys/kernel/core_pattern to a path where you can
both fit the core dump and actually find it. If you're unfamiliar with
analyzing core dumps, you can gzip it and send it to me along with
your varnish binaries, if you want to.
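
A minimal sketch of that setup, for illustration only. The directory /var/tmp/cores and the function names are hypothetical, and the pattern is just one reasonable choice. On Linux the kernel setting lives at /proc/sys/kernel/core_pattern, and writing to it needs root.

```shell
#!/bin/sh
# Sketch: enable core dumps for whatever is started from this shell.
# CORE_DIR is a hypothetical location; pick one with enough free space.
CORE_DIR="${CORE_DIR:-/var/tmp/cores}"

# Build a core_pattern value for a directory.
# %e = executable name, %p = PID, %t = time of the dump (epoch seconds).
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "$1"
}

enable_core_dumps() {
    # Lift the core file size limit for this shell and its children.
    ulimit -c unlimited || return 1
    mkdir -p "$CORE_DIR" || return 1
    # Tell the kernel where to write core files (requires root).
    core_pattern_for "$CORE_DIR" > /proc/sys/kernel/core_pattern
}

# Typical use, as root, in the shell or init script that starts varnishd:
#   enable_core_dumps && <your usual varnishd command line>
```

After the next crash, the core file should show up in that directory and can be gzipped from there.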

As for logging, I suppose it might have changed in Ubuntu. I'll have
to check that. You got the assert error though, so it's all there.

Just out of curiosity though: why 32-bit? Is it by any chance a
virtual machine, or similar?
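
On the 32-bit question, a back-of-the-envelope estimate shows why 5000 threads is a problem. The 8 MiB per-thread stack and the roughly 3 GiB of usable address space per process are assumptions (common Linux defaults), not figures from this thread. Check ulimit -s on your own box.

```shell
#!/bin/sh
# Rough estimate of the address space reserved for thread stacks.
# Assumption: every thread reserves the same stack size.
stack_reservation_mib() {
    # $1 = number of threads, $2 = stack size per thread in MiB
    echo $(( $1 * $2 ))
}

# 5000 threads with an assumed 8 MiB stack each, result in MiB:
stack_reservation_mib 5000 8

# The stack limit your shell actually hands out, in KiB (or "unlimited"):
ulimit -s
```

Under those assumptions the stacks alone would reserve 40000 MiB, far more than a 32-bit process can address, and that is before counting the 1G of malloc storage.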

-Kristian
PS: I'm not on a computer right now, so you will want to verify the
ulimit argument-name and core_pattern path.

2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
> Thanks Kristian! Been reading your blog, and got some of these from your
> site... Guess I went overboard with some of them...
>
> - There is no /var/log/syslog, so nothing else is being logged. This is the
> only location I've been able to get any debug info out of varnish. We're not
> tapping out VM or anything else, it appears. Everything looks okay on
> that front, but I'm going to lower the max threads and see how that takes
> us. Maybe it'll be a simple solution.
>
> Appreciate the help!
> On Jun 24, 2010, at 7:00 PM, Kristian Lyngstøl wrote:
>
>> As Per says, it's likely you run out of vm space. You are also
>> specifying a great deal of parameters which I suspect are not actually
>> adjusted to your site. I would not recommend half of them unless you
>> actually know why.
>>
>> It looks like your log entries are from /var/log/messages. You will
>> likely find more in /var/log/syslog on Ubuntu.
>>
>> Also: 5000 threads is going to be far too many on a 32-bit system.
>> Using 64-bit is by far the simplest way to avoid hassle. If you insist
>> on 32-bit, you will need to reduce the maximum number of threads, and
>> possibly adjust the stack size, though newer varnish packages might
>> try to do the latter. At any rate, closely monitor vm-usage.
>>
>> Also, signal 11 is a segfault. This means invalid or illegal memory
>> access, which could match the symptoms of a 32-bit
>> varnish-installation running out of virtual memory address space.
>>
>> - Kristian
>>
>> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>>> Here's the error I get consistently:
>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21427) died signal=11
>>> Jun 24 23:35:31 srv860 varnishd[20605]: child (21660) Started
>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said
>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said Child starts
>>>
>>> Here's my config:
>>> "-f /usr/local/varnish-2.1.2/etc/default.vcl \
>>> 	     -s malloc,1G \
>>> 	     -p thread_pool_max=5000 \
>>> 	     -p thread_pools=4 \
>>> 	     -p thread_pool_min=200 \
>>> 	     -p thread_pool_add_delay=1ms \
>>> 	     -p cli_timeout=1000s \
>>> 	     -p ping_interval=1 \
>>> 	     -p cli_buffer=16384 \
>>> 	     -p session_linger=20ms \
>>> 	     -p lru_interval=360s \
>>> 	     -p listen_depth=8192 \
>>> 	     -h classic,500009 \
>>> 	     -T localhost:2000 "
>>>
>>> Am I doing anything in here atrocious that would be causing the random
>>> resets? I've tried file and malloc storage to no avail. Neither one fixed
>>> the issue. I've tried adjusting sess_timeout, sess_workspace, etc., also
>>> nothing. Changed the hash from classic to critbit also, with no success.
>>> Bashing my head against the wall. If anyone has any advice, I could
>>> really use it!
>>>
>>>
>>> On Jun 24, 2010, at 10:58 AM, Caunter, Stefan wrote:
>>>
>>>> Check dmesg too, the child is probably dying. I found a problem with
>>>> persistent storage and had to go back to file.
>>>>
>>>> Stefan Caunter :: Senior Systems Administrator :: TOPS
>>>> e: scaunter at topscms.com  ::  m: (416) 561-4871
>>>> www.thestar.com www.topscms.com
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: varnish-misc-bounces at varnish-cache.org
>>>> [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Ben Nowacky
>>>> Sent: June-24-10 1:51 PM
>>>> To: Flavio Torres
>>>> Cc: varnish-misc at varnish-cache.org
>>>> Subject: Re: Varnish restarting sporadically... losing entire cache...
>>>>
>>>> Thanks Flavio! Here are the errors that I see in /var/log/messages.
>>>> Is this what you were seeing?
>>>>
>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22165) Panic message:
>>>> Assert error in SMP_FreeObj(), storage_persistent.c line 802:
>>>> Condition(sg->nfixed > 0) not true. thread = (cache-timeout) ident =
>>>> Linux,2.6.18-128.4.1.el5PAE,i686,-spersistent,-hclassic,epoll Backtrace:
>>>> 0x806ca7c: pan_ic+cc   0x808851e: SMP_FreeObj+13e   0x8064b5f:
>>>> HSH_Deref+21f   0x80618d1: exp_timer+321   0x806f1fd: wrk_bgthread+cd
>>>> 0x44249b: /lib/libpthread.so.0 [0x44249b]   0x39942e:
>>>> /lib/libc.so.6(clone+0x5e) [0x39942e]
>>>> Jun 24 17:38:23 srv860 varnishd[15625]: child (22984) Started
>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said
>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Child starts
>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Dropped 0
>>>> segments to make free_reserve
>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Silo
>>>> completely loaded
>>>> On Jun 24, 2010, at 10:51 AM, Flavio Torres wrote:
>>>>
>>>
>>>
>>> _______________________________________________
>>> varnish-misc mailing list
>>> varnish-misc at varnish-cache.org
>>> http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
>>>
>
>



