Varnish restarting sporadically... losing entire cache...

Kristian Lyngstøl kristian at varnish-software.com
Fri Jun 25 04:51:27 CEST 2010


The log message you posted earlier with the assert error contains an
identity string that - among other things - reads i686 (as opposed to
amd64/x86_64 or similar). That's how I can tell. And this is exactly
why it was added to begin with :)
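
To make that concrete, here is a rough Python sketch of pulling the
architecture out of the ident line from the panic message quoted further
down. The field order (OS, kernel release, machine, storage flag, hash
flag, waiter) is an assumption read off that one sample, not a
documented format, and parse_ident and is_32bit are hypothetical helper
names:

```python
# Sketch only: field positions are inferred from a single panic message,
# not from any documented Varnish format.
IDENT = "Linux,2.6.18-128.4.1.el5PAE,i686,-spersistent,-hclassic,epoll"

# Machine names commonly reported by 32-bit x86 kernels (assumed list).
X86_32BIT = {"i386", "i486", "i586", "i686"}


def parse_ident(ident):
    """Split the comma-separated ident string into named fields."""
    fields = [f.strip() for f in ident.split(",")]
    names = ["os", "kernel", "machine", "storage", "hash", "waiter"]
    return dict(zip(names, fields))


def is_32bit(ident):
    """Return True if the machine field looks like 32-bit x86."""
    return parse_ident(ident).get("machine") in X86_32BIT


if __name__ == "__main__":
    info = parse_ident(IDENT)
    verdict = "32-bit" if is_32bit(IDENT) else "not 32-bit"
    print(info["machine"], "->", verdict)
```

The same check can be done by eye, of course; the point is only that
the third comma-separated field is where the architecture shows up in
this sample.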

As for CentOS, that explains the logs. I must've mixed you up with
someone else because I could've sworn you said you were on Ubuntu. Oh
well. We've used CentOS as a test platform though, so there's nothing
fundamentally wrong with it as far as Varnish is concerned.

Let me know when/if you get a core dump.
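
If it helps, the core dump setup mentioned earlier in the thread can be
sketched roughly like this in Python. It is only an illustration:
enable_core_dumps and read_core_pattern are hypothetical helper names,
the /proc path is Linux-specific, and I have not tried this on your
box. It raises the core file size limit for the current process (the
equivalent of ulimit -c in a shell) and reports where the kernel will
write core files:

```python
import resource

CORE_PATTERN_PATH = "/proc/sys/kernel/core_pattern"  # Linux-specific


def enable_core_dumps():
    """Raise the soft core-file limit to the hard limit.

    Processes started afterwards from this one inherit the limit. If the
    hard limit is unlimited, this matches `ulimit -c unlimited`.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def read_core_pattern(path=CORE_PATTERN_PATH):
    """Return the kernel's core file pattern, or None if unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit (soft, hard):", soft, hard)
    print("core_pattern:", read_core_pattern())
```

The limit has to be raised in the process that starts varnishd (or a
parent of it), since a running daemon keeps the limit it was started
with. Changing core_pattern itself requires root.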

- Kristian

2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
> Nope, this is a dedicated server. We're running CentOS. How do you know
> we're running the 32-bit version? I had to compile from source on CentOS, so
> just grabbed the binaries from the site and did a build from them.  How are
> you guessing it's 32-bit?
>
> Definitely not familiar with analyzing core-dumps or even getting them to
> run... I'm not a sys-admin, just the guy stuck trying to get our servers
> ready for an onslaught of traffic coming next week that I know we can not
> handle right now....
> On Jun 24, 2010, at 7:35 PM, Kristian Lyngstøl wrote:
>
>> If it's not the vm you will have to turn on core dumps to figure it
>> out. That involves setting ulimit -c unlimited in the startup script
>> (or running it manually on the shell you start varnish from). You also
>> likely want to set /proc/sys/kernel/core_pattern to a path where you can
>> both fit the core dump and actually find it. If you're unfamiliar with
>> analyzing core dumps, you can gzip it and send it to me along with
>> your varnish binaries, if you want to.
>>
>> As for logging, I suppose it might have changed in Ubuntu. I'll have
>> to check that. You got the assert error though, so it's all there.
>>
>> Just out of curiosity though: why 32-bit? Is it by any chance a
>> virtual machine, or similar?
>>
>> -Kristian
>> PS: I'm not on a computer right now, so you will want to verify the
>> ulimit argument-name and core_pattern path.
>>
>> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>>> Thanks Kristian! Been reading your blog, and got some of these from your
>>> site... Guess I went overboard with some of them...
>>>
>>> - There is no /var/log/syslog, so nothing else is being logged. This is the
>>> only location I've been able to get any debug info out of varnish. We're
>>> not tapping out VM or anything else, it appears. Everything looks okay on
>>> that front, but I'm going to lower the max threads and see how that takes
>>> us. Maybe it'll be a simple solution.
>>>
>>> Appreciate the help!
>>> On Jun 24, 2010, at 7:00 PM, Kristian Lyngstøl wrote:
>>>
>>>> As Per says, it's likely you run out of vm space. You are also
>>>> specifying a great deal of parameters which I suspect are not actually
>>>> adjusted to your site. I would not recommend half of them unless you
>>>> actually know why.
>>>>
>>>> It looks like your log entries are from /var/log/messages. You will
>>>> likely find more in /var/log/syslog on Ubuntu.
>>>>
>>>> Also: 5000 threads is going to be far too many on a 32-bit system.
>>>> Using 64-bit is by far the simplest way to avoid hassle. If you insist
>>>> on 32-bit, you will need to reduce the maximum number of threads, and
>>>> possibly adjust the stack size, though newer varnish packages might
>>>> try to do the latter. At any rate, closely monitor vm-usage.
>>>>
>>>> Also, signal 11 is a segfault. This means invalid or illegal memory
>>>> access, which could match the symptoms of a 32-bit
>>>> varnish-installation running out of virtual memory address space.
>>>>
>>>> - Kristian
>>>>
>>>> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>>>>> Here's the error I get consistently:
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21427) died signal=11
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: child (21660) Started
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said
>>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said Child starts
>>>>>
>>>>> Here's my config:
>>>>> "-f /usr/local/varnish-2.1.2/etc/default.vcl \
>>>>> 	     -s malloc,1G \
>>>>> 	     -p thread_pool_max=5000 \
>>>>> 	     -p thread_pools=4 \
>>>>> 	     -p thread_pool_min=200 \
>>>>> 	     -p thread_pool_add_delay=1ms \
>>>>> 	     -p cli_timeout=1000s \
>>>>> 	     -p ping_interval=1 \
>>>>> 	     -p cli_buffer=16384 \
>>>>> 	     -p session_linger=20ms \
>>>>> 	     -p lru_interval=360s \
>>>>> 	     -p listen_depth=8192 \
>>>>>        -h classic,500009 \
>>>>> 	     -T localhost:2000 "
>>>>>
>>>>> Am I doing anything atrocious in here that would be causing the random
>>>>> resets? I've tried file and malloc storage to no avail. Neither one
>>>>> fixed the issue. I've tried adjusting sess_timeout, sess_workspace,
>>>>> etc., also to no effect. Changed the hash from classic to critbit as
>>>>> well, with no success. I'm bashing my head against the wall; if anyone
>>>>> has any advice, I could really use it!
>>>>>
>>>>>
>>>>> On Jun 24, 2010, at 10:58 AM, Caunter, Stefan wrote:
>>>>>
>>>>>> Check dmesg too; the child is probably dying. I found a problem with
>>>>>> persistent storage and had to go back to file.
>>>>>>
>>>>>> Stefan Caunter :: Senior Systems Administrator :: TOPS
>>>>>> e: scaunter at topscms.com  ::  m: (416) 561-4871
>>>>>> www.thestar.com www.topscms.com
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: varnish-misc-bounces at varnish-cache.org
>>>>>> [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Ben
>>>>>> Nowacky
>>>>>> Sent: June-24-10 1:51 PM
>>>>>> To: Flavio Torres
>>>>>> Cc: varnish-misc at varnish-cache.org
>>>>>> Subject: Re: Varnish restarting sporadically... losing entire cache...
>>>>>>
>>>>>> Thanks Flavio! Here are the errors that I see in
>>>>>> /var/log/messages. Is this what you were seeing?
>>>>>>
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22165) Panic message:
>>>>>> Assert error in SMP_FreeObj(), storage_persistent.c line 802:
>>>>>> Condition(sg->nfixed > 0) not true. thread = (cache-timeout) ident =
>>>>>> Linux,2.6.18-128.4.1.el5PAE,i686,-spersistent,-hclassic,epoll
>>>>>> Backtrace:
>>>>>> 0x806ca7c: pan_ic+cc   0x808851e: SMP_FreeObj+13e   0x8064b5f:
>>>>>> HSH_Deref+21f   0x80618d1: exp_timer+321   0x806f1fd: wrk_bgthread+cd
>>>>>> 0x44249b: /lib/libpthread.so.0 [0x44249b]   0x39942e:
>>>>>> /lib/libc.so.6(clone+0x5e) [0x39942e]
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: child (22984) Started
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Child
>>>>>> starts
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Dropped 0
>>>>>> segments to make free_reserve
>>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Silo
>>>>>> completely loaded
>>>>>> On Jun 24, 2010, at 10:51 AM, Flavio Torres wrote:
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> varnish-misc mailing list
>>>>> varnish-misc at varnish-cache.org
>>>>> http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
>>>>>
>>>
>>>
>
>



