Varnish restarting sporadically... losing entire cache...

Ben Nowacky bnowacky at competitorgroup.com
Fri Jun 25 04:39:22 CEST 2010


Nope, this is a dedicated server. We're running CentOS. How do you know we're running a 32-bit version? I had to compile from source on CentOS, so I just grabbed the sources from the site and did a build from them.

I'm definitely not familiar with analyzing core dumps, or even with getting them generated. I'm not a sysadmin, just the guy stuck trying to get our servers ready for an onslaught of traffic next week that I know we cannot handle right now.
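
On the 32-bit question: the panic message quoted further down this thread carries an ident string that includes `i686` and a kernel release ending in `PAE`, both of which point at a 32-bit kernel. A minimal sketch of reading that out follows. The helper name `arch_bits` is made up for illustration, and the assumption that the third comma-separated field of the ident is the machine type is a reading of the quoted log, not something the thread states.

```shell
#!/bin/sh
# Classify a machine/architecture string (as printed by `uname -m`
# or seen in the panic "ident" line) as 32-bit or 64-bit.
# arch_bits is a hypothetical helper, not part of Varnish.
arch_bits() {
    case "$1" in
        i386|i486|i586|i686)  echo 32 ;;
        x86_64|amd64|aarch64) echo 64 ;;
        *)                    echo unknown ;;
    esac
}

# The ident string copied from the panic message quoted in this thread.
ident='Linux,2.6.18-128.4.1.el5PAE,i686,-spersistent,-hclassic,epoll'

# Assumption: the third comma-separated field is the machine type.
machine=$(printf '%s\n' "$ident" | cut -d, -f3)
echo "panic ident reports: $machine ($(arch_bits "$machine")-bit)"

# The same check against the kernel this script is running on.
echo "running kernel: $(uname -m) ($(arch_bits "$(uname -m)")-bit)"
```

Running `file` on the varnishd binary is another way to check, since it reports whether an ELF executable is 32-bit or 64-bit.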
On Jun 24, 2010, at 7:35 PM, Kristian Lyngstøl wrote:

> If it's not the vm you will have to turn on core dumps to figure it
> out. That involves setting ulimit -c unlimited in the startup script
> (or running it manually on the shell you start varnish from). You also
> likely want to set /proc/sys/kernel/core_pattern to a path where you can
> both fit the core dump and actually find it. If you're unfamiliar with
> analyzing core dumps, you can gzip it and send it to me along with
> your varnish binaries, if you want to.
> 
> As for logging, I suppose it might have changed in Ubuntu. I'll have
> to check that. You got the assert error though, so it's all there.
> 
> Just out of curiosity though: why 32-bit? Is it by any chance a
> virtual machine, or similar?
> 
> -Kristian
> PS: I'm not on a computer right now, so you will want to verify the
> ulimit argument-name and core_pattern path.
> 
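
A rough sketch of the core dump setup described above, assuming a Linux host where the pattern is exposed as /proc/sys/kernel/core_pattern. `CORE_DIR` and the helper `core_pattern_for` are hypothetical names chosen for illustration; the `%e`, `%p` and `%t` specifiers (executable name, PID, timestamp) are standard core_pattern placeholders. The script only reports what it would set; the lines that change system state are left as comments because they need root.

```shell
#!/bin/sh
# Sketch: enable core dumps for processes started from this shell.
# Assumption: CORE_DIR is a hypothetical location; pick a filesystem with
# enough free space to hold a dump of the whole varnishd child.
CORE_DIR="${CORE_DIR:-/var/tmp/cores}"

# Build a core_pattern value: %e = executable name, %p = PID, %t = time.
core_pattern_for() {
    printf '%s/core.%%e.%%p.%%t\n' "$1"
}

# Lift the core file size limit for this shell and its children.
# This can fail if the hard limit is lower, so report rather than abort.
ulimit -c unlimited 2>/dev/null || echo "could not raise core size limit" >&2
echo "core size limit: $(ulimit -c)"

echo "current pattern:  $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
echo "proposed pattern: $(core_pattern_for "$CORE_DIR")"

# To apply it (as root), something like:
#   mkdir -p "$CORE_DIR"
#   core_pattern_for "$CORE_DIR" > /proc/sys/kernel/core_pattern
```

The `ulimit -c unlimited` line has to run in the same shell (or startup script) that launches varnishd, since the limit is inherited by child processes. A daemon that switches user ID may also need the fs.suid_dumpable sysctl adjusted before it will dump; that is outside this sketch.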
> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>> Thanks Kristian! Been reading your blog, and got some of these from your
>> site... Guess I went overboard with some of them...
>> 
>> - There is no /var/log/syslog, so nothing else is being logged. This is the
>> only location I've been able to get any debug info out of Varnish. It doesn't
>> appear that we're tapping out VM or anything else, though. Everything looks
>> okay on that front, but I'm going to lower the max threads and see how that
>> takes us. Maybe it'll be a simple solution.
>> 
>> Appreciate the help!
>> On Jun 24, 2010, at 7:00 PM, Kristian Lyngstøl wrote:
>> 
>>> As Per says, it's likely you run out of vm space. You are also
>>> specifying a great deal of parameters which I suspect are not actually
>>> adjusted to your site. I would not recommend half of them unless you
>>> actually know why.
>>> 
>>> It looks like your log entries are from /var/log/messages. You will
>>> likely find more in /var/log/syslog on Ubuntu.
>>> 
>>> Also: 5000 threads is going to be far too many on a 32-bit system.
>>> Using 64-bit is by far the simplest way to avoid hassle. If you insist
>>> on 32-bit, you will need to reduce the maximum number of threads, and
>>> possibly adjust the stack size, though newer Varnish packages might
>>> try to do the latter. At any rate, closely monitor VM usage.
>>> 
>>> Also, signal 11 is a segfault. This means invalid or illegal memory
>>> access, which could match the symptoms of a 32-bit
>>> varnish-installation running out of virtual memory address space.
>>> 
>>> - Kristian
>>> 
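
To make the thread-count point concrete, here is a back-of-the-envelope estimate of how much virtual address space thread stacks alone would reserve. The numbers are assumptions, not values from this thread: an 8 MB stack per thread (check `ulimit -s` on the real host) and roughly 3 GB of user address space for a 32-bit process. `stack_reservation_mb` is a hypothetical helper.

```shell
#!/bin/sh
# Rough estimate of virtual address space reserved for thread stacks.
# Assumptions (not from the thread): each thread reserves a full stack of
# STACK_KB kilobytes, and a 32-bit process has about 3 GB of user space.
stack_reservation_mb() {
    threads=$1
    stack_kb=$2
    echo $(( threads * stack_kb / 1024 ))
}

THREADS=5000        # thread_pool_max from the posted config
STACK_KB=8192       # assumed 8 MB stack; verify with `ulimit -s`
USER_SPACE_MB=3072  # assumed usable address space on 32-bit

need=$(stack_reservation_mb "$THREADS" "$STACK_KB")
echo "stacks for $THREADS threads: ${need} MB of ${USER_SPACE_MB} MB"

# Upper bound on such stacks, ignoring the 1G malloc cache and everything else:
echo "upper bound on threads: $(( USER_SPACE_MB * 1024 / STACK_KB ))"
```

Under these assumptions the stacks alone would ask for far more address space than a 32-bit process has, which is consistent with the advice to lower the thread count or the stack size. For monitoring, the VmSize line in /proc/<pid>/status shows the current virtual size of a running process.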
>>> 2010/6/25, Ben Nowacky <bnowacky at competitorgroup.com>:
>>>> Here's the error I get consistently:
>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21427) died signal=11
>>>> Jun 24 23:35:31 srv860 varnishd[20605]: child (21660) Started
>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said
>>>> Jun 24 23:35:31 srv860 varnishd[20605]: Child (21660) said Child starts
>>>> 
>>>> Here's my config:
>>>> "-f /usr/local/varnish-2.1.2/etc/default.vcl \
>>>> 	     -s malloc,1G \
>>>> 	     -p thread_pool_max=5000 \
>>>> 	     -p thread_pools=4 \
>>>> 	     -p thread_pool_min=200 \
>>>> 	     -p thread_pool_add_delay=1ms \
>>>> 	     -p cli_timeout=1000s \
>>>> 	     -p ping_interval=1 \
>>>> 	     -p cli_buffer=16384 \
>>>> 	     -p session_linger=20ms \
>>>> 	     -p lru_interval=360s \
>>>> 	     -p listen_depth=8192 \
>>>> 	     -h classic,500009 \
>>>> 	     -T localhost:2000 "
>>>> 
>>>> Am I doing anything atrocious in here that would be causing the random
>>>> resets? I've tried file and malloc storage to no avail. Neither one fixed
>>>> the issue. I've tried adjusting sess_timeout, sess_workspace, etc., also
>>>> with no effect. I changed the hash from classic to critbit as well, with
>>>> no success. I'm bashing my head against the wall; if anyone has any advice
>>>> I could really use it!
>>>> 
>>>> 
>>>> On Jun 24, 2010, at 10:58 AM, Caunter, Stefan wrote:
>>>> 
>>>>> Check dmesg too; the child is probably dying. I found a problem with
>>>>> persistent storage and had to go back to file.
>>>>> 
>>>>> Stefan Caunter :: Senior Systems Administrator :: TOPS
>>>>> e: scaunter at topscms.com  ::  m: (416) 561-4871
>>>>> www.thestar.com www.topscms.com
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: varnish-misc-bounces at varnish-cache.org
>>>>> [mailto:varnish-misc-bounces at varnish-cache.org] On Behalf Of Ben Nowacky
>>>>> Sent: June-24-10 1:51 PM
>>>>> To: Flavio Torres
>>>>> Cc: varnish-misc at varnish-cache.org
>>>>> Subject: Re: Varnish restarting sporadically... losing entire cache...
>>>>> 
>>>>> Thanks Flavio! Here are the errors that I see in /var/log/messages.
>>>>> Is this what you were seeing?
>>>>> 
>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22165) Panic message:
>>>>> Assert error in SMP_FreeObj(), storage_persistent.c line 802:
>>>>> Condition(sg->nfixed > 0) not true. thread = (cache-timeout) ident =
>>>>> Linux,2.6.18-128.4.1.el5PAE,i686,-spersistent,-hclassic,epoll Backtrace:
>>>>> 0x806ca7c: pan_ic+cc   0x808851e: SMP_FreeObj+13e   0x8064b5f:
>>>>> HSH_Deref+21f   0x80618d1: exp_timer+321   0x806f1fd: wrk_bgthread+cd
>>>>> 0x44249b: /lib/libpthread.so.0 [0x44249b]   0x39942e:
>>>>> /lib/libc.so.6(clone+0x5e) [0x39942e]
>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: child (22984) Started
>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said
>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Child starts
>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Dropped 0
>>>>> segments to make free_reserve
>>>>> Jun 24 17:38:23 srv860 varnishd[15625]: Child (22984) said Silo
>>>>> completely loaded
>>>>> On Jun 24, 2010, at 10:51 AM, Flavio Torres wrote:
>>>>> 