Bug? Barrage of hits leads to failure creating worker threads / stats tracking

John Adams jna at twitter.com
Sat Apr 11 01:30:27 CEST 2009


Something's very wrong here - we've never experienced this before.

Are you starting the server as root or as another user? Any ulimit settings
or restrictions on the number of file descriptors?
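
One way to check, as a rough sketch (assuming bash on Linux; the function name show_limits is made up for illustration): run this from the same shell, and as the same user, that launches varnishd. On Linux each thread counts against the max-user-processes limit, and by default glibc sizes each new thread's stack from the ulimit -s value unless the program overrides it, so both can cap thread creation well before CPU does.

```shell
#!/bin/bash
# Sketch: print the resource limits most likely to cap thread creation.
# show_limits is a hypothetical helper name, not part of varnish.
show_limits() {
    echo "open files (ulimit -n):          $(ulimit -n)"
    # Threads count as processes for this limit on Linux.
    echo "max user processes (ulimit -u):  $(ulimit -u 2>/dev/null)"
    # Default per-thread stack size, in KB.
    echo "stack size KB (ulimit -s):       $(ulimit -s)"
    echo "locked memory KB (ulimit -l):    $(ulimit -l)"
    # System-wide ceiling on threads, if the kernel exposes it.
    if [ -r /proc/sys/kernel/threads-max ]; then
        echo "kernel threads-max:              $(cat /proc/sys/kernel/threads-max)"
    fi
}

show_limits
```

If the stack size multiplied by the number of requested threads approaches the address space or memory available to the process, or the process limit is below the requested thread count, that would be a plausible place to look first.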

-j

On Apr 10, 2009, at 3:58 PM, Ray Barnes wrote:

> John,
>
> Thanks for the reply; as you can see my config is largely based on  
> the one you posted to this list in February (thanks!).
>
> I went back as you suggested and waited 90 seconds, while starting  
> it the same way.  Before running any tests, I went into the CLI and  
> viewed stats on the threads:
>
>          364  N worker threads
>          364  N worker threads created
>          782  N worker threads not created
>
> When this happens (started threads do not match the number  
> specified), varnish does really unpredictable things, e.g. it won't  
> take 300 connections from 'ab' and times out with the following  
> message:
>
> Benchmarking 98.124.141.3 (be patient)
> apr_poll: The timeout specified has expired (70007)
> Total of 52 requests completed
>
> I think the crux of my problem is figuring out why it won't start  
> more threads.  Being not-so-familiar with the internals of varnish,  
> I can't tell whether that's an OS problem or a varnish problem.   
> Hope that helps.
>
> -Ray
>
>
>
> On Fri, Apr 10, 2009 at 6:35 PM, John Adams <jna at twitter.com> wrote:
> It takes time to spawn threads. If you start the server with  
> hundreds of threads, they won't be ready for ~30-90 seconds.
>
> Maybe that's causing this issue?
>
> -j
>
> On Apr 10, 2009, at 3:12 PM, Ray Barnes wrote:
>
>> Hi all.  Note that everything herein is based only on a very lay  
>> knowledge of varnish, without being familiar with the internals of  
>> the code.
>>
>> In my quest to eke more performance out of Varnish, I've been  
>> testing under 2.0.4.  I have not seen much improvement over 2.0.3  
>> in the way it acts after receiving a bunch of hits all at one  
>> time.  I am invoking varnish like this:
>>
>> ulimit -n 131072
>> ulimit -l 82000
>> /usr/local/sbin/varnishd -a 98.124.141.3:80 -b 67.212.179.98:80 -T 98.124.141.3:6083 \
>>         -t 60 -w1440,3000,60 -u apache -g apache -p obj_workspace=16000 -p sess_workspace=262144 -p listen_depth=4096 \
>>         -p shm_workspace=64000 -p thread_pools=8 -p thread_pool_min=180 -p ping_interval=1 -p srcaddr_ttl=0 -s malloc,80M
>>
>> As best I can tell, the problem I'm seeing is that it will not  
>> create the number of worker threads that I'm telling it to, as  
>> evidenced by the 'stats' output within the CLI immediately after  
>> launch:
>>
>>          270  N worker threads
>>          285  N worker threads created
>>
>> So if I launch 'ab' with 700 connections against varnish, it will  
>> not work right from the beginning, like so:
>>
>> [root@mia ~]# ab -n 20000 -c 700 http://98.124.141.3/
>> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>> Benchmarking 98.124.141.3 (be patient)
>> apr_socket_recv: Connection refused (111)
>> [root@mia ~]# ab -n 20000 -c 700 http://98.124.141.3/
>> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
>> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
>> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
>> Benchmarking 98.124.141.3 (be patient)
>> apr_poll: The timeout specified has expired (70007)
>> Total of 147 requests completed
>> [root@mia ~]# telnet 98.124.141.3 80
>> Trying 98.124.141.3...
>> Connected to 98.124.141.3 (98.124.141.3).
>> Escape character is '^]'.
>> GET / HTTP/1.0
>> ^]
>> telnet> quit
>> Connection closed.
>>
>> The above telnet command simply hung, presumably because there are  
>> still 700 sessions in CLOSE_WAIT state within the kernel, although  
>> that should not matter if varnish opened the number of worker  
>> threads it was supposed to.  Based on what I've seen, it would seem  
>> that varnish has some problem when you launch it with "too many"  
>> initial worker threads (although I'm having a hard time  
>> understanding why 1400ish is too many).  It seems to go crazy if  
>> you specify too many threads initially.  Again, that number should  
>> not be a problem for the machine in theory, as it's a multicore  
>> Xeon.  Platform is Linux 2.6 RHEL.  Any idea what's happening here?
>>
>> -Ray
>>
>> _______________________________________________
>> varnish-dev mailing list
>> varnish-dev at projects.linpro.no
>> http://projects.linpro.no/mailman/listinfo/varnish-dev
>
> ---
> John Adams
> Twitter Operations
> jna at twitter.com
> http://twitter.com/netik
>
>
>
>
>

---
John Adams
Twitter Operations
jna at twitter.com
http://twitter.com/netik



