Bug? Barrage of hits leads to failure creating worker threads / stats tracking

Ray Barnes tical.net at gmail.com
Sat Apr 11 00:58:01 CEST 2009


John,

Thanks for the reply; as you can see my config is largely based on the one
you posted to this list in February (thanks!).

I went back as you suggested, started it the same way, and waited 90
seconds.  Before running any tests, I went into the CLI and viewed stats on
the threads:

         364  N worker threads
         364  N worker threads created
         782  N worker threads not created
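
One rough way to put those counters next to the configured minimum is
sketched below.  It assumes thread_pool_min applies per pool, so that 8 pools
x 180 gives the 1440 threads asked for; stat_value is a made-up helper name,
and it matches on the description text exactly as printed above.

```shell
# Hypothetical helper: read CLI 'stats' text on stdin and print the value
# of the counter whose description equals $1 exactly.
stat_value() {
    awk -v label="$1" '
        {
            value = $1          # first column is the counter value
            $1 = ""             # drop it; the rest is the description
            sub(/^ +/, "")      # trim the leading blank left behind
            if ($0 == label) { print value; exit }
        }'
}

# The three lines quoted above, used as sample input.
sample='         364  N worker threads
         364  N worker threads created
         782  N worker threads not created'

created=$(printf '%s\n' "$sample" | stat_value "N worker threads created")
failed=$(printf '%s\n' "$sample" | stat_value "N worker threads not created")

# Assumption: thread_pools * thread_pool_min is the intended minimum.
expected=$((8 * 180))

echo "created=$created failed=$failed expected=$expected missing=$((expected - created))"
```

With the numbers above this prints
created=364 failed=782 expected=1440 missing=1076.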

When this happens (the number of started threads does not match the number
specified), varnish behaves unpredictably; for example, it won't accept 300
connections from 'ab', which times out with the following message:

Benchmarking 98.124.141.3 (be patient)
apr_poll: The timeout specified has expired (70007)
Total of 52 requests completed

I think the crux of my problem is figuring out why it won't start more
threads.  Being not-so-familiar with the internals of varnish, I can't tell
whether that's an OS problem or a varnish problem.  Hope that helps.
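
On the OS side, a few limits commonly cap how many threads one process can
start.  The sketch below only prints them; it assumes a Linux host, and it
assumes varnishd leaves the per-thread stack at the glibc default (which
follows the 'ulimit -s' value in effect at launch).  stack_reservation_mb is
a made-up helper, and the 10240 KB stack in the example is an assumed figure,
not one measured on this machine.

```shell
# Rough estimate of the address space reserved for thread stacks, in MB.
# $1 = number of threads, $2 = per-thread stack size in KB.
stack_reservation_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# Limits of the shell that will launch varnishd (values differ per system;
# "unknown" is printed where the shell does not support the option).
echo "max user processes (threads count here on Linux): $(ulimit -u 2>/dev/null || echo unknown)"
echo "stack size, KB: $(ulimit -s 2>/dev/null || echo unknown)"
echo "virtual memory, KB: $(ulimit -v 2>/dev/null || echo unknown)"

# System-wide thread ceiling, if the kernel exposes it.
if [ -r /proc/sys/kernel/threads-max ]; then
    echo "kernel threads-max: $(cat /proc/sys/kernel/threads-max)"
fi

# Example: 1440 threads with an assumed 10240 KB stack each.
stack_reservation_mb 1440 10240
```

If the stacks alone would reserve more address space than the process can
map (easy to hit on a 32-bit build), thread creation can fail long before CPU
or RAM become a concern; lowering 'ulimit -s' before launch is one thing to
try.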

-Ray



On Fri, Apr 10, 2009 at 6:35 PM, John Adams <jna at twitter.com> wrote:

> It takes time to spawn threads. If you start the server with hundreds of
> threads, they won't be ready for ~30-90 seconds.
> Maybe that's causing this issue?
>
> -j
>
>   On Apr 10, 2009, at 3:12 PM, Ray Barnes wrote:
>
>   Hi all.  Note that everything herein is based only on a very lay
> knowledge of varnish, without being familiar with the internals of the code.
>
> In my quest to eke more performance out of Varnish, I've been testing under
> 2.0.4.  I have not seen much improvement over 2.0.3 in the way it acts after
> receiving a bunch of hits all at one time.  I am invoking varnish like this:
>
> ulimit -n 131072
> ulimit -l 82000
> /usr/local/sbin/varnishd -a 98.124.141.3:80 -b 67.212.179.98:80 \
>         -T 98.124.141.3:6083 -t 60 -w1440,3000,60 -u apache -g apache \
>         -p obj_workspace=16000 -p sess_workspace=262144 -p listen_depth=4096 \
>         -p shm_workspace=64000 -p thread_pools=8 -p thread_pool_min=180 \
>         -p ping_interval=1 -p srcaddr_ttl=0 -s malloc,80M
>
> As best I can tell, the problem I'm seeing is that it will not create the
> number of worker threads that I'm telling it to, as evidenced by the
> 'stats' output within the CLI immediately after launch:
>
>          270  N worker threads
>          285  N worker threads created
>
> So if I launch 'ab' with 700 connections against varnish, it fails right
> from the start, like so:
>
> [root at mia ~]# ab -n 20000 -c 700 http://98.124.141.3/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> Benchmarking 98.124.141.3 (be patient)
> apr_socket_recv: Connection refused (111)
> [root at mia ~]# ab -n 20000 -c 700 http://98.124.141.3/
> This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Copyright 2006 The Apache Software Foundation, http://www.apache.org/
> Benchmarking 98.124.141.3 (be patient)
> apr_poll: The timeout specified has expired (70007)
> Total of 147 requests completed
> [root at mia ~]# telnet 98.124.141.3 80
> Trying 98.124.141.3...
> Connected to 98.124.141.3 (98.124.141.3).
> Escape character is '^]'.
> GET / HTTP/1.0
> ^]
> telnet> quit
> Connection closed.
>
> The above telnet command simply hung, presumably because there are still
> 700 sessions in CLOSE_WAIT state within the kernel, although that should not
> matter if varnish had started the number of worker threads it was supposed to.
> Based on what I've seen, varnish seems to have a problem when you launch it
> with "too many" initial worker threads (although I'm having a hard time
> understanding why 1400ish is too many).  In theory that number should not be
> a problem for the machine, as it's a multicore Xeon.  Platform is Linux 2.6
> RHEL.  Any idea what's happening here?
>
> -Ray
>
>   ---
> John Adams
> Twitter Operations
> jna at twitter.com
> http://twitter.com/netik