Bug? Barrage of hits leads to failure creating worker threads / stats tracking

Ray Barnes tical.net at gmail.com
Sat Apr 11 00:12:48 CEST 2009


Hi all.  Note that everything herein is based only on a very lay knowledge
of varnish, without being familiar with the internals of the code.

In my quest to eke more performance out of Varnish, I've been testing under
2.0.4.  I have not seen much improvement over 2.0.3 in how it behaves after
receiving a burst of hits all at once.  I am invoking varnish like this:

ulimit -n 131072
ulimit -l 82000
/usr/local/sbin/varnishd -a 98.124.141.3:80 -b 67.212.179.98:80 \
        -T 98.124.141.3:6083 \
        -t 60 -w1440,3000,60 -u apache -g apache -p obj_workspace=16000 \
        -p sess_workspace=262144 -p listen_depth=4096 \
        -p shm_workspace=64000 -p thread_pools=8 -p thread_pool_min=180 \
        -p ping_interval=1 -p srcaddr_ttl=0 -s malloc,80M

As best I can tell, the problem I'm seeing is that it will not create the
number of worker threads that I'm telling it to, as evidenced by the
'stats' output within the CLI immediately after launch:

         270  N worker threads
         285  N worker threads created
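
For reference, the gap between what was requested and what was started can
be sketched as below.  This assumes thread_pool_min applies per pool, so the
startup floor would be thread_pools * thread_pool_min; the function names
are hypothetical helpers, not part of varnish, and the parser only handles
the counter layout shown above.

```shell
#!/bin/sh
# Assumption: thread_pool_min is a per-pool value, so the minimum number of
# worker threads at startup is thread_pools * thread_pool_min.
expected_min_threads() {
    # $1 = thread_pools, $2 = thread_pool_min
    echo $(( $1 * $2 ))
}

# Read CLI counter text on stdin and print the "N worker threads" value.
# NF == 4 excludes the separate "N worker threads created" counter.
observed_threads() {
    awk 'NF == 4 && $2 == "N" && $3 == "worker" && $4 == "threads" { print $1 }'
}

expected=$(expected_min_threads 8 180)
observed=$(printf '%s\n' \
    '         270  N worker threads' \
    '         285  N worker threads created' | observed_threads)
echo "expected at least $expected threads, observed $observed"
```

With the values from this report it prints "expected at least 1440 threads,
observed 270".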

So if I launch 'ab' with 700 concurrent connections against varnish, it
fails right from the start, like so:

[root@mia ~]# ab -n 20000 -c 700 http://98.124.141.3/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking 98.124.141.3 (be patient)
apr_socket_recv: Connection refused (111)
[root@mia ~]# ab -n 20000 -c 700 http://98.124.141.3/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/
Benchmarking 98.124.141.3 (be patient)
apr_poll: The timeout specified has expired (70007)
Total of 147 requests completed
[root@mia ~]# telnet 98.124.141.3 80
Trying 98.124.141.3...
Connected to 98.124.141.3 (98.124.141.3).
Escape character is '^]'.
GET / HTTP/1.0
^]
telnet> quit
Connection closed.

The telnet session above simply hung, presumably because there are still 700
sessions in CLOSE_WAIT state within the kernel, although that should not
matter if varnish had started the number of worker threads it was told to.

Based on what I've seen, varnish seems to have a problem when it is launched
with "too many" initial worker threads, although I'm having a hard time
understanding why 1400ish is too many.  That number should not be a problem
for the machine in theory, as it's a multicore Xeon.  Platform is Linux 2.6
on RHEL.  Any idea what's happening here?
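
The CLOSE_WAIT guess above could be checked with something like the sketch
below.  It assumes the usual Linux `netstat -tan` column layout (Proto,
Recv-Q, Send-Q, Local Address, Foreign Address, State); count_state is a
hypothetical helper name.

```shell
#!/bin/sh
# Count TCP sockets in a given state on a given local address:port.
# Reads `netstat -tan` style output on stdin; assumes the Linux layout where
# column 4 is the local address and column 6 is the connection state.
count_state() {
    # $1 = local addr:port, $2 = state (e.g. CLOSE_WAIT)
    awk -v local="$1" -v state="$2" \
        '$4 == local && $6 == state { n++ } END { print n + 0 }'
}

# Usage on the varnish host:
#   netstat -tan | count_state 98.124.141.3:80 CLOSE_WAIT
```

Running it before and after an ab run would show whether the listen address
is really holding on to about 700 half-closed sessions.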

-Ray