many worker threads failed with EAGAIN

Tue Sep 1 00:10:08 CEST 2009

I'm caching a fairly large working set of small objects, so I'm pretty
sure I need a large number of threads.  I started out with:
# -p thread_pool_min not specified
 -p thread_pool_max=1000 \
 -p thread_pools=2 \

...but quickly saw many overflowed work requests.  My estimates are
rough, but this was measured after a few minutes with varnish in front
of two backends that normally serve 700-1100 requests/sec across
about 100k to 300k unique objects per day (my object count could be
higher; I'm uncertain).

Before posting I had been fixated on the threads-max knob in
Linux, which was already set quite high.

cat /proc/sys/kernel/threads-max
143360

Then I thought more about how threads get their own process entries in
Linux, and so raised the max user processes limit (ulimit -u) to 71680.

No more "Create worker thread failed 11 Resource temporarily" messages.
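
For anyone hitting the same error, here is a rough sketch of the check
I ended up doing by hand.  Error 11 is EAGAIN on Linux, and thread
creation can fail with it when the per-user process limit (RLIMIT_NPROC,
i.e. ulimit -u) is reached, since Linux counts threads against it.  The
function names and the "wanted" figure are made up for illustration;
only the /proc path and the resource module calls are real.  Run it as
the same user, with the same limits, that varnishd runs under.

```python
import resource


def read_threads_max(path="/proc/sys/kernel/threads-max"):
    """Return the system-wide thread limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def nproc_soft_limit():
    """Return the soft RLIMIT_NPROC (ulimit -u), or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


def effective_thread_ceiling(nproc_soft, threads_max):
    """Return the smaller of the known limits; None if neither is known."""
    known = [limit for limit in (nproc_soft, threads_max) if limit is not None]
    return min(known) if known else None


if __name__ == "__main__":
    wanted = 1000  # hypothetical: worker threads you expect varnishd to create
    ceiling = effective_thread_ceiling(nproc_soft_limit(), read_threads_max())
    print("ulimit -u (soft):", nproc_soft_limit())
    print("threads-max:     ", read_threads_max())
    if ceiling is not None and ceiling < wanted:
        print("limit is below the wanted thread count; expect EAGAIN")
    else:
        print("limits look high enough for", wanted, "threads")
```

Keep in mind the per-user limit is shared with everything else that
user runs, so the real headroom is smaller than the number printed.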

Now varnish consumes the 8GB of RAM until it starts swapping the last
2GB and IOwait consumes the box.  The cache hit ratio eventually degrades
and varnishd wedges; all traffic from the box zeroes out.

On Mon, Aug 31, 2009 at 2:41 PM, Poul-Henning Kamp<phk at phk.freebsd.dk> wrote:
> In message <dcccdf790908310536m155a49cfw770c04657aab1ab6 at mail.gmail.com>, David
>  Birdsong writes:
>>varnishlog has a lot of these:
>> 0 Debug        - "Create worker thread failed 11 Resource temporarily
>>unavailable"
>>
>>sure enough, overflowed and dropped work requests are steadily on the rise
>>Hitrate ratio:       10      100     1000
>>Hitrate avg:     0.8584   0.8506   0.8581
>
>>        1007          .            .   N worker threads
>>        1007         0.00         0.34 N worker threads created
>>       12425         4.00         4.20 N worker threads not created
>>        2002         1.00         0.68 N queued work requests
>>      651706       157.04       220.47 N overflowed work requests
>>      460396       262.07       155.75 N dropped work requests
>
> It is not clear to me why you ended up needing so many threads,
> but the usual explanation is comms problems either in client
> or backend direction.
>
> If you have not enabled backend probing, you should do so, since that
> prevents the threads from getting stuck on a troubled backend.
>
> Poul-Henning
>
> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk at FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>