many workers threads failed with EAGAIN

Tue Sep 1 01:36:13 CEST 2009

On Mon, Aug 31, 2009 at 3:42 PM, Poul-Henning Kamp<phk at phk.freebsd.dk> wrote:
> In message <dcccdf790908311510u7a671334s798ca64f06aade4 at mail.gmail.com>, David
> Birdsong writes:
>
>>...but quickly saw many overflowed work requests.  My estimates are
>>rough, but this was measured after a few minutes with varnish in front
>>of two backends that normally serve between 700-1100 requests/sec of
>>about 100k to 300k unique objects per day (my num objects could be
>>higher, I'm uncertain.)
>
> I'm surprised if you need 1000 threads for that load, why does
> it take so long to serve them ?  (Avg 1s ?)
>
Perhaps I do not need so many workers.
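
The "Avg 1s?" estimate above looks like Little's law: busy threads are
roughly request rate times average time per request. A minimal sketch
of that arithmetic follows. The 0.05s and 0.2s service times are
made-up illustration values, not measurements from this setup.

```python
def threads_needed(requests_per_sec: float, avg_service_time_s: float) -> float:
    """Little's law: average concurrency = arrival rate * average time in system."""
    return requests_per_sec * avg_service_time_s


if __name__ == "__main__":
    # 700 and 1100 req/s are the rates quoted in the thread;
    # the service times are hypothetical.
    for rate in (700, 1100):
        for service_time in (0.05, 0.2, 1.0):
            busy = threads_needed(rate, service_time)
            print(f"{rate} req/s at {service_time:.2f}s each -> ~{busy:.0f} busy threads")
```

By this estimate, 1000 busy threads at around 1000 requests/sec only
makes sense if the average request takes about a second.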

Since I have your attention, I'll provide some more concrete numbers.

Each server pushes roughly 600K unique objects per day. An hour of
peak is about 90K unique objects and an hour in the trough is about
50K objects, where each object is between 60-120Kb.
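
To put those counts in terms of cache size, here is a rough
back-of-the-envelope sketch. It assumes "Kb" means kilobytes
(1000 bytes) and ignores any per-object overhead in varnish, so treat
the results as lower bounds rather than real measurements.

```python
KB = 1000  # assumption: "Kb" in the figures above is read as kilobytes


def working_set_gb(num_objects: int, object_size_kb: float) -> float:
    """Bytes needed to hold num_objects of the given size, expressed in GB (1e9 bytes)."""
    return num_objects * object_size_kb * KB / 1e9


if __name__ == "__main__":
    # Object counts per backend, taken from the figures in the message.
    periods = {"full day": 600_000, "peak hour": 90_000, "trough hour": 50_000}
    for label, count in periods.items():
        low = working_set_gb(count, 60)
        high = working_set_gb(count, 120)
        print(f"{label}: {low:.1f}-{high:.1f} GB per backend")
```

Under those assumptions, one backend's unique objects for a full day
come to tens of GB, which is well beyond what fits in RAM on an 8G
machine.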

My goal was to determine the maximum number of backends a single
instance of varnish could handle given that traffic pattern.  My blind
guess/hope was 5-7 backends, so I estimated about 3-4.6 million
unique objects per day, but with my parameters on a dual core Intel
2.5GHz with 8G RAM and 10G malloc'd with a swap partition on a
non-system drive, I was only able to serve traffic for two backends
while varnish consumed the RAM.  Once it went into swap, IOwait made
varnish unstable.  I couldn't discern much out of varnishstat to tell
me what was going on under the hood.  This is when I started moving
thread_pool_max up and down.  Oh, and I do have health checks on my
backends:

backend img10 {
        .host = "vimg10.imageshack.us";
        .port = "80";
        .probe = {
          .url = "/hc.txt";
          .timeout = 0.5 s;
          .window = 8; # how many probes are examined
          .threshold = 3; # how many must pass for us to be healthy
          .interval = 3s; # time between health checks
        }
}
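
For anyone unfamiliar with the .window/.threshold pair, the following
is a simplified model of how I understand it, not varnish's actual
code. The ProbeWindow class is hypothetical and starts with an empty
history, which may differ from how varnish seeds a new backend.

```python
from collections import deque


class ProbeWindow:
    """Simplified model: healthy when at least `threshold` of the last `window` probes passed."""

    def __init__(self, window: int = 8, threshold: int = 3):
        self.results = deque(maxlen=window)  # oldest result is dropped automatically
        self.threshold = threshold

    @property
    def healthy(self) -> bool:
        return sum(self.results) >= self.threshold

    def record(self, ok: bool) -> bool:
        """Store one probe result and return the resulting health state."""
        self.results.append(ok)
        return self.healthy


if __name__ == "__main__":
    probe = ProbeWindow(window=8, threshold=3)
    for ok in [True, True, True, False, False, False, False, False, False]:
        print("pass" if ok else "fail", "->", "healthy" if probe.record(ok) else "sick")
```

In this model a backend with window 8 and threshold 3 stays marked
healthy until fewer than three of the last eight probes have passed.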

I'm about to try another run with the same backends on a dual quad
core Xeon with 16GB of RAM. This time I will probably have varnish
malloc just under the max memory to avoid so much swapping, which is
seemingly worse than a cache miss.

If you have any suggestions or need me to provide more info, I'd
appreciate your expertise.

> --
> Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
> phk at FreeBSD.ORG         | TCP/IP since RFC 956
> FreeBSD committer       | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>