[Varnish] #599: WRK_Queue should prefer thread pools with idle threads / improve thread pool loadbalancing
Varnish
varnish-bugs at projects.linpro.no
Tue Dec 8 12:45:54 CET 2009
#599: WRK_Queue should prefer thread pools with idle threads / improve thread
pool loadbalancing
-------------------------+--------------------------------------------------
Reporter: slink | Owner: phk
Type: enhancement | Status: new
Priority: high | Milestone:
Component: varnishd | Version: trunk
Severity: normal | Keywords:
-------------------------+--------------------------------------------------
The algorithm implemented in WRK_Queue has so far basically been (sketched
below):
 * Choose a worker pool round-robin
 * Dispatch the request on that pool:
   * find an idle thread, OR
   * put the request on the pool's queue, OR
   * fail if the queue is full (has reached ovfl_max)
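For reference, here is a simplified sketch of that dispatch path, pieced
together from the struct wq layout shown further down. This is paraphrased,
not the exact trunk source, and it assumes the usual internal headers and
helpers (Lck_*, AZ, the VTAILQ macros):
{{{
/* Sketch of the current WRK_Queue logic (paraphrased, not verbatim). */
static int
wrk_queue_sketch(struct workreq *wrq)
{
	static unsigned rr = 0;		/* round-robin state, unsynchronized */
	struct wq *qp;
	struct worker *w;

	qp = wq[rr++ % nwq];		/* pick one pool, never look at the others */

	Lck_Lock(&qp->mtx);
	w = VTAILQ_FIRST(&qp->idle);
	if (w != NULL) {		/* idle thread: hand the request over */
		VTAILQ_REMOVE(&qp->idle, w, list);
		Lck_Unlock(&qp->mtx);
		w->wrq = wrq;
		AZ(pthread_cond_signal(&w->cond));
		return (0);
	}
	if (qp->nqueue >= ovfl_max) {	/* queue full: drop */
		qp->ndrop++;
		Lck_Unlock(&qp->mtx);
		return (-1);
	}
	VTAILQ_INSERT_TAIL(&qp->overflow, wrq, list);	/* otherwise queue */
	qp->nqueue++;
	qp->noverflow++;
	Lck_Unlock(&qp->mtx);
	return (0);
}
}}}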
This algorithm is probably good enough for many cases, but I noticed that
it can have a negative impact, in particular during startup. Threads for
the pools are created sequentially (in wrk_herder_thread), so shortly
after startup some pools may get hit by requests while they don't have any
threads yet. I noticed this because the overflowing pools triggered the
issue documented in #598.
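(For context, the herder tops the pools up one after another, roughly like
this; wthread_min and wrk_breed_flock are stand-in names here, not
necessarily what trunk uses:)
{{{
/* Sketch only: because pools are filled sequentially, pool k can
 * still have nthr == 0 while requests are already being
 * round-robined onto it. */
for (;;) {
	for (u = 0; u < nwq; u++)
		while (wq[u]->nthr < wthread_min)	/* assumed minimum */
			wrk_breed_flock(wq[u]);		/* assumed helper */
	(void)sleep(1);
}
}}}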
Here's a snapshot of this situation in Solaris mdb:
{{{
> 0x0000000000464ee0/D
varnishd`nwq:
varnishd`nwq: 4
> wq/p
varnishd`wq:
varnishd`wq: 0x483f50
## w'queues
> 0x483f50,4/p
0x483f50: 0x507160 0x506ed0 0x506f30 0x506f90
struct wq {
	unsigned		magic;
#define WQ_MAGIC		0x606658fa
	struct lock		mtx;
	struct workerhead	idle;
	VTAILQ_HEAD(, workreq)	overflow;
	unsigned		nthr;
	unsigned		nqueue;
	unsigned		lqueue;
	uintmax_t		ndrop;
	uintmax_t		noverflow;
};
> 0x507160,60::dump -e
507160: 606658fa 00000000 005359a0 00000000
507170: c3c3fe00 fffffd7f f03d0e30 fffffd7f
507180: 00000000 00000000 00507180 00000000
507190: 00000177 00000000 00000000 00000000 177 thr
5071a0: 00000000 00000000 00000051 00000000 51 noverflow
5071b0: 00507150 00000000 00000000 00000000
> 0x506ed0,60::dump -e
506ed0: 606658fa 00000000 005359f0 00000000
506ee0: b9d6ee00 fffffd7f c1a38e30 fffffd7f
506ef0: 00000000 00000000 00506ef0 00000000
506f00: 00000050 00000000 00000000 00000000 50 thr
506f10: 00000000 00000000 000001cf 00000000 1cf noverflow
506f20: 00000051 00000000 00000000 00000000
> 0x506f30,60::dump -e
506f30: 606658fa 00000000 00535a40 00000000
506f40: 00000000 00000000 00506f40 00000000
506f50: 007b65e8 00000000 0292e778 00000000
506f60: 00000000 00000201 00000000 00000000 0 thr 201 nqueue
506f70: 00000001 00000000 00000201 00000000 1 drop 201 noverflow
506f80: 00000061 00000000 00000000 00000000
> 0x506f90,60::dump -e
506f90: 606658fa 00000000 00535a90 00000000
506fa0: 00000000 00000000 00506fa0 00000000
506fb0: 007baf08 00000000 0285e218 00000000
506fc0: 00000000 00000201 00000000 00000000 0 thr 201 nqueue
506fd0: 00000000 00000000 00000201 00000000 201 noverflow
506fe0: 00506f80 00000000 00000000 00000000
}}}
Notice that {{{wq[2]}}} and {{{wq[3]}}} have their queues saturated
({{{nqueue}}} = 0x201) and no idle threads, while {{{wq[0]}}} and
{{{wq[1]}}} probably have idle threads by now.
I am suggesting the following changes to WRK_Queue (sketched below):
 * Improve the round-robin selection on MP systems by using a volatile
 static (still avoiding additional locking overhead for the round-robin
 state)
 * First check all pools for idle threads, starting with the pool selected
 by round-robin so the common case stays O(1)
 * Queue a request only if no pool has an idle thread, and put it on the
 shortest queue
 * Fail only if all queues are full
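In outline, the selection would then look roughly like this (a sketch
only, the attached diff is the actual proposal; wrk_dispatch_if_idle is a
stand-in helper name):
{{{
/* Sketch of the proposed selection (illustration, not the diff). */
static int
wrk_queue_proposed_sketch(struct workreq *wrq)
{
	static volatile unsigned rr = 0;	/* lock-free round-robin state */
	struct wq *qp, *best;
	unsigned i, start;

	start = rr++ % nwq;

	/* Pass 1: prefer any pool with an idle thread, starting at the
	 * round-robin pool so the common case stays O(1). */
	for (i = 0; i < nwq; i++) {
		qp = wq[(start + i) % nwq];
		if (wrk_dispatch_if_idle(qp, wrq))	/* assumed helper */
			return (0);
	}

	/* Pass 2: no idle thread anywhere, so queue on the shortest
	 * non-full queue; the unlocked reads of nqueue are only a
	 * heuristic. */
	best = NULL;
	for (i = 0; i < nwq; i++) {
		qp = wq[(start + i) % nwq];
		if (qp->nqueue >= ovfl_max)
			continue;
		if (best == NULL || qp->nqueue < best->nqueue)
			best = qp;
	}
	if (best == NULL)			/* all queues full: fail */
		return (-1);

	Lck_Lock(&best->mtx);
	VTAILQ_INSERT_TAIL(&best->overflow, wrq, list);
	best->nqueue++;
	best->noverflow++;
	Lck_Unlock(&best->mtx);
	return (0);
}
}}}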
I'll attach a diff with my suggested solution.
--
Ticket URL: <http://varnish.projects.linpro.no/ticket/599>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator