[Varnish] #599: WRK_Queue should prefer thread pools with idle threads / improve thread pool loadbalancing
Varnish
varnish-bugs at varnish-cache.org
Mon Mar 7 13:16:43 CET 2011
#599: WRK_Queue should prefer thread pools with idle threads / improve thread
pool loadbalancing
-------------------------+--------------------------------------------------
 Reporter:  slink        |      Owner:  phk
     Type:  enhancement  |     Status:  new
 Priority:  high         |  Milestone:  Later
Component:  varnishd     |    Version:  trunk
 Severity:  normal       |   Keywords:
-------------------------+--------------------------------------------------
Changes (by kristian):
* milestone: => Later
New description:
The algorithm implemented in WRK_Queue has so far basically been:
* Choose a worker pool round robin
* Dispatch the request on that pool:
  * find an idle thread OR
  * put the request on a queue OR
  * fail if the queue is full (reached ovfl_max)
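For illustration, here is a minimal, self-contained C model of that
per-pool dispatch. The names ({{{pool_t}}}, {{{queue_request}}}, {{{rr}}})
are invented for this sketch and are not the actual varnishd symbols:
{{{
/* Simplified model of the current WRK_Queue behaviour described above;
 * not the actual Varnish source. */
typedef struct pool {
        int     nidle;          /* idle threads parked on this pool */
        int     nqueue;         /* requests currently queued */
        int     ovfl_max;       /* queue length limit */
} pool_t;

static unsigned rr;             /* round-robin state, bumped per request */

/* Returns 0 on success, -1 when the chosen pool's queue is full. */
static int
queue_request(pool_t *pools, unsigned npools)
{
        pool_t *p = &pools[rr++ % npools];      /* pick one pool, round robin */

        if (p->nidle > 0) {                     /* hand off to an idle thread */
                p->nidle--;
                return (0);
        }
        if (p->nqueue < p->ovfl_max) {          /* otherwise queue on that pool */
                p->nqueue++;
                return (0);
        }
        return (-1);                            /* queue full: request dropped */
}
}}}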
This algorithm is probably good enough for many cases, but I noticed that
it can have a negative impact, in particular during startup. Threads for
the pools are created sequentially (in wrk_herder_thread), so shortly
after startup some pools may get hit by requests while they don't have any
threads yet. I noticed this because overflowing pools would trigger the
issue documented in #598.
Here's a snapshot of this situation in Solaris mdb:
{{{
> 0x0000000000464ee0/D
varnishd`nwq:
varnishd`nwq: 4
> wq/p
varnishd`wq:
varnishd`wq: 0x483f50
## w'queues
> 0x483f50,4/p
0x483f50: 0x507160 0x506ed0 0x506f30 0x506f90
struct wq {
        unsigned                magic;
#define WQ_MAGIC                0x606658fa
        struct lock             mtx;
        struct workerhead       idle;
        VTAILQ_HEAD(, workreq)  overflow;
        unsigned                nthr;
        unsigned                nqueue;
        unsigned                lqueue;
        uintmax_t               ndrop;
        uintmax_t               noverflow;
};
> 0x507160,60::dump -e
507160: 606658fa 00000000 005359a0 00000000
507170: c3c3fe00 fffffd7f f03d0e30 fffffd7f
507180: 00000000 00000000 00507180 00000000
507190: 00000177 00000000 00000000 00000000 177 thr
5071a0: 00000000 00000000 00000051 00000000 51 noverflow
5071b0: 00507150 00000000 00000000 00000000
> 0x506ed0,60::dump -e
506ed0: 606658fa 00000000 005359f0 00000000
506ee0: b9d6ee00 fffffd7f c1a38e30 fffffd7f
506ef0: 00000000 00000000 00506ef0 00000000
506f00: 00000050 00000000 00000000 00000000 50 thr
506f10: 00000000 00000000 000001cf 00000000 1cf noverflow
506f20: 00000051 00000000 00000000 00000000
> 0x506f30,60::dump -e
506f30: 606658fa 00000000 00535a40 00000000
506f40: 00000000 00000000 00506f40 00000000
506f50: 007b65e8 00000000 0292e778 00000000
506f60: 00000000 00000201 00000000 00000000 0 thr 201 nqueue
506f70: 00000001 00000000 00000201 00000000 1 drop 201 noverflow
506f80: 00000061 00000000 00000000 00000000
> 0x506f90,60::dump -e
506f90: 606658fa 00000000 00535a90 00000000
506fa0: 00000000 00000000 00506fa0 00000000
506fb0: 007baf08 00000000 0285e218 00000000
506fc0: 00000000 00000201 00000000 00000000 0 thr 201 nqueue
506fd0: 00000000 00000000 00000201 00000000 201 noverflow
506fe0: 00506f80 00000000 00000000 00000000
}}}
Notice that {{{wq[2]}}} and {{{wq[3]}}} have their {{{nqueue}}} saturated
and no idle threads, while {{{wq[0]}}} and {{{wq[1]}}} probably have idle
threads by now.
I am suggesting the following changes to WRK_Queue:
* Improve the round-robin selection on MP systems by using a volatile
  static (still avoiding additional locking overhead for the round-robin
  state)
* First check all pools for idle threads (starting with the pool selected
  by round robin, so the normal case remains O(1))
* Only queue a request if no pool has an idle thread, and queue where the
  queue is shortest
* Fail only if all queues are full
I'll attach a diff with my suggested solution.
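Until the diff is attached, here is a rough sketch of the proposed
selection order, reusing the invented types from the model above; it only
illustrates the idea and is not the patch itself:
{{{
#include <stddef.h>     /* NULL */

/* Sketch of the proposed WRK_Queue: scan all pools for idle threads
 * starting at the round-robin pick, then fall back to the shortest
 * queue, and fail only when every queue is full. */
static int
queue_request_balanced(pool_t *pools, unsigned npools)
{
        unsigned start = rr++ % npools; /* volatile static in the proposal */
        pool_t *p, *best = NULL;
        unsigned i;

        /* First pass: hand off to an idle thread in any pool, starting
         * at the round-robin pick so the common case stays O(1). */
        for (i = 0; i < npools; i++) {
                p = &pools[(start + i) % npools];
                if (p->nidle > 0) {
                        p->nidle--;
                        return (0);
                }
        }

        /* No idle threads anywhere: queue on the pool with the
         * shortest queue that still has room. */
        for (i = 0; i < npools; i++) {
                p = &pools[(start + i) % npools];
                if (p->nqueue < p->ovfl_max &&
                    (best == NULL || p->nqueue < best->nqueue))
                        best = p;
        }
        if (best != NULL) {
                best->nqueue++;
                return (0);
        }
        return (-1);                    /* fail only if all queues are full */
}
}}}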
--
Ticket URL: <http://varnish-cache.org/trac/ticket/599#comment:4>
Varnish <http://varnish-cache.org/>
The Varnish HTTP Accelerator