Proposed restructuring of http_conn and where the data is stored

Tue Nov 22 00:04:24 CET 2011

What about decoupling the workspace instead (or maybe as well)? That way
the workspace can be released (back into a pool) at the end of a request,
and not eat up memory while the session is idle. And then the
workspace can go with the http_conn to the next worker in your scenario.

This is from one of our servers:

       1 VCL_Recv
       3 Pipe
       3 Reading_Backend
       3 Waiting_List
      10 Connect_Backend
      28 Background
     223 Writing_Client
     225 Linger
     226 Reading_Backend_Hdr
   17274 Waiting_Client_Poll
  182004 Idle

With a 128K sess_workspace, that's about 22 gigs (182004 x 128K) tied up
in Idle sessions.
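
To make the arithmetic explicit, here is a tiny sketch. The helper names are
made up for illustration, and 128K is taken to mean 128 * 1024 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes pinned by n_sessions sessions that each hold a fixed-size workspace. */
static uint64_t
pinned_bytes(uint64_t n_sessions, uint64_t workspace_bytes)
{
	return (n_sessions * workspace_bytes);
}

/* Convert a byte count to whole GiB, rounding down. */
static uint64_t
bytes_to_gib(uint64_t bytes)
{
	return (bytes >> 30);
}
```

Calling bytes_to_gib(pinned_bytes(182004, 128 * 1024)) gives 22, which is
where the figure above comes from.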

Now I know this is sess_workspace and the worker workspace is what you're 
after here, but if we decouple workspaces as a whole we can move them 
around for this and not waste extra memory on this problem.
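
Roughly what I have in mind, as a minimal sketch only. struct ws_buf,
struct ws_pool and the ws_pool_* functions are hypothetical names, not
existing Varnish code, and locking around the pool is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pooled workspace; not Varnish's actual struct ws. */
struct ws_buf {
	struct ws_buf	*next;		/* free-list link while pooled */
	size_t		size;		/* usable bytes in data[] */
	char		data[];		/* workspace storage */
};

struct ws_pool {
	struct ws_buf	*free;		/* idle workspaces */
	size_t		ws_size;	/* size of each workspace */
	unsigned	n_free;		/* length of the free list */
};

static void
ws_pool_init(struct ws_pool *p, size_t ws_size)
{
	p->free = NULL;
	p->ws_size = ws_size;
	p->n_free = 0;
}

/* Take a workspace at request start; allocate one if the pool is empty. */
static struct ws_buf *
ws_pool_get(struct ws_pool *p)
{
	struct ws_buf *w = p->free;

	if (w != NULL) {
		p->free = w->next;
		p->n_free--;
	} else {
		w = malloc(sizeof *w + p->ws_size);
		if (w == NULL)
			return (NULL);
		w->size = p->ws_size;
	}
	w->next = NULL;
	return (w);
}

/* Give the workspace back at request end, so idle sessions hold none. */
static void
ws_pool_put(struct ws_pool *p, struct ws_buf *w)
{
	assert(w != NULL);
	w->next = p->free;
	p->free = w;
	p->n_free++;
}
```

A session would call ws_pool_get() when a request starts and ws_pool_put()
when it finishes, so the number of live workspaces follows the number of
active requests rather than the number of open sessions.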

My 2 cents, at least.

Cheers,

 	DocWilco

On Mon, 21 Nov 2011, Martin Blix Grydeland wrote:

> For the streaming development, some changes will be needed to the http_conn
> and where it stores its data (buffers while reading headers and such, as
> well as read-ahead and pipeline for the http protocol). Today the data is
> stored on the session workspace (for the client communication) and on the
> worker workspace for the backend communication.
>
> For the streaming development this causes problems when we want to hand
> over the body fetching to another worker, as there is read-ahead data in
> the http_conn buffer that it needs access to, but this will then be
> pointing into the workspace of the previous worker. I'd rather decouple
> this, as it creates a strong relationship between the two threads and
> troubles will come if they are not synchronized with regard to this address
> space (e.g. if the client hangs up, the client thread needs to make sure
> the body fetcher thread has finished with the data before it can reuse
> its workspace).
>
> To get around this, I'm proposing to make the http_conn's a pooled
> resource of their own, with their own internal buffer space. Something
> along these lines:
>
>   - Each thread pool has a list of unused http_conn's
>   - Each worker thread has a pool of unused http_conn's. When the worker
>   is idle (goes into the pool or starts processing a new request), this
>   pool is grown or shrunk to a size of 2, taking from or returning to the
>   thread pool's list (or creating new ones). The size is 2 because the
>   worker will need 1 for the client request, and maybe one for the backend
>   fetch.
>   - The worker thread takes http_conn's from its pool when it needs an HTC
>   (or creates a new one if the pool is empty). It returns them to its pool
>   when they are no longer used
>   - http_conn's can then be transferred from one thread to another and
>   take their data with them. (The receiving worker will then end up with one
>   more, but this is returned to the thread pool's list when it's finished
>   with the request. The thread giving one away gets one from the thread pool)
>   - Whenever the housekeeping is done on the worker's list, it will check
>   the buffer sizes against the current parameter sizes, and free the
>   http_conn's and create new ones if they have changed.
>
> I believe this creates a mostly lock-free system, while still keeping these
> data structures decoupled from the session/thread so that they can be
> transferred between them when that is needed.
>
> Any comments?
>
> -- 
> Martin Blix Grydeland
> Varnish Software AS
>
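
For reference, the pooling scheme in the quoted proposal could be sketched
along these lines. This is only an illustration under assumptions: struct
htc, struct htc_list and all function names are hypothetical, the shared
list stands in for the thread pool's list and would need a lock in real
code, and the target of 2 comes from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HTC_TARGET 2		/* per-worker target size from the proposal */

struct htc {			/* hypothetical stand-in for http_conn */
	struct htc	*next;
	size_t		bufsize;
	char		*buf;	/* own buffer, travels with the htc */
};

struct htc_list {
	struct htc	*head;
	unsigned	n;
};

static struct htc *
htc_new(size_t bufsize)
{
	struct htc *h = malloc(sizeof *h);

	if (h == NULL)
		return (NULL);
	h->buf = malloc(bufsize);
	if (h->buf == NULL) {
		free(h);
		return (NULL);
	}
	h->bufsize = bufsize;
	h->next = NULL;
	return (h);
}

static void
htc_free(struct htc *h)
{
	free(h->buf);
	free(h);
}

static void
htc_list_push(struct htc_list *l, struct htc *h)
{
	h->next = l->head;
	l->head = h;
	l->n++;
}

static struct htc *
htc_list_pop(struct htc_list *l)
{
	struct htc *h = l->head;

	if (h != NULL) {
		l->head = h->next;
		h->next = NULL;
		l->n--;
	}
	return (h);
}

/*
 * Housekeeping on a worker's pool: free entries whose buffer size no
 * longer matches the parameter, return any surplus to the shared list,
 * then top up to HTC_TARGET from the shared list or by allocating.
 */
static void
wrk_housekeep(struct htc_list *wrk, struct htc_list *shared, size_t bufsize)
{
	struct htc_list keep = { NULL, 0 };
	struct htc *h;

	while ((h = htc_list_pop(wrk)) != NULL) {
		if (h->bufsize != bufsize)
			htc_free(h);
		else if (keep.n < HTC_TARGET)
			htc_list_push(&keep, h);
		else
			htc_list_push(shared, h);
	}
	*wrk = keep;
	while (wrk->n < HTC_TARGET) {
		h = htc_list_pop(shared);
		if (h != NULL && h->bufsize != bufsize) {
			htc_free(h);	/* stale size, drop and retry */
			continue;
		}
		if (h == NULL)
			h = htc_new(bufsize);
		if (h == NULL)
			break;		/* out of memory, stay short */
		htc_list_push(wrk, h);
	}
}
```

Handing an http_conn to another worker is then just passing the pointer:
the receiver pushes it onto its own list and the next housekeeping pass on
each side restores both pools to the target size.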