Workspace overflow handling

Mon Feb 9 15:32:32 CET 2015

VDD Hamburg talking point:

Varnish asserting on workspace overflow is a problem that we really should
address. It hurts most when it happens in Varnish core, as many code paths
there rely on workspace being available. If none is available, the
assertion triggers, taking the cache with it. (Examples: Vary processing,
delivery processor pushes, delivery IO vectors, etc.) Creating proper error
handling and state unwinding for all of these would be a major undertaking,
and also error-prone, as testing all the failure points would be very hard.

Workspace exhaustion also hurts in VCL space. Most VRT functions are
written to handle it, but do so by truncating the result and logging the
fact (LostHeader). This masks errors, and can potentially be an attack
vector for circumventing security barriers implemented in VCL. It also
poses a DoS attack vector: if an attacker knows that VCL does some heavy
manipulation of a particular header, they can send large payloads in that
header, causing an assert later when Varnish attempts delivery. In my
opinion, any failed attempt at setting a header from VCL should result in
an error response immediately, as we could not process the request
properly.
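
To illustrate the difference between truncating and failing, here is a
minimal sketch in C of an all-or-nothing header set. Everything in it
(struct hbuf, hbuf_set, the 32-byte buffer) is hypothetical and only
stands in for a workspace; it is not the VRT API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a small workspace holding header strings. */
struct hbuf {
	char	buf[32];
	size_t	used;		/* bytes committed so far */
	int	failed;		/* set when a header did not fit */
};

/*
 * Store "name: value" as a NUL-terminated string.
 * Either the whole header is committed, or nothing is and NULL is
 * returned, so the caller can turn the failure into an error response.
 */
static const char *
hbuf_set(struct hbuf *h, const char *name, const char *value)
{
	size_t avail = sizeof h->buf - h->used;
	const char *p = h->buf + h->used;
	int n;

	n = snprintf(h->buf + h->used, avail, "%s: %s", name, value);
	if (n < 0 || (size_t)n >= avail) {
		/* Did not fit: `used` is not advanced, nothing is committed. */
		h->failed = 1;
		return (NULL);
	}
	h->used += (size_t)n + 1;	/* include the terminating NUL */
	return (p);
}
```

A caller would check the return value (or the failed flag) right after
the call and bail out with a 5xx instead of continuing with a truncated
header.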

One way of dealing with this issue would be to add a guarantee for
workspace allocations: unless the workspace overflow flag is already set,
all code is guaranteed to be able to allocate at least the configured size
of the workspace. This is achieved by reserving twice the configured
workspace size when the workspace is created. Since the extra space is
normally untouched, it will just be virtual memory and not backed by real
memory. (We might have to bypass malloc and use anonymous mmap to be able
to do that.) All WS_Alloc/WS_Release calls will then set the overflow flag
whenever more than half of the reserved space (that is, more than the
configured size) has been used. Upon recycling of the workspace (request or
busyobj), the flag is tested, and if an overflow occurred, an
madvise(MADV_DONTNEED/MADV_FREE) is issued on the second half of the
mapping. This returns the extra pages to the OS, making the range pure
virtual again.
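
As a rough illustration of the scheme above, here is a sketch in C
assuming a Linux-like system with anonymous mmap and MADV_DONTNEED. The
names (struct xws, xws_init, xws_alloc, xws_recycle) are hypothetical and
are not the existing WS_* API; snapshots, reservations and release are
left out.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define XWS_ALIGN	sizeof(void *)

/* Hypothetical workspace; not Varnish's struct ws. */
struct xws {
	char	*base;		/* start of the 2 * size mapping */
	size_t	size;		/* configured size, rounded up to whole pages */
	size_t	used;		/* bytes handed out so far */
	int	overflow;	/* set once more than `size` bytes are used */
};

/* Reserve 2 * size bytes; pages only become resident when touched. */
static int
xws_init(struct xws *ws, size_t size)
{
	long pg = sysconf(_SC_PAGESIZE);
	void *p;

	if (pg <= 0 || size == 0)
		return (-1);
	/* Round up so the second half starts on a page boundary. */
	size = (size + (size_t)pg - 1) / (size_t)pg * (size_t)pg;
	p = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	ws->base = p;
	ws->size = size;
	ws->used = 0;
	ws->overflow = 0;
	return (0);
}

static void *
xws_alloc(struct xws *ws, size_t bytes)
{
	size_t avail = 2 * ws->size - ws->used;
	char *p;

	if (bytes > avail) {
		ws->overflow = 1;	/* even the reserve is exhausted */
		return (NULL);
	}
	/* avail is a multiple of XWS_ALIGN, so rounding up still fits. */
	bytes = (bytes + XWS_ALIGN - 1) & ~(XWS_ALIGN - 1);
	p = ws->base + ws->used;
	ws->used += bytes;
	if (ws->used > ws->size)
		ws->overflow = 1;	/* dipped into the second half */
	return (p);
}

/* Reset for the next request; give the reserve pages back if touched. */
static int
xws_recycle(struct xws *ws)
{
	int rc = 0;

	if (ws->overflow)
		rc = madvise(ws->base + ws->size, ws->size, MADV_DONTNEED);
	ws->used = 0;
	ws->overflow = 0;
	return (rc);
}
```

In this sketch an allocation made while the flag is clear can never fail
as long as it asks for at most the configured size, which is the
guarantee described above. What allocations should do once the flag is
already set is a design choice; here they keep succeeding until the
reserve is used up.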

Error handling in Varnish core will now be able to get by with just a
handful of check points (mostly after the major VCL functions, where we are
prepared to error out anyway). If the overflow flag is set, we write out a
static 5xx response (unless it's too late), and start processing the next
request (or close the connection if it is too late for that).
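
Such a check point could look roughly like the following C sketch. The
names (enum ckpt, overflow_checkpoint), the choice of 503 as the static
5xx, and the "response already started" flag are assumptions made for
illustration, not existing Varnish code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

enum ckpt {
	CKPT_CONTINUE,		/* no overflow, carry on */
	CKPT_ERROR_SENT,	/* static 5xx written, go to next request */
	CKPT_CLOSE		/* too late for a clean error, close */
};

/* Static response: needs no workspace at all to produce. */
static const char overflow_resp[] =
    "HTTP/1.1 503 Service Unavailable\r\n"
    "Content-Length: 0\r\n"
    "\r\n";

static enum ckpt
overflow_checkpoint(int fd, int overflow, int resp_started)
{
	size_t len = sizeof overflow_resp - 1;

	if (!overflow)
		return (CKPT_CONTINUE);
	if (resp_started)
		return (CKPT_CLOSE);	/* part of a response is already out */
	if (write(fd, overflow_resp, len) != (ssize_t)len)
		return (CKPT_CLOSE);	/* short or failed write */
	return (CKPT_ERROR_SENT);
}
```

The point of the sketch is that the error path itself allocates nothing,
so it still works when the workspace is exhausted.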

In VCL we will teach the VCC compiler to check after each statement if the
overflow flag is set, and return immediately when it is (so VCL execution
is terminated prematurely). The next check point in Varnish core will then
pick up that the overflow has happened and error out from there.
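
For illustration, the C emitted for a VCL subroutine could then take
roughly the following shape. This is only a sketch: struct ctx,
set_header, OVERFLOW_CHECK and vcl_recv_sketch are made-up names, and the
actual VCC output looks different.

```c
#include <assert.h>

/* Hypothetical per-request context carrying the overflow flag. */
struct ctx {
	int	overflow;
	int	steps;		/* statements executed, for demonstration */
};

/* Stand-in for a VCL statement that may exhaust the workspace. */
static void
set_header(struct ctx *c, int fits)
{
	c->steps++;
	if (!fits)
		c->overflow = 1;
}

/* What the compiler would emit after every statement. */
#define OVERFLOW_CHECK(c)			\
	do {					\
		if ((c)->overflow)		\
			return (-1);		\
	} while (0)

static int
vcl_recv_sketch(struct ctx *c, int second_fits)
{
	set_header(c, 1);
	OVERFLOW_CHECK(c);
	set_header(c, second_fits);
	OVERFLOW_CHECK(c);
	set_header(c, 1);	/* never reached after an overflow */
	OVERFLOW_CHECK(c);
	return (0);
}
```

When the second statement overflows, the third one is never run, and the
flag stays set for the next check point in core to act on.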

Comments much appreciated.

Regards,
Martin Blix Grydeland

-- 
Martin Blix Grydeland <http://varnish-software.com>
Senior Developer | Varnish Software AS
Mobile: +47 992 74 756
We Make Websites Fly!