Workspace overflow handling
dridi.boukelmoune at zenika.com
Mon Feb 9 15:56:26 CET 2015
I have been thinking about this for a while, but couldn't go past "it is
sometimes overkill to crash the whole child process".
Regarding your comment on state unwinding, my main concern is critical
sections and shared/global state. But I'm not familiar enough with the
code base to measure the implications.
On an unrelated note, I have been trying to code in Rust lately, and it
provides a `panic!` macro. It will kill the failing thread/task and somewhat
leave the rest of the program alone which I thought could also apply to
I know this is half off-topic, but on top of not panicking on memory
exhaustion, I thought we could also "panic" a single thread when it's a
worker (regardless of the issue) when we can't serve a 5xx response.
The current assert system is a bit too unforgiving for many cases.
I hope I'm making some sense here.
On Mon, Feb 9, 2015 at 3:32 PM, Martin Blix Grydeland
<martin at varnish-software.com> wrote:
> VDD Hamburg talking point:
> Varnish asserting on workspace overflow is a problem that we really should address. It is most hurtful when it happens in Varnish core, as there are many code paths relying on workspace being available. If none was available the assertion triggers taking the cache with it. (Examples: Vary processing, delivery processor pushes, delivery IO vectors etc). Creating proper error handling and state unwinding for all these will be a major undertaking, and also error prone as testing all the failure points will be very hard.
> Workspace exhaustion also hurt in VCL space. Most VRT functions are written to handle it, but will do so by truncating the result and log the fact (LostHeader). This masks errors, and can potentially be an attack vector for circumventing VCL implemented security barriers. It also poses a DOS attack vector, if you can know there are some serious manipulations happening on some header and send large payloads on them, causing an assert later when Varnish attempts delivery. In my opinion any failed attempt at setting a header from VCL should result in an error response immediately as we could not process the request properly.
> One way of dealing with this issue would be to add some guarantees for workspace allocations: Unless the workspace overflow flag is already set, all code is guaranteed to be able to allocate at least the set size of the workspace. This is achieved by allocating twice the amount of needed workspace on allocation. Since this space is normally untouched it will just be virtual memory and not backed by real memory. (We might have to bypass malloc and go for mmap anonymous to be able to do that). All WS_Alloc/WS_Release calls will then update the overflow flag whenever half of the available workspace has been used. Upon recycling of the workspace (request or busyobj), the flag is tested and if an overflow occured an madvise(MADV_DONTNEED/MADV_FREE) is issued on the second half of the mapping to return the pages to the OS. This way the extra pages are returned to the OS, causing the range to be pure virtual again.
> Error handling in Varnish core will now be able to just have a handful of check points (mostly after the major VCL functions where we are prepared to error out anyways). If the overflow flag is set, we write out a static 5xx response (unless it's too late), and start processing the next request (or close if that's too late).
> In VCL we will teach the VCC compiler to check after each statement if the overflow flag is set, and return immediately when it is (so VCL execution is terminated prematurely). The next check point in Varnish core will then pick up that the overflow has happened and error out from there.
> Comments much appreciated.
> Martin Blix Grydeland
> Martin Blix Grydeland
> Senior Developer | Varnish Software AS
> Mobile: +47 992 74 756
> We Make Websites Fly!
> varnish-dev mailing list
> varnish-dev at varnish-cache.org
More information about the varnish-dev