solve the pcre stack overflow #1576 - was: stack vs workspace for pcre and others

Poul-Henning Kamp phk at phk.freebsd.dk
Mon May 18 12:48:49 CEST 2015


--------
In message <5559BF04.4010500 at schokola.de>, Nils Goroll writes:
>To conclude this discussion:

For the record:

I primarily consider this to be a PCRE problem.

Any library which gives rise to potentially infinite on-stack recursion,
depending on the input data, is not a particularly good library in my view.

As far as I can tell, the PCRE developers are aware of this problem, but
they are facing CS-theoretical constraints on possible solutions (as in:
"which part of recursive didn't you understand?")

When we talked about this last time, the assumption was that JIT was not
a solution for this.

Since then, people who are more clued in on these details than me have
told us that JIT is indeed the solution now, and we have enabled it,
provided the PCRE version is known to be good.

I am quite confident that the PCRE people will want JIT to work, so I
am not too worried about it failing mysteriously later.

So the remaining issue is people using a particular subclass of
regexps on systems with old PCRE versions, or on architectures where
PCRE does not support JIT.

This does not constitute "a big problem" in my view.  (So far all I
have been shown is that this subclass of regexps is *convenient*;
nobody has shown that it is *indispensable*.)

The other side of this debate is the practice of trying to second-guess
the compiler's and the runtime environment's behaviour with respect
to stack use.

The C standards have this to say about it:  Don't even think about it.

Compilers can do all sorts of weird shit to stacks, and some hardware
architectures have really strange stacks.  Stacks can grow up or
down, they can be discontinuous, and they can behave in very
unpredictable ways.

For instance, an architecture with register banks can show plenty of
stack available, yet a call to a function which needs only 16 bytes
of stack in total fails, because that exact call spills a register
bank of several thousand bytes onto the stack.

(The very same function succeeded when you called it a few lines
earlier only because the compiler chose to inline it there.)

We need *really* good arguments to overcome that stigma, and in my
judgement "a subclass of regexps on non-mainstream architectures or
outdated installations" is not a good enough argument.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
