Source Code paranoia in Varnish

Poul-Henning Kamp phk at phk.freebsd.dk
Tue Sep 25 11:17:06 CEST 2007


"Source Code Paranoia" is about how much, or little, the programmer
trusts that things are as expected.

My usual trick question to junior programmers on this topic is:
	"What is the correct thing to do if close(2) fails ?"
and I've enjoyed seeing faces change from "You think you're so
smart" over "uhm, that's actually a very good question" to "I have
NO idea..." many many times.

Varnishd is not a very big program, just shy of 30k code lines, but
that should not be confused with varnish being a simple program.
Between shared memory, threads and processes and advanced memory
management, there are a lot of things that need to be "just right"
for things to work.

And given the speeds at which users run varnish, any minor problem
will escalate into a major problem in fractions of a second, and
the evidence is destroyed in the process.

That's why about one line in 20 in the varnishd sources is an
assert(3)-like check.

Assert(3) will coredump the process unless things are exactly the way
the programmer expects them to be: "this pointer is before that
pointer", "this system call returns success" and so on.
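
To make the "this pointer is before that pointer" kind of check
concrete, here is a minimal sketch. The struct and function names
(wspace, wspace_check, wspace_alloc) are made up for illustration
and are not taken from the varnishd sources; the point is only that
the invariants get asserted on the way in and on the way out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical workspace, for illustration only:
 * b = start of buffer, f = first free byte, e = end of buffer.
 */
struct wspace {
	char	*b;
	char	*f;
	char	*e;
};

/* Assert the invariants the rest of the code silently relies on. */
static void
wspace_check(const struct wspace *w)
{
	assert(w != NULL);
	assert(w->b != NULL);
	assert(w->b <= w->f);	/* "this pointer is before that pointer" */
	assert(w->f <= w->e);
}

/* Hand out n bytes, or NULL if they do not fit. */
static char *
wspace_alloc(struct wspace *w, size_t n)
{
	char *p;

	wspace_check(w);
	if (n > (size_t)(w->e - w->f))
		return (NULL);
	p = w->f;
	w->f += n;
	wspace_check(w);	/* still sane after we touched it */
	return (p);
}
```

If anything has scribbled on the struct between two calls, the next
call dumps core right there instead of a few thousand requests later.
Note that the checks vanish if the code is compiled with NDEBUG.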

So far, most of the coredumps that our patient users have reported
have been one of those asserts doing exactly what it was put there
for.

Unfortunately, some users are (still) seeing what looks like
pointer-tango, and absent any obvious leads, I spent half the night
running FlexeLint on the varnishd sources, this time with even more
paranoid settings.

Until now I have used "it shouldn't make the source unreadable" as
the yardstick for source-code paranoia, but I'll crank that up a
notch now, accepting minor eyesores in order to catch even more bugs.

I am not willing to take the cost of the entire signed/unsigned
"cast all over the place" thing yet, but I accept the cost of
the "pointer difference" paranoia thing.
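
As a rough idea of what the "pointer difference" eyesore can look
like: subtracting two pointers gives a ptrdiff_t, and stuffing that
into an unsigned without checking is the sort of thing lint complains
about. The helper below is a sketch under that assumption; the name
ptr_span is invented here and is not claimed to be what varnishd
uses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: distance in bytes from b to e, as an unsigned.
 * Both pointers must point into (or one past) the same object.
 */
static unsigned
ptr_span(const void *b, const void *e)
{
	const unsigned char *bb = b;
	const unsigned char *ee = e;
	ptrdiff_t d;

	assert(bb <= ee);		/* never a negative length */
	d = ee - bb;
	assert((size_t)d <= UINT_MAX);	/* must fit before we narrow it */
	return ((unsigned)d);
}
```

The cost is that every "e - b" in the source turns into a function
call, which is the minor eyesore; the gain is that a swapped pair of
pointers becomes a coredump instead of a huge bogus length.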

And I found two bugs, neither of which seems able to explain
the pointer-tangos, but they were bugs nonetheless.


So what is the correct way to deal with the return value of close(2)
system calls ?

You assert(3) that it succeeds and hope to never see that coredump.
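
In code, that answer could be sketched roughly like this. The
wrapper name close_or_die is made up for the example, and whether
you want a wrapper at all or just an assert at each call site is a
matter of taste.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: close the descriptor and insist it worked.
 * The result goes into a variable first so that the close(2) call
 * itself survives even if assert() is compiled out with NDEBUG.
 */
static void
close_or_die(int fd)
{
	int i;

	i = close(fd);
	assert(i == 0);
	(void)i;	/* silence "unused" warnings under NDEBUG */
}
```

A failing close(2) usually means the descriptor was not what the
program thought it was, for instance already closed somewhere else,
and that is exactly the kind of broken expectation worth a coredump.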

Poul-Henning

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


