[master] d766510cd Second part of the CHERI saga
phk at FreeBSD.org
Sat Nov 26 10:49:05 UTC 2022
Author: Poul-Henning Kamp <phk at FreeBSD.org>
Date: Sat Nov 26 10:48:34 2022 +0000
Second part of the CHERI saga
diff --git a/doc/sphinx/phk/cheri1.rst b/doc/sphinx/phk/cheri1.rst
index ce27d6495..03545e568 100644
@@ -173,10 +173,10 @@ First test-run
Just to see how bad it is, we run the main test-scripts::
% cd bin/varnishtest
- % ./varnistest -i -k -q tests/*.vtc
+ % ./varnishtest -i -k -q tests/*.vtc
38 tests failed, 33 tests skipped, 754 tests passed
That's not half bad…
diff --git a/doc/sphinx/phk/cheri2.rst b/doc/sphinx/phk/cheri2.rst
new file mode 100644
@@ -0,0 +1,121 @@
+How Varnish met CHERI 2/N
+CHERI capabilities are twice the size of pointers, and Varnish not
+only uses a lot of pointers per request, it is also stingy with
+RAM, because it is not uncommon to use 100K worker threads.
+A number of test-cases fail because they are too stingy with memory
+allocations. I will deal with them as I get to them, and merely
+note them here as part of the accounting::
+ Increase workspace
+ TEST tests/c00108.vtc
+ TEST tests/r01038.vtc
+ TEST tests/r01120.vtc
+ TEST tests/r02219.vtc
+ TEST tests/o00005.vtc
+Things you cannot do under CHERI: Pointers in Pipes
+Varnish has a central "waiter" service, whose job it is to monitor
+file descriptors to idle network connections, and do the right thing
+if data arrives on them, or if they are, or should be closed after
+For reasons of performance, we have multiple implementations:
+``kqueue(2)`` (BSD), ``epoll(2)`` (Linux), ``ports(2)`` (Solaris)
+and ``poll(2)`` which should work everywhere POSIX has been read.
+We only have the ``poll(2)`` based waiter for portability: it is one
+less issue to deal with during bring-up on new platforms, but its
+performance degrades to uselessness with contemporary loads
+of open network connections.
+The way they all work is that they have a single thread sitting
+in the relevant system-call, monitoring tens of thousands
+of file descriptors.
+Some of those system calls allow other threads to add fds to the
+list, but ``poll(2)`` does not, so when we start the poll-waiter
+we create a ``pipe(2)``, and have the waiter-thread listen to that
+When another thread wants to add a file descriptor to the inventory,
+it uses ``write(2)`` to send a pointer into that pipe. The kernel
+provides all the locking and buffering for us and wakes up the waiter-thread,
+which reads the pointer, adds the new fd to its inventory and dives
+back into ``poll(2)``.
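The hand-off described above can be sketched in C roughly as follows. This is a minimal illustration, not the Varnish poll-waiter: ``struct waited``, ``send_ptr()``, ``recv_ptr()`` and ``roundtrip_demo()`` are hypothetical names invented for the example. It only behaves as shown on a conventional platform; under CHERI the bytes that come back out of the pipe would, as far as I understand it, no longer carry a valid capability tag, so dereferencing the result would trap.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for whatever the waiter tracks per connection. */
struct waited {
	int fd;
};

/* Sender side: hand over a pointer by writing its raw bytes to the pipe. */
static int
send_ptr(int wfd, struct waited *wp)
{
	ssize_t n = write(wfd, &wp, sizeof wp);

	return (n == (ssize_t)sizeof wp ? 0 : -1);
}

/* Waiter side: sleep in poll(2) on the pipe, then read the pointer back. */
static struct waited *
recv_ptr(int rfd)
{
	struct pollfd pfd = { .fd = rfd, .events = POLLIN };
	struct waited *wp = NULL;

	if (poll(&pfd, 1, -1) != 1)
		return (NULL);
	if (read(rfd, &wp, sizeof wp) != (ssize_t)sizeof wp)
		return (NULL);
	return (wp);
}

/*
 * Round trip through a private pipe.  Returns the fd stored in the
 * recovered struct, or -1 on any failure.  On a non-CHERI platform the
 * recovered pointer is identical to the one that was sent.
 */
static int
roundtrip_demo(void)
{
	struct waited w = { .fd = 42 };
	struct waited *got = NULL;
	int p[2];

	if (pipe(p) != 0)
		return (-1);
	if (send_ptr(p[1], &w) == 0)
		got = recv_ptr(p[0]);
	(void)close(p[0]);
	(void)close(p[1]);
	return (got == &w ? got->fd : -1);
}
```

The sketch relies on a pointer-sized ``write(2)`` to a pipe being delivered in one piece, which POSIX guarantees for writes of at most ``PIPE_BUF`` bytes.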
+This is 100% safe, because nobody else can get to a pipe created
+with ``pipe(2)``, but there is no way CHERI could spot that to
+make an exception, so reading pointers out of a file descriptor
+causes fully justified core-dumps.
+If the poll-waiter was actually relevant, the proper fix would be
+to let the sending thread stick things on a locked list and just
+write a nonce-byte into the pipe to the waiter-thread, but that
+goes at the bottom of the TODO list, and for now I just remove the
+-Wpoll argument from five tests, which then pass::
+ TEST tests/b00009.vtc
+ TEST tests/b00048.vtc
+ TEST tests/b00054.vtc
+ TEST tests/b00059.vtc
+ TEST tests/c00080.vtc
+But why five tests ?
+It looks like one to test the poll-waiter and four cases of copy&paste.
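The "proper fix" mentioned above, a locked list plus a one-byte wake-up through the pipe, might look something like the following. It is a sketch under stated assumptions, not code from Varnish: ``struct handoff`` and the ``handoff_*()`` functions are hypothetical, and the list is a plain LIFO for brevity. The point is that pointers only ever travel through ordinary memory, so nothing pointer-shaped passes through a file descriptor.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-connection record, linked into a pending list. */
struct waited {
	int fd;
	struct waited *next;
};

/* Hypothetical hand-off state: a mutex-protected list and a wake-up pipe. */
struct handoff {
	pthread_mutex_t mtx;
	struct waited *head;
	int pipe_rd;
	int pipe_wr;
};

static int
handoff_init(struct handoff *h)
{
	int p[2];

	if (pipe(p) != 0)
		return (-1);
	if (pthread_mutex_init(&h->mtx, NULL) != 0) {
		(void)close(p[0]);
		(void)close(p[1]);
		return (-1);
	}
	h->head = NULL;
	h->pipe_rd = p[0];
	h->pipe_wr = p[1];
	return (0);
}

/* Sender: queue the record under the lock, then poke the waiter. */
static int
handoff_push(struct handoff *h, struct waited *wp)
{
	char nonce = 0;

	pthread_mutex_lock(&h->mtx);
	wp->next = h->head;
	h->head = wp;
	pthread_mutex_unlock(&h->mtx);
	/* Only a meaningless byte crosses the pipe, never a pointer. */
	return (write(h->pipe_wr, &nonce, 1) == 1 ? 0 : -1);
}

/* Waiter: swallow pending wake-up bytes, then take the whole list. */
static struct waited *
handoff_drain(struct handoff *h)
{
	char buf[64];
	struct waited *list;

	if (read(h->pipe_rd, buf, sizeof buf) < 1)
		return (NULL);
	pthread_mutex_lock(&h->mtx);
	list = h->head;
	h->head = NULL;
	pthread_mutex_unlock(&h->mtx);
	return (list);
}

/* Push two records, drain them, and return the sum of their fds. */
static int
handoff_demo(void)
{
	struct handoff h;
	struct waited a = { .fd = 7 }, b = { .fd = 9 };
	struct waited *wp;
	int sum = 0;

	if (handoff_init(&h) != 0)
		return (-1);
	if (handoff_push(&h, &a) != 0 || handoff_push(&h, &b) != 0)
		sum = -1;
	else
		for (wp = handoff_drain(&h); wp != NULL; wp = wp->next)
			sum += wp->fd;
	(void)close(h.pipe_rd);
	(void)close(h.pipe_wr);
	(void)pthread_mutex_destroy(&h.mtx);
	return (sum);
}
```

In a real waiter the read end of the pipe would sit in the ``poll(2)`` set alongside the connection fds, and ``handoff_drain()`` would run whenever it becomes readable. A wake-up that finds the list already emptied by an earlier drain is harmless.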
+Never write your own Red-Black Trees
+In general there are few pieces of code I dare not wade into,
+but there is a LOT of code I don't want to touch, if there
+is any way to avoid it.
+Red-Black trees are one of them.
+In Varnish we stol^H^H^H^H imported both ``<queue.h>`` and ``<tree.h>``
+from FreeBSD, but as a safety measure we stuck a ``V`` prefix on
+everything in them.
+Every so often I will run a small shell-script which does the
+v-thing and compare the result to ``vtree.h`` and ``vqueue.h``,
+to keep up with FreeBSD.
+Today that paid off handsomely: Some poor person on the CHERI
+team had to wade into ``tree.h`` and stick ``__no_subobject_bounds``
+directives to pointers to make that monster work under CHERI.
+I just ran my script and 20 more tests pass::
+ TEST tests/b00068.vtc
+ TEST tests/c00005.vtc
+ TEST tests/e00003.vtc
+ TEST tests/e00008.vtc
+ TEST tests/e00019.vtc
+ TEST tests/l00002.vtc
+ TEST tests/l00003.vtc
+ TEST tests/l00005.vtc
+ TEST tests/m00053.vtc
+ TEST tests/r01312.vtc
+ TEST tests/r01441.vtc
+ TEST tests/r02451.vtc
+ TEST tests/s00012.vtc
+ TEST tests/u00004.vtc
+ TEST tests/u00010.vtc
+ TEST tests/v00009.vtc
+ TEST tests/v00011.vtc
+ TEST tests/v00017.vtc
+ TEST tests/v00041.vtc
+ TEST tests/v00043.vtc
+Only nine failing tests left now.
diff --git a/doc/sphinx/phk/index.rst b/doc/sphinx/phk/index.rst
index 755e75d5c..f5741e337 100644
@@ -13,6 +13,7 @@ You may or may not want to know what Poul-Henning thinks.