development efforts on the Solaris side.
Theo Schlossnagle
theo at omniti.com
Wed Jan 9 15:27:13 CET 2008
On Jan 9, 2008, at 4:40 AM, Poul-Henning Kamp wrote:
> Ok, various comments:
>
> The autoconf stuff and the #inclusion of "varnish_config.h" all over
> the place you will have to negotiate with DES, I only do C code.
First off, let me say thanks for a detailed, clear and critical
response.
>
> lib/libvarnish/time.c:
>
> I don't like #ifdefs like that in the middle of functions.
> The appropriate way is to add a compat function in
> libvarnishcompat.
okay. makes sense.
>
> flock/fcntl-locks you're already talking with DES about.
>
> curses:
> What exactly is the semantics of KEY_RESIZE ?
Curses sends KEY_RESIZE on a resize event, however Solaris' curses is
a bit aged and doesn't support that.
>
> mgt_vcc.c:
> The suffix shouldn't matter to dlopen(), please explain ?
> White-space and {} spam.
> Adding the c-compiler command to the error message is bound
> to make it overflow the linewidth.
It isn't dlopen, it is the compiler linker chain that fails on
Solaris. While it is arguable that it is a SunStudio bug, I think the
limitation is extremely unnecessary and annoying -- but alas. The
work around does not break gcc.
>
> VCL_WaitForActive()
> Need to look at that in more detail. I deliberately tried
> to limit varnish to only use mutexes for locking, so introducing
> semaphores just for this seems somewhat silly.
I added it while troubleshooting a different timeout that was
triggering. I happened to be running this on ZFS and when it asked
"ho big can I make my backing file?" it answered 21TB. as the vfsstat
stuff has different meaning when you don't have a fixed number of fs
blocks. I was timing out trying to mmap 21TB -- so I added that
semaphore. When I further understood the problem, I left the code as
it seemed more correct.
>
> storage_umem.c
> I'm not sure I see much point in this. The main advantage of
> umem is SMP localized storage management, and the Varnish
> objects are exactly not local to any one CPU.
> Benchmarks could possibly convince me otherwise.
The thing it adds is slab allocation with reduced CPU contention and
that seems to work pretty well in my tests. I completely agree that
benchmarks would be a needed step to legitimize the approach. I added
it when I was having the 21TB mmap issue above. IT was such simple
code to add and seemed useful even if it remained an experimental
stevedore implementation.
>
> #include <err.h>
> This may be a portability issue where the easiest way to fix
> is to avoid it.
Agreed. When I started working on this, I got an error that was
satisfied by adding err.h. I just checked and removing it no longer
produces an error, so either I disabled some code path in the
preprocessor or the code that argued was removed. It's bee a few
revisions since I started this patch.
> sendfile():
> Does the solaris sendfile guarantee that storage is no longer
> touched when it returns ? Otherwise it's as little use as
> the FreeBSD and Linux versions.
>
> /* pick a stevedore and bump the head along */
> /* XXX: only safe as long as pointer writes are atomic */
> +/* jesus: dear god, are you crazy? */
> stv = stevedores = stevedores->next;
>
> God to Jesus: No I'm not.
heh... cute. Just my annotations about the assumption. Was worried
that the assumption that both assignments were atomic. One is, then
the next is. Perhaps that doesn't matter. It does seem like a hard
to trigger race condition to me because while the assignment is
atomic, the access to stevedores->next is not guaranteed to be view
consistent in that assignment:
T1: get stevedores->next in R(T1,a)
T2: get stevedores->next in R(T2,a)
T1: set stevedores to R(T1,a)
T2: set stevedores to T(T2,a)
Now T1 and T2 both "advanced the pointer" but they did the same work
and the stv they have is the same. Is that not a problem? Perhaps I
don't understand the impact of that scenario correctly.
>
> -/* Note: intentionally not IOV_MAX */
> +#ifdef IOV_MAX
> +#define MAX_IOVS IOV_MAX
> +#else
> #define MAX_IOVS (HTTP_HDR_MAX * 2)
> +#endif
>
> Which bit of "intentionally" didn't you understand ?
> Have you examined the value of IOV_MAX, looked at where
> this is used and measured the impact ?
I understood intentionally. I also understood that it doesn't work on
Solaris. On Solaris, IOV_MAX is the maximum number of elements that
can be passed to writev(2), using a larger number will fail.
Perhaps this is the wrong solution to the problem. You cannot use
HTTP_HDR_MAX*2 as the nvec to writev(2) on Solaris. It is strictly
limited to IOV_MAX. So, the app breaks. After reading the code, it
looked like that was the right place to fix it. Perhaps there should
be an autoconf fragment that detects the "real" OS IOV_MAX and then
uses that in the event that it is lower HTTP_HDR_MAX*2. Or am I
thinking about this all wrong?
>
> cache_acceptor.c
> Under no circumstances should #ifdef HAVE_PORT_CREATE be
> necessary here. If a new method is necessary for the
> acceptors, so be it.
Completely agreed. I noted that it was a hack in a previous email.
I'd like that add the ping stuff as a function into each acceptor so
they can have their own approach to waking the acceptor thread up.
Solaris' portfs is much more like kqueue than epoll and support more
than just filedescriptors. It allows user-space eventing, so it is
really easy to have one thread just say "dude, wake up" to another
waiting in port_get(n). People tend to have strong preferences to
adding functions to structures in C, so I was pretty confident that I
should propose the design before implementing it, as it would likely
be redone.
Best regards,
Theo
--
Theo Schlossnagle
Esoteric Curio -- http://lethargy.org/
OmniTI Computer Consulting, Inc. -- http://omniti.com/
More information about the varnish-dev
mailing list