development efforts on the Solaris side.

Wed Jan 9 15:27:13 CET 2008

On Jan 9, 2008, at 4:40 AM, Poul-Henning Kamp wrote:
> Ok, various comments:
>
> The autoconf stuff and the #inclusion of "varnish_config.h" all over
> the place you will have to negotiate with DES, I only do C code.

First off, let me say thanks for a detailed, clear and critical  
response.

>
> lib/libvarnish/time.c:
>
> 	I don't like #ifdefs like that in the middle of functions.
> 	The appropriate way is to add a compat function in
> 	libvarnishcompat.

okay.  makes sense.

>
> flock/fcntl-locks you're already talking with DES about.
>
> curses:
> 	What exactly is the semantics of KEY_RESIZE ?

Curses sends KEY_RESIZE on a resize event, however Solaris' curses is  
a bit aged and doesn't support that.

>
> mgt_vcc.c:
> 	The suffix shouldn't matter to dlopen(), please explain ?
> 	White-space and {} spam.
> 	Adding the c-compiler command to the error message is bound
> 	to make it overflow the linewidth.

It isn't dlopen, it is the compiler linker chain that fails on  
Solaris.  While it is arguable that it is a SunStudio bug, I think the  
limitation is extremely unnecessary and annoying -- but alas.  The  
work around does not break gcc.

>
> VCL_WaitForActive()
> 	Need to look at that in more detail.  I deliberately tried
> 	to limit varnish to only use mutexes for locking, so introducing
> 	semaphores just for this seems somewhat silly.

I added it while troubleshooting a different timeout that was  
triggering.  I happened to be running this on ZFS and when it asked  
"ho big can I make my backing file?" it answered 21TB. as the vfsstat  
stuff has different meaning when you don't have a fixed number of fs  
blocks.  I was timing out trying to mmap 21TB -- so I added that  
semaphore.  When I further understood the problem, I left the code as  
it seemed more correct.

>
> storage_umem.c
> 	I'm not sure I see much point in this.  The main advantage of
> 	umem is SMP localized storage management, and the Varnish
> 	objects are exactly not local to any one CPU.
> 	Benchmarks could possibly convince me otherwise.

The thing it adds is slab allocation with reduced CPU contention and  
that seems to work pretty well in my tests.  I completely agree that  
benchmarks would be a needed step to legitimize the approach.  I added  
it when I was having the 21TB mmap issue above.  IT was such simple  
code to add and seemed useful even if it remained an experimental  
stevedore implementation.

>
> #include <err.h>
> 	This may be a portability issue where the easiest way to fix
> 	is to avoid it.

Agreed.  When I started working on this, I got an error that was  
satisfied by adding err.h.  I just checked and removing it no longer  
produces an error, so either I disabled some code path in the  
preprocessor or the code that argued was removed.  It's bee a few  
revisions since I started this patch.

> sendfile():
> 	Does the solaris sendfile guarantee that storage is no longer
> 	touched when it returns ?  Otherwise it's as little use as
> 	the FreeBSD and Linux versions.
>
> /* pick a stevedore and bump the head along */
> /* XXX: only safe as long as pointer writes are atomic */
> +/* jesus: dear god, are you crazy? */
> stv = stevedores = stevedores->next;
>
> 	God to Jesus:  No I'm not.

heh... cute.  Just my annotations about the assumption.  Was worried  
that the assumption that both assignments were atomic.  One is, then  
the next is.  Perhaps that doesn't matter.  It does seem like a hard  
to trigger race condition to me because while the assignment is  
atomic, the access to stevedores->next is not guaranteed to be view  
consistent in that assignment:

T1: get stevedores->next in R(T1,a)
T2: get stevedores->next in R(T2,a)
T1: set stevedores to R(T1,a)
T2: set stevedores to T(T2,a)

Now T1 and T2 both "advanced the pointer" but they did the same work  
and the stv they have is the same.  Is that not a problem?  Perhaps I  
don't understand the impact of that scenario correctly.

>
> -/* Note: intentionally not IOV_MAX */
> +#ifdef IOV_MAX
> +#define MAX_IOVS	IOV_MAX
> +#else
> #define MAX_IOVS	(HTTP_HDR_MAX * 2)
> +#endif
>
> 	Which bit of "intentionally" didn't you understand ?
> 	Have you examined the value of IOV_MAX, looked at where
> 	this is used and measured the impact ?

I understood intentionally.  I also understood that it doesn't work on  
Solaris.  On Solaris, IOV_MAX is the maximum number of elements  that  
can be passed to writev(2), using a larger number will fail.

Perhaps this is the wrong solution to the problem.  You cannot use  
HTTP_HDR_MAX*2 as the nvec to writev(2) on Solaris.  It is strictly  
limited to IOV_MAX.  So, the app breaks.  After reading the code, it  
looked like that was the right place to fix it.  Perhaps there should  
be an autoconf fragment that detects the "real" OS IOV_MAX and then  
uses that in the event that it is lower HTTP_HDR_MAX*2.  Or am I  
thinking about this all wrong?

>
> cache_acceptor.c
> 	Under no circumstances should #ifdef HAVE_PORT_CREATE be
> 	necessary here.  If a new method is necessary for the
> 	acceptors, so be it.

Completely agreed.  I noted that it was a hack in a previous email.   
I'd like that add the ping stuff as a function into each acceptor so  
they can have their own approach to waking the acceptor thread up.   
Solaris' portfs is much more like kqueue than epoll and support more  
than just filedescriptors.  It allows user-space eventing, so it is  
really easy to have one thread just say "dude, wake up" to another  
waiting in port_get(n).  People tend to have strong preferences to  
adding functions to structures in C, so I was pretty confident that I  
should propose the design before implementing it, as it would likely  
be redone.

Best regards,

Theo

--
Theo Schlossnagle
Esoteric Curio -- http://lethargy.org/
OmniTI Computer Consulting, Inc. -- http://omniti.com/