accept_fd_holdoff second/millisecond confusion

Eden Li eden at mojiti.com
Fri Mar 20 20:37:05 CET 2009


Hi all,

We ran into a situation where our backend held connections open for so
long that we hit the open file limit.  Even after we cleared up the
backend, varnish never recovered, and we had to restart it before it
would start relaying connections again.

Flipping on debug mode shows the error "Too many open files when
accept(2)ing. Sleeping.", after which varnish should sleep for 50
milliseconds (according to param.show).  Instead it seems to sleep for
50*1000 *seconds* (almost 14 hours).  Looking at the code, this appears
to be either a doc bug or a code bug.  I was able to fix the root
issue with this patch:

--- a/varnish-2.0.1/bin/varnishd/cache_acceptor.c       2008-10-17 11:59:49.000000000 -0700
+++ b/varnish-2.0.1/bin/varnishd/cache_acceptor.c       2009-03-20 12:16:15.000000000 -0700
@@ -228,7 +228,7 @@
                                 case EMFILE:
                                         VSL(SLT_Debug, ls->sock,
                                             "Too many open files when accept(2)ing. Sleeping.");
-                                        TIM_sleep(params->accept_fd_holdoff * 1000.0);
+                                        TIM_sleep(params->accept_fd_holdoff * 0.001);
                                         break;
                                 default:
                                         VSL(SLT_Debug, ls->sock,
                                        break;
                                default:
                                        VSL(SLT_Debug, ls->sock,
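
To make the arithmetic concrete, here is a small standalone sketch of the
two conversions.  It assumes, as the patch does, that accept_fd_holdoff is
stored in milliseconds and that TIM_sleep() takes seconds as a double.  The
helper names are hypothetical and not part of varnishd.

```c
/*
 * Sketch only: hypothetical helpers mirroring the two expressions in the
 * patch.  Assumes the parameter is in milliseconds and the sleep call
 * expects seconds.
 */

/* Old expression: multiplies by 1000, so 50 ms becomes 50000 s. */
static double
holdoff_old_seconds(double holdoff_ms)
{
	return (holdoff_ms * 1000.0);
}

/* Patched expression: scales milliseconds down to seconds. */
static double
holdoff_new_seconds(double holdoff_ms)
{
	return (holdoff_ms * 0.001);
}

/* Convenience conversion for reporting the old value in hours. */
static double
seconds_to_hours(double seconds)
{
	return (seconds / 3600.0);
}
```

With the 50 ms value reported by param.show, holdoff_old_seconds(50.0)
gives 50000 seconds (roughly 13.9 hours), while holdoff_new_seconds(50.0)
gives about 0.05 seconds.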

Is this the right fix?  Should I create a ticket in trac for this?
We're getting around it now by setting the max open file limit and
listen_depth appropriately so that varnish never gets to this point,
but it'd be nice if this was fixed in case we ever accidentally get
here again.


