[Varnish] #276: varnishd leaks fds and enters busy-loop

Varnish varnish-bugs at projects.linpro.no
Wed Jul 16 15:09:43 CEST 2008


#276: varnishd leaks fds and enters busy-loop
----------------------+-----------------------------------------------------
 Reporter:  wichert   |       Owner:  phk                      
     Type:  defect    |      Status:  new                      
 Priority:  high      |   Milestone:  Varnish 2.0 code complete
Component:  varnishd  |     Version:  trunk                    
 Severity:  major     |    Keywords:                           
----------------------+-----------------------------------------------------
 We have a deployment with two varnish servers in a load-balanced setup.
 Both servers were started at the same time, and both stopped working at the
 same time today. The observed symptoms were varnishd using 100% CPU time
 and not responding to requests at all. strace of the varnishd process
 showed 27 threads running. Most of them were idle, but one thread was
 stuck in a busy-loop: poll() reports the listening socket as readable,
 accept() fails with EMFILE, and the loop repeats immediately:

 {{{
 poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
 clock_gettime(CLOCK_REALTIME, {1216213167, 164448162}) = 0
 accept(6, 0x6fc69334, [128])            = -1 EMFILE (Too many open files)
 poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
 clock_gettime(CLOCK_REALTIME, {1216213167, 164861833}) = 0
 accept(6, 0x6fc69334, [128])            = -1 EMFILE (Too many open files)
 poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
 clock_gettime(CLOCK_REALTIME, {1216213167, 165164688}) = 0
 accept(6, 0x6fc69334, [128])            = -1 EMFILE (Too many open files)
 poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
 clock_gettime(CLOCK_REALTIME, {1216213167, 165463395}) = 0
 accept(6, 0x6fc69334, [128])            = -1 EMFILE (Too many open files)
 poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
 clock_gettime(CLOCK_REALTIME, {1216213167, 165756394}) = 0
 accept(6, 0x6fc69334, [128])            = -1 EMFILE (Too many open files)
 poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
 clock_gettime(CLOCK_REALTIME, {1216213167, 166071370}) = 0
 }}}
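
 The trace shows poll() returning at once on every pass: the pending
 connection stays in the listen queue while accept() keeps failing with
 EMFILE, so the 1000 ms timeout never applies. One common mitigation is to
 pause briefly when accept() reports descriptor exhaustion. The following is
 a minimal sketch of that idea in Python, not varnishd's code;
 accept_with_backoff, FD_EXHAUSTION and the pause value are hypothetical
 names chosen for illustration.

```python
import errno
import time

# errno values that mean "out of descriptors", not "listener is broken".
FD_EXHAUSTION = (errno.EMFILE, errno.ENFILE)


def accept_with_backoff(listener, pause=0.1, sleep=time.sleep):
    """Accept one connection, sleeping instead of spinning on fd exhaustion.

    `listener` is anything with a socket-like accept() method.
    Returns (conn, addr, retries), where retries counts the failed attempts.
    """
    retries = 0
    while True:
        try:
            conn, addr = listener.accept()
            return conn, addr, retries
        except OSError as exc:
            if exc.errno not in FD_EXHAUSTION:
                raise  # unrelated failure: let the caller handle it
            retries += 1
            sleep(pause)  # yield the CPU so other threads can close fds
```

 This only stops the spin and gives other threads a chance to release
 descriptors; it does not address whatever is leaking them.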

 We saw the exact same behaviour on one of the servers last week (the other
 server was not running at the time). Both servers are running trunk as of
 r2877.
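
 To confirm a leak before the limit is reached, the descriptor count of the
 process can be tracked over time. The following is a rough sketch,
 assuming Linux (it reads the /proc/<pid>/fd directory) and permission to
 inspect the target process; count_open_fds and fd_usage are hypothetical
 helper names.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Return the number of entries in /proc/<pid>/fd (Linux only).

    For pid="self" the count includes the descriptor that listdir()
    itself opens, so compare successive readings rather than trusting
    the absolute value.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage():
    """Return (open_fds, soft_limit) for the current process."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(), soft_limit
```

 Calling count_open_fds(pid) periodically with the pid of the varnishd
 child process and seeing the number climb steadily under constant load
 would point to a leak rather than a limit that is simply too low.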

-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/276>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator

