[Varnish] #534: Threads stuck in trunk

Varnish varnish-bugs at projects.linpro.no
Thu Jul 23 20:11:40 CEST 2009


#534: Threads stuck in trunk
----------------------+-----------------------------------------------------
 Reporter:  anders    |       Owner:  phk          
     Type:  defect    |      Status:  new          
 Priority:  high      |   Milestone:               
Component:  varnishd  |     Version:  trunk        
 Severity:  critical  |    Keywords:  threads stuck
----------------------+-----------------------------------------------------
 I am running Varnish trunk/4144 in FreeBSD/amd64 7.2-RELEASE, with no
 additional patches.

 For some time I have had problems with Varnish suddenly going to the
 maximum of allowed threads, and staying there. This means Varnish stops
 responding to new connections.

 Symptoms:

 1) Lots of connections, most of which are in CLOSED or CLOSE_WAIT state:

 {{{
 root at cache11:~# netstat -aln | grep ^tcp | awk '{print $1, $4, $6}' | sort
 | uniq -c | sort -n
    1 tcp4 *.22 LISTEN
    1 tcp4 *.4949 LISTEN
    1 tcp4 *.5666 LISTEN
    1 tcp4 *.80 LISTEN
    1 tcp4 127.0.0.1.25 LISTEN
    1 tcp4 127.0.0.1.49787 FIN_WAIT_2
    1 tcp4 127.0.0.1.8080 LISTEN
    2 tcp4 80.91.39.151.22 ESTABLISHED
    2 tcp4 80.91.39.151.80 LAST_ACK
    4 tcp4 127.0.0.1.80 CLOSE_WAIT
 1177 tcp4 80.91.39.151.80 ESTABLISHED
 2126 tcp4 80.91.39.151.80 CLOSE_WAIT
 4476 tcp4 80.91.39.151.80 CLOSED
 }}}

 2) varnishlog output:

 {{{
 root at cache11:~# varnishlog
  7806 SessionClose - dropped
  7806 StatSess     - (null) (null) 1248368086 0 0 0 0 0 0 0
     0 Debug        - "Create worker thread failed 67 Too many processes"
  7807 SessionClose - dropped
  7807 StatSess     - (null) (null) 1248368090 0 0 0 0 0 0 0
  7807 SessionClose - dropped
  7807 StatSess     - (null) (null) 1248368092 0 0 0 0 0 0 0
  7807 SessionClose - dropped
 ^C
 }}}

 3) process status:

 root at cache11:~# ps axwwH -o pid,wchan,state,command | grep Varnish-Chld |
 grep -v grep | sort | uniq -c
    1 84804 kqread S    varnishd: Varnish-Chld cache11.finn.no (varnishd)
    1 84804 nanslp S    varnishd: Varnish-Chld cache11.finn.no (varnishd)
    1 84804 select I    varnishd: Varnish-Chld cache11.finn.no (varnishd)
    1 84804 select S    varnishd: Varnish-Chld cache11.finn.no (varnishd)
    1 84804 ucond  I    varnishd: Varnish-Chld cache11.finn.no (varnishd)
 3995 84804 umtxn  I    varnishd: Varnish-Chld cache11.finn.no (varnishd)

 If I enable mutex contest/logging (diag_bitmap set to 0x18), I get more
 varnishlog output:

 {{{
 root at cache11:~# varnishlog
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        - "MTX_LOCK(SES_New,cache_session.c,181,&ses_mem_mtx)"
     0 Debug        -
 "MTX_UNLOCK(SES_New,cache_session.c,183,&ses_mem_mtx)"
     0 Debug        - "MTX_LOCK(WRK_Queue,cache_pool.c,242,&wq[u]->mtx)"
     0 Debug        - "MTX_UNLOCK(WRK_Queue,cache_pool.c,257,&wq[u]->mtx)"
  8014 SessionClose - dropped
  8014 StatSess     - (null) (null) 1248372198 0 0 0 0 0 0 0
     0 Debug        -
 "MTX_LOCK(SES_Delete,cache_session.c,231,&ses_mem_mtx)"
     0 Debug        -
 "MTX_UNLOCK(SES_Delete,cache_session.c,233,&ses_mem_mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
     0 Debug        -
 "MTX_UNLOCK(wrk_decimate_flock,cache_pool.c,367,&wq[u]->mtx)"
     0 Debug        -
 "MTX_LOCK(wrk_decimate_flock,cache_pool.c,354,&wq[u]->mtx)"
 }}}

 I will attach a graph that shows the sudden growth in number of threads.
 Only a full restart makes Varnish deliver normally again, with stuck
 connections and threads gone. The problem occurs on several servers, but
 somehow more often on this machine.

 PS: Yes, this is on finn.no, on the production cache servers. It happens
 after some hours, with at the current freeze 2,42 million objects and 40
 GB data in the cache.

-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/534>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator


More information about the varnish-bugs mailing list