[Varnish] #1083: Persistent Varnish crashes since using bans and lurker

Varnish varnish-bugs at varnish-cache.org
Sun Apr 14 12:30:32 CEST 2013


#1083: Persistent Varnish crashes since using bans and lurker
-------------------------+---------------------
 Reporter:  rmohrbacher  |       Owner:  martin
     Type:  defect       |      Status:  new
 Priority:  high         |   Milestone:
Component:  varnishd     |     Version:  3.0.2
 Severity:  major        |  Resolution:
 Keywords:               |
-------------------------+---------------------

Comment (by numard):

 I can confirm this happens on 3.0.2-1~1lucid1 (about once every 8 hours).
 I upgraded to 3.0.3-1~precise and it still happens, though so far it
 seems less often (about once every 18 hours).

 We have two servers with a usage pattern similar to @rmohrbacher's:
  - file storage
  - no issues for a long time
  - the issues started once we began pushing a lot more bans.

 Varnish (3.0.3-1~precise package from
 http://repo.varnish-cache.org/ubuntu/, Ubuntu 12.04 LTS Precise) is acting
 as a cache for S3 objects. It runs as:
 {{{
 /usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 -p thread_pool_min 200
 -p thread_pool_max 4000 -p thread_pool_add_delay 2 -p http_req_hdr_len
 10240 -p http_req_size 65536 -p first_byte_timeout 300 -T localhost:6082
 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s
 persistent,/mnt/varnish_store,360G
 }}}
 It runs on AWS (m1.medium), with no apparent memory, CPU, or I/O
 constraints.

 When the child process dies, panic.show reports:

 {{{
 varnish> panic.show
 200
 Last panic at: Sun, 14 Apr 2013 09:57:34 GMT
 Missing errorhandling code in smp_append_sign(), storage_persistent_subr.c
 line 128:
   Condition((smp_chk_sign(ctx)) == 0) not true.thread = (cache-worker)
 ident =
 Linux,3.2.0-40-virtual,x86_64,-spersistent,-smalloc,-hcritbit,epoll
 Backtrace:
   0x4310e5: /usr/sbin/varnishd() [0x4310e5]
   0x4514d8: /usr/sbin/varnishd(smp_append_sign+0x128) [0x4514d8]
   0x44f1da: /usr/sbin/varnishd(SMP_NewBan+0x3a) [0x44f1da]
   0x4158d2: /usr/sbin/varnishd(BAN_Insert+0x1a2) [0x4158d2]
   0x439fa8: /usr/sbin/varnishd(VRT_ban_string+0xb8) [0x439fa8]
   0x7f6391ef60c7: ./vcl.LQXRTnfB.so(+0x20c7) [0x7f6391ef60c7]
   0x437f48: /usr/sbin/varnishd(VCL_recv_method+0x48) [0x437f48]
   0x41946b: /usr/sbin/varnishd(CNT_Session+0xf2b) [0x41946b]
   0x432ee5: /usr/sbin/varnishd() [0x432ee5]
   0x7fbd9bb5de9a: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)
 [0x7fbd9bb5de9a]
 sp = 0x7f62c8cda008 {
   fd = 12, id = 12, xid = 1800342971,
   client = 10.32.37.110 49187,
   step = STP_RECV,
   handling = deliver,
   restarts = 0, esi_level = 0
   flags =
   bodystatus = 4
   ws = 0x7f62c8cda080 {
     id = "sess",
     {s,f,r,e} = {0x7f62c8cdac78,+168,(nil),+65536},
   },
   http[req] = {
     ws = 0x7f62c8cda080[sess]
       "BAN",
       "/xxxxs3bucketxxxx/path1/key2/key3",
       "HTTP/1.1",
       "Accept: */*",
       "host: s3.amazonaws.com",
   },
   worker = 0x7f632d629ac0 {
     ws = 0x7f632d629cf8 {
       id = "wrk",
       {s,f,r,e} = {0x7f632d617a50,+56,(nil),+65536},
     },
     },
     vcl = {
       srcname = {
         "input",
         "Default",
       },
     },
 },

 }}}
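
 For readers following the backtrace: VRT_ban_string being reached from
 VCL_recv_method while handling a BAN request is consistent with a ban()
 call in vcl_recv. A minimal sketch of Varnish 3.0 VCL that would take this
 code path is below. It is an illustration only, not the default.vcl from
 these servers; the ACL name, its address range, and the ban expression are
 assumptions.

 {{{
 # Hypothetical ACL: which clients may issue bans (address is an assumption).
 acl ban_clients {
     "10.0.0.0"/8;
 }

 sub vcl_recv {
     if (req.request == "BAN") {
         if (!client.ip ~ ban_clients) {
             error 405 "Not allowed.";
         }
         # Adds an entry to the ban list; with -s persistent the ban is
         # also written to the persistent store (SMP_NewBan in the trace).
         ban("req.url == " + req.url);
         error 200 "Ban added.";
     }
 }
 }}}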

 -----

 Both servers receive every ban request (they sit behind load balancers
 that pick the Varnish server non-deterministically), but the URLs shown in
 the panic dumps are different (though of the same 'type'; I can show
 examples if it matters).

 I'm willing to test a patch in production ASAP if one exists.

 Cheers,
 Beto

-- 
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1083#comment:3>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator



