Persistent Varnish crashes since using bans and lurker

Roland Mohrbacher roland at mohrbacher.eu
Tue Jan 10 11:11:27 CET 2012


Hello all,

we use a farm of three persistent Varnishes (-s
persistent,/cms/varnish_cache/persistent/varnish_storage.bin,204800M).

These Varnishes have been running for 3 months without any crashes (at the
moment not in production, but stressed with several stress tests).

For the past few days we have been using bans and the ban lurker
(lurker-friendly bans via: ban("obj.http.x-url ~ " + req.url);).
We issue about 250 bans/hour.
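
For readers unfamiliar with the pattern, a lurker-friendly ban setup of this
kind can be sketched roughly as follows. This is only an illustration,
assuming Varnish 3.0 VCL syntax; the vcl_fetch and vcl_deliver header
handling and the exact PURGE check are assumptions and not a copy of the
configuration in use here. Only the ban() expression is taken from above.

```vcl
# Sketch only (assumed Varnish 3.0 syntax), not the actual configuration.

sub vcl_fetch {
    # Assumption: store the request URL on the object so that bans can
    # match on obj.* only, which is what lets the ban lurker test them.
    set beresp.http.x-url = req.url;
}

sub vcl_deliver {
    # Assumption: hide the helper header from clients.
    unset resp.http.x-url;
}

sub vcl_recv {
    # Assumption: bans are triggered by PURGE requests.
    if (req.request == "PURGE") {
        ban("obj.http.x-url ~ " + req.url);
        error 200 "Banned";
    }
}
```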

Now we have the big problem that the Varnishes crash after some hours.
Curiously, all three Varnishes crash at the same moment, even though they
run on three different servers!

The following part of the syslog suggests that there is a problem with an
invalid ban:


Jan  9 19:40:32 ece-fe1 /var/lib/varnish/persistent[19622]: Child 
(19623) said CHK(0x7f91ffd261a0 BAN 2 0x7f34724f4000 <invalid>) = 1
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child 
(19623) died signal=6
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child 
(19623) Panic message: Missing errorhandling code in smp_append_sign(), 
storage_persistent_subr.c line 128:#012  Condition((smp_chk_sign(ctx)) 
== 0) not true.thread = (cache-worker)#012ident = 
Linux,2.6.32-131.2.1.el6.x86_64,x86_64,-spersistent,-smalloc,-hcritbit,epoll#012Backtrace:#012  
0x42c7a6: /usr/sbin/varnishd() [0x42c7a6]#012  0x44a346: 
/usr/sbin/varnishd(smp_append_sign+0x126) [0x44a346]#012  0x447b6d: 
/usr/sbin/varnishd(SMP_NewBan+0x3d) [0x447b6d]#012  0x4125c7: 
/usr/sbin/varnishd(BAN_Insert+0x1a7) [0x4125c7]#012  0x433bd5: 
/usr/sbin/varnishd(VRT_ban_string+0xc5) [0x433bd5]#012  0x7f91f39fa4be: 
./vcl.PNU3fGhs.so(+0x24be) [0x7f91f39fa4be]#012  0x433863: 
/usr/sbin/varnishd(VCL_recv_method+0x43) [0x433863]#012  0x417c22: 
/usr/sbin/varnishd(CNT_Session+0xb62) [0x417c22]#012  0x42efb8: 
/usr/sbin/varnishd() [0x42efb8]#012  0x42e19b: /usr/sbin/varnishd() 
[0x42e19b]#012sp = 0x7f91ed4ab008 {#012  fd = 15, id = 15, xid = 
683670119,#012  client = 172.27.70.103 36115,#012  step = STP_RECV,#012  
handling = deliver,#012  restarts = 0, esi_level = 0#012  flags = #012  
bodystatus = 4#012  ws = 0x7f91ed4ab080 { #012    id = "sess",#012    
{s,f,r,e} = {0x7f91ed4abc90,+56,(nil),+65536},#012  },#012  http[req] = 
{#012    ws = 0x7f91ed4ab080[sess]#012      "PURGE",#012      
"105867846",#012      "HTTP/1.0",#012  },#012  worker = 0x7f91ef1faa80 
{#012    ws = 0x7f91ef1facc0 { #012      id = "wrk",#012      {s,f,r,e} 
= {0x7f91ef1e8a30,+32,(nil),+65536},#012    },#012    },#012    vcl = 
{#012      srcname = {#012        "input",#012        
"Default",#012      },#012    },#012},#012
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: child (6907) 
Started
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Pushing vcls 
failed:#012CLI communication error (hdr)
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child (6907) 
died signal=6
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child (6907) 
Panic message: Assert error in smp_open(), storage_persistent.c line 
320:#012  Condition((smp_valid_silo(sc)) == 0) not true.#012thread = 
(cache-main)#012ident = 
Linux,2.6.32-131.2.1.el6.x86_64,x86_64,-spersistent,-smalloc,-hcritbit,no_waiter#012Backtrace:#012  
0x42c7a6: /usr/sbin/varnishd() [0x42c7a6]#012  0x44756a: 
/usr/sbin/varnishd() [0x44756a]#012  0x444d57: 
/usr/sbin/varnishd(STV_open+0x27) [0x444d57]#012  0x42b525: 
/usr/sbin/varnishd(child_main+0xc5) [0x42b525]#012  0x43d5ec: 
/usr/sbin/varnishd() [0x43d5ec]#012  0x43de7c: /usr/sbin/varnishd() 
[0x43de7c]#012  0x7f92015684c7: 
/usr/lib64/varnish/libvarnish.so(+0x94c7) [0x7f92015684c7]#012  
0x7f9201568b58: /usr/lib64/varnish/libvarnish.so(vev_schedule+0x88) 
[0x7f9201568b58]#012  0x43d7c2: /usr/sbin/varnishd(MGT_Run+0x132) 
[0x43d7c2]#012  0x44cacb: /usr/sbin/varnishd(main+0xd1b) [0x44cacb]#012
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child (-1) 
said Child starts
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child (-1) 
said CHK(0x7f91ffd26120 BAN 1 0x7f34723f4000 BAN 1) = 4
Jan  9 19:40:33 ece-fe1 /var/lib/varnish/persistent[19622]: Child (-1) 
said CHK(0x7f91ffd261a0 BAN 2 0x7f34724f4000 <invalid>) = 1


Is this a known problem?
Are there any workarounds for using persistent Varnish together with the
ban lurker?

Best regards
Roland




More information about the varnish-misc mailing list