[Varnish] #1083: Persistent Varnish crashes since using bans and lurker
Varnish
varnish-bugs at varnish-cache.org
Sun Apr 14 12:30:32 CEST 2013
#1083: Persistent Varnish crashes since using bans and lurker
-------------------------+---------------------
Reporter: rmohrbacher | Owner: martin
Type: defect | Status: new
Priority: high | Milestone:
Component: varnishd | Version: 3.0.2
Severity: major | Resolution:
Keywords: |
-------------------------+---------------------
Comment (by numard):
I can confirm this happened on 3.0.2-1~1lucid1 (about once every 8 hours). I
upgraded to 3.0.3-1~precise and it still happens, but so far it seems to be
less frequent (about once every 18 hours).
We have 2 servers with a usage pattern similar to @rmohrbacher's:
- file storage
- no issues for a long time
- we started pushing a lot more bans, and the issues started to happen.
Varnish (3.0.3-1~precise package from http://repo.varnish-cache.org/ubuntu/,
on Ubuntu 12.04 LTS Precise) is acting as a cache for S3 objects. It runs as:
{{{
/usr/sbin/varnishd -P /var/run/varnishd.pid -a :80 \
    -p thread_pool_min 200 -p thread_pool_max 4000 \
    -p thread_pool_add_delay 2 -p http_req_hdr_len 10240 \
    -p http_req_size 65536 -p first_byte_timeout 300 \
    -T localhost:6082 -f /etc/varnish/default.vcl \
    -S /etc/varnish/secret -s persistent,/mnt/varnish_store,360G
}}}
It runs on an AWS m1.medium instance, with no apparent memory, CPU or I/O
constraints.
When the child process dies, panic.show reports:
{{{
varnish> panic.show
200
Last panic at: Sun, 14 Apr 2013 09:57:34 GMT
Missing errorhandling code in smp_append_sign(), storage_persistent_subr.c line 128:
Condition((smp_chk_sign(ctx)) == 0) not true.thread = (cache-worker)
ident = Linux,3.2.0-40-virtual,x86_64,-spersistent,-smalloc,-hcritbit,epoll
Backtrace:
0x4310e5: /usr/sbin/varnishd() [0x4310e5]
0x4514d8: /usr/sbin/varnishd(smp_append_sign+0x128) [0x4514d8]
0x44f1da: /usr/sbin/varnishd(SMP_NewBan+0x3a) [0x44f1da]
0x4158d2: /usr/sbin/varnishd(BAN_Insert+0x1a2) [0x4158d2]
0x439fa8: /usr/sbin/varnishd(VRT_ban_string+0xb8) [0x439fa8]
0x7f6391ef60c7: ./vcl.LQXRTnfB.so(+0x20c7) [0x7f6391ef60c7]
0x437f48: /usr/sbin/varnishd(VCL_recv_method+0x48) [0x437f48]
0x41946b: /usr/sbin/varnishd(CNT_Session+0xf2b) [0x41946b]
0x432ee5: /usr/sbin/varnishd() [0x432ee5]
0x7fbd9bb5de9a: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fbd9bb5de9a]
sp = 0x7f62c8cda008 {
fd = 12, id = 12, xid = 1800342971,
client = 10.32.37.110 49187,
step = STP_RECV,
handling = deliver,
restarts = 0, esi_level = 0
flags =
bodystatus = 4
ws = 0x7f62c8cda080 {
id = "sess",
{s,f,r,e} = {0x7f62c8cdac78,+168,(nil),+65536},
},
http[req] = {
ws = 0x7f62c8cda080[sess]
"BAN",
"/xxxxs3bucketxxxx/path1/key2/key3",
"HTTP/1.1",
"Accept: */*",
"host: s3.amazonaws.com",
},
worker = 0x7f632d629ac0 {
ws = 0x7f632d629cf8 {
id = "wrk",
{s,f,r,e} = {0x7f632d617a50,+56,(nil),+65536},
},
},
vcl = {
srcname = {
"input",
"Default",
},
},
},
}}}
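To compare panics between the two servers, the symbolized frames can be
pulled out of the panic.show output. The following is only a minimal Python
sketch and not part of Varnish: the helper name symbolized_frames and the
regular expression are assumptions made for illustration, and the sample
text is copied from the backtrace above.

```python
import re

# Matches symbolized frames such as
#   0x4514d8: /usr/sbin/varnishd(smp_append_sign+0x128) [0x4514d8]
# Frames without a symbol name, e.g. "varnishd()" or "(+0x20c7)", are skipped.
FRAME_RE = re.compile(r"\((?P<symbol>[A-Za-z_]\w*)\+0x[0-9a-f]+\)")


def symbolized_frames(panic_text):
    """Return function names from a panic backtrace, innermost frame first.

    Hypothetical helper: the name and the pattern are illustrative only.
    """
    return [m.group("symbol") for m in FRAME_RE.finditer(panic_text)]


# Excerpt copied from the panic.show output in this comment.
PANIC_EXCERPT = """\
0x4310e5: /usr/sbin/varnishd() [0x4310e5]
0x4514d8: /usr/sbin/varnishd(smp_append_sign+0x128) [0x4514d8]
0x44f1da: /usr/sbin/varnishd(SMP_NewBan+0x3a) [0x44f1da]
0x4158d2: /usr/sbin/varnishd(BAN_Insert+0x1a2) [0x4158d2]
0x439fa8: /usr/sbin/varnishd(VRT_ban_string+0xb8) [0x439fa8]
0x7f6391ef60c7: ./vcl.LQXRTnfB.so(+0x20c7) [0x7f6391ef60c7]
0x437f48: /usr/sbin/varnishd(VCL_recv_method+0x48) [0x437f48]
0x41946b: /usr/sbin/varnishd(CNT_Session+0xf2b) [0x41946b]
"""

frames = symbolized_frames(PANIC_EXCERPT)

# Reverse so the path reads from the outermost caller to the failing function.
call_path = " -> ".join(reversed(frames))
print(call_path)
```

Run against this panic, it shows that the failing assert is reached from
vcl_recv through the ban insertion path (VCL_recv_method, VRT_ban_string,
BAN_Insert, SMP_NewBan, smp_append_sign), which makes it easy to check
whether dumps from both servers follow the same path.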
-----
Both servers receive every ban request they need (they sit behind load
balancers that pick the Varnish server non-deterministically), but the URLs
shown in the panic dumps differ between them (though they are of the same
'type'; I can show examples if it matters).
I'm willing to test a patch in production ASAP if one exists...
Cheers,
Beto
--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1083#comment:3>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator