Workspace overflow on ia32
Dmitry Panov
dmitry.panov at yahoo.co.uk
Wed Mar 9 21:18:34 CET 2011
Oops, my bad, I didn't realise the unpatched trunk installed itself in a
different directory, so I was in fact running a patched version. However,
after I fixed this, I could still reproduce the problem. Here is the
updated stack trace:
Child (15077) died signal=6
Child (15077) Panic message: Assert error in http_Write(), cache_http.c
line 1063:
Condition((hp->hd[HTTP_HDR_STATUS].b) != 0) not true.
thread = (cache-worker)
ident = Linux,2.6.26-2-686,i686,-sfile,-smalloc,-hcritbit,epoll
Backtrace:
0x807eaa5: pan_backtrace+24
0x807ed4e: pan_ic+193
0x807b79f: http_Write+e6
0x8083beb: RES_WriteObj+1cb
0x805ec3f: cnt_deliver+5e6
0x8062dd6: CNT_Session+6ae
0x8081221: wrk_do_cnt_sess+160
0x80809af: wrk_thread_real+d36
0x8080e1c: wrk_thread+109
0xb76df955: _end+af60df25
sp = 0xb7493004 {
fd = 11, id = 11, xid = 1045926360,
client = 127.0.0.1 51657,
step = STP_DELIVER,
handling = deliver,
err_code = 200, err_reason = (null),
restarts = 0, esi_level = 0
ws = 0xb7493054 {
id = "sess",
{s,f,r,e} = {0xb74937f4,+220,(nil),+16384},
},
http[req] = {
ws = 0xb7493054[sess]
"GET",
"/doc/dvd+rw-tools/",
"HTTP/1.0",
"Referer: http://localhost:6802/doc/",
"User-Agent: Wget/1.11.4",
"Accept: */*",
"Host: localhost:6802",
"Connection: Keep-Alive",
"X-Forwarded-For: 127.0.0.1",
},
worker = 0x6e28c0ec {
ws = 0x6e28c220 { overflow
id = "wrk",
{s,f,r,e} = {0x6e285fc0,+16384,(nil),+16384},
},
http[resp] = {
ws = 0x6e28c220[wrk]
"HTTP/1.1",
"OK",
"Server: Apache/2.2.9 (Debian) proxy_html/3.0.1",
"Last-Modified: Mon, 23 Jun 2008 14:32:23 GMT",
"ETag: "2222-fc35-450564fcbabc0"",
"Content-Type: text/html",
"Content-Length: 64565",
"Accept-Ranges: bytes",
"Via: 1.1 varnish",
},
},
vcl = {
srcname = {
"input",
"Default",
},
},
obj = 0x8fc91000 {
xid = 1045926360,
ws = 0x8fc91010 {
id = "obj",
{s,f,r,e} = {0x8fc91140,+228,(nil),+248},
},
http[obj] = {
ws = 0x8fc91010[obj]
"HTTP/1.1",
"OK",
"Date: Wed, 09 Mar 2011 20:12:14 GMT",
"Server: Apache/2.2.9 (Debian) proxy_html/3.0.1",
"Last-Modified: Mon, 23 Jun 2008 14:32:23 GMT",
"ETag: "2222-fc35-450564fcbabc0"",
"Content-Type: text/html",
"Content-Length: 64565",
},
len = 64565,
store = {
64565 {
3c 48 54 4d 4c 3e 0a 0a 3c 48 45 41 44 3e 0a 3c |<HTML>..<HEAD>.<|
42 41 53 45 20 48 52 45 46 3d 22 68 74 74 70 3a |BASE HREF="http:|
2f 2f 66 79 2e 63 68 61 6c 6d 65 72 73 2e 73 65 |//fy.chalmers.se|
2f 7e 61 70 70 72 6f 2f 6c 69 6e 75 78 2f 44 56 |/~appro/linux/DV|
[64501 more]
},
},
},
},
I've also disabled gzip support on the server, which made no difference.
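
For reference, the "wrk" workspace in the trace above is flagged as
overflowed, and its {s,f,r,e} line shows f at +16384, the same offset as e,
so there is no free space left. Below is a minimal sketch in C of how such a
workspace behaves, assuming it is a plain bump allocator. The names
sketch_ws, sketch_ws_init and sketch_ws_alloc are made up for illustration;
this is not the actual Varnish code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified workspace; NOT Varnish's real struct ws. */
struct sketch_ws {
    char *s;       /* start of the buffer */
    char *f;       /* first free byte */
    char *e;       /* one past the end of the buffer */
    int overflow;  /* set once an allocation did not fit */
};

/* Point the workspace at a caller-supplied buffer of len bytes. */
static void sketch_ws_init(struct sketch_ws *ws, char *buf, size_t len)
{
    ws->s = buf;
    ws->f = buf;
    ws->e = buf + len;
    ws->overflow = 0;
}

/* Bump-allocate n bytes; on exhaustion return NULL and flag the overflow. */
static char *sketch_ws_alloc(struct sketch_ws *ws, size_t n)
{
    assert(ws->s <= ws->f && ws->f <= ws->e);
    if ((size_t)(ws->e - ws->f) < n) {
        ws->overflow = 1;
        return NULL;
    }
    char *p = ws->f;
    ws->f += n;
    return p;
}
```

With a 16384-byte buffer, one 16384-byte allocation leaves f equal to e, and
any further request returns NULL and sets the overflow flag. A header field
that was never filled in because of such a failed allocation would presumably
trip an assertion like the one on hp->hd[HTTP_HDR_STATUS].b, but that link is
only my assumption.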
On 09/03/2011 19:03, Dmitry Panov wrote:
> Ok, I have reproduced the bug on the unpatched trunk (revision
> 25c5f2ed3229e41e99eadff57374c3a93b41a356) without using custom vcl
> (the only section I have there is the backend specification).
>
> Command to run varnish was:
>
> /opt/varnish/sbin/varnishd \
> -a 0.0.0.0:6802 \
> -f /opt/varnish/etc/varnish/my.vcl \
> -P /var/run/varnishd.pid \
> -T 127.0.0.1:2000 \
> -d \
> -s file,/opt/varnish/var/varnish/storage.bin,1G
>
> The system is running Debian with a 32-bit kernel. As I mentioned
> earlier, I was able to reproduce the problem on another machine with
> a significantly different hardware configuration. The only thing they
> had in common was that both were running Debian with a 32-bit kernel.
> I also used the same binaries on both machines. I could not reproduce
> the problem in a 64-bit environment.
>
> I'm attaching the stack trace and the log file. Please let me know if
> I can provide any more info.
>
> On 09/03/2011 14:51, Geoff Simmons wrote:
>>
>> On 03/ 9/11 03:17 PM, Dmitry Panov wrote:
>>> Just a heads up, I'm getting assertion failures when running a rather
>>> simple testcase: using a local apache that serves /usr/share/doc as the
>>> backend and running wget -r http://localhost:6802/doc. Shortly after
>>> that the following errors start to appear:
>>>
>>> Child (11125) Panic message: Assert error in http_Write(), cache_http.c
>>> line 1181:
>>> Condition((hp->hd[HTTP_HDR_STATUS].b) != 0) not true.
>>> thread = (cache-worker)
>> Thanks for the heads up. Can you send over the whole stack trace?
>>
>>> I have been able to reproduce it on 2 different machines with very
>>> different hardware configurations, which makes a hardware problem
>>> quite unlikely. Also
>>>
>>> httperf --server localhost --port 6802 --uri / --num-conns 1 \
>>>     --num-calls 4000
>>>
>>> runs without a problem.
>>>
>>> These 2 machines both run 32bit linux kernel. I haven't been able to
>>> reproduce the problem in a 64bit environment.
>> Could be running out of workspace. I fixed a similar error during the
>> course of development, which had to do with the fact that sufficient
>> workspace has to be allocated for both the backend response *and* the
>> stale object; you might have found something related. Also, I've only
>> been testing with 64 bit; looks like I'd better test 32 bit as well.
>>
>> Is there any way you can send the request and response that are being
>> processed when the error happens?
>>
>> And what if you set --num-conns high and --num-calls low, say 400
>> connections and 10 calls per connection? Or keep setting --num-conns
>> higher, to see if you can provoke the error? I've been running httperf
>> with 25,000 connections and 1000 calls per connection, found a memory
>> leak that way.
>>
>>> Unfortunately I haven't got time to try the unpatched trunk (I tried it
>>> with revisions 3 and 4 of the patch) or do any further experiments but
>>> I'll try to do so in the next couple of days and then post more
>>> details.
>> It's a good idea to test on the unpatched trunk as well, to make sure
>> that the bug really comes from the patch.
>>
>> Thanks very much for the feedback!
>>
>>
>
Best regards,
--
Dmitry Panov