Workspace overflow on ia32

Dmitry Panov dmitry.panov at yahoo.co.uk
Wed Mar 9 21:18:34 CET 2011


Oops, my bad: I didn't realise the unpatched trunk installed itself in a
different directory, so I was in fact running a patched version. However,
after I fixed this, I could still reproduce the problem on the unpatched
trunk. Here is the updated stack trace:

Child (15077) died signal=6
Child (15077) Panic message: Assert error in http_Write(), cache_http.c 
line 1063:
   Condition((hp->hd[HTTP_HDR_STATUS].b) != 0) not true.
thread = (cache-worker)
ident = Linux,2.6.26-2-686,i686,-sfile,-smalloc,-hcritbit,epoll
Backtrace:
   0x807eaa5: pan_backtrace+24
   0x807ed4e: pan_ic+193
   0x807b79f: http_Write+e6
   0x8083beb: RES_WriteObj+1cb
   0x805ec3f: cnt_deliver+5e6
   0x8062dd6: CNT_Session+6ae
   0x8081221: wrk_do_cnt_sess+160
   0x80809af: wrk_thread_real+d36
   0x8080e1c: wrk_thread+109
   0xb76df955: _end+af60df25
sp = 0xb7493004 {
   fd = 11, id = 11, xid = 1045926360,
   client = 127.0.0.1 51657,
   step = STP_DELIVER,
   handling = deliver,
   err_code = 200, err_reason = (null),
   restarts = 0, esi_level = 0
   ws = 0xb7493054 {
     id = "sess",
     {s,f,r,e} = {0xb74937f4,+220,(nil),+16384},
   },
   http[req] = {
     ws = 0xb7493054[sess]
       "GET",
       "/doc/dvd+rw-tools/",
       "HTTP/1.0",
       "Referer: http://localhost:6802/doc/",
       "User-Agent: Wget/1.11.4",
       "Accept: */*",
       "Host: localhost:6802",
       "Connection: Keep-Alive",
       "X-Forwarded-For: 127.0.0.1",
   },
   worker = 0x6e28c0ec {
     ws = 0x6e28c220 { overflow
       id = "wrk",
       {s,f,r,e} = {0x6e285fc0,+16384,(nil),+16384},
     },
     http[resp] = {
       ws = 0x6e28c220[wrk]
         "HTTP/1.1",
         "OK",
         "Server: Apache/2.2.9 (Debian) proxy_html/3.0.1",
         "Last-Modified: Mon, 23 Jun 2008 14:32:23 GMT",
         "ETag: "2222-fc35-450564fcbabc0"",
         "Content-Type: text/html",
         "Content-Length: 64565",
         "Accept-Ranges: bytes",
         "Via: 1.1 varnish",
     },
     },
     vcl = {
       srcname = {
         "input",
         "Default",
       },
     },
   obj = 0x8fc91000 {
     xid = 1045926360,
     ws = 0x8fc91010 {
       id = "obj",
       {s,f,r,e} = {0x8fc91140,+228,(nil),+248},
     },
     http[obj] = {
       ws = 0x8fc91010[obj]
         "HTTP/1.1",
         "OK",
         "Date: Wed, 09 Mar 2011 20:12:14 GMT",
         "Server: Apache/2.2.9 (Debian) proxy_html/3.0.1",
         "Last-Modified: Mon, 23 Jun 2008 14:32:23 GMT",
         "ETag: "2222-fc35-450564fcbabc0"",
         "Content-Type: text/html",
         "Content-Length: 64565",
     },
     len = 64565,
     store = {
       64565 {
         3c 48 54 4d 4c 3e 0a 0a 3c 48 45 41 44 3e 0a 3c |<HTML>..<HEAD>.<|
         42 41 53 45 20 48 52 45 46 3d 22 68 74 74 70 3a |BASE HREF="http:|
         2f 2f 66 79 2e 63 68 61 6c 6d 65 72 73 2e 73 65 |//fy.chalmers.se|
         2f 7e 61 70 70 72 6f 2f 6c 69 6e 75 78 2f 44 56 |/~appro/linux/DV|
         [64501 more]
       },
     },
   },
},

I've also disabled gzip support on the server, which made no difference.
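
For anyone reading the dump, here is a rough sketch (not part of Varnish) of
how the {s,f,r,e} lines can be turned into "bytes used / bytes free" figures.
It assumes that the +N values for f and e are offsets from the start pointer
s, which is how I read the panic output; the function name ws_free_bytes and
the regular expression are made up for illustration only.

```python
import re

# Matches a workspace summary line from the panic dump, e.g.
#   {s,f,r,e} = {0x6e285fc0,+16384,(nil),+16384},
# Assumption: f and e are printed as byte offsets relative to s.
WS_LINE = re.compile(
    r"\{s,f,r,e\}\s*=\s*\{(0x[0-9a-f]+),\+(\d+),(\(nil\)|\+\d+),\+(\d+)\}"
)


def ws_free_bytes(line):
    """Return (used, size, free) in bytes for one {s,f,r,e} line, or None."""
    m = WS_LINE.search(line)
    if m is None:
        return None
    used = int(m.group(2))  # offset of the free pointer f
    size = int(m.group(4))  # offset of the end pointer e
    return used, size, size - used


# The three workspace lines copied from the dump above.
dump = {
    "sess": "{s,f,r,e} = {0xb74937f4,+220,(nil),+16384},",
    "wrk": "{s,f,r,e} = {0x6e285fc0,+16384,(nil),+16384},",
    "obj": "{s,f,r,e} = {0x8fc91140,+228,(nil),+248},",
}

for name, line in dump.items():
    used, size, free = ws_free_bytes(line)
    print(f"{name}: used={used} size={size} free={free}")
```

Read this way, the "wrk" workspace has no bytes left (f has reached e), which
matches the "overflow" marker next to it, while "sess" and "obj" still have
room.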


On 09/03/2011 19:03, Dmitry Panov wrote:
> Ok, I have reproduced the bug on the unpatched trunk (revision 
> 25c5f2ed3229e41e99eadff57374c3a93b41a356) without using custom vcl 
> (the only section I have there is the backend specification).
>
> Command to run varnish was:
>
> /opt/varnish/sbin/varnishd \
>     -a 0.0.0.0:6802 \
>     -f /opt/varnish/etc/varnish/my.vcl \
>     -P /var/run/varnishd.pid \
>     -T 127.0.0.1:2000 \
>     -d \
>     -s file,/opt/varnish/var/varnish/storage.bin,1G
>
> The system is running Debian with a 32-bit kernel. As I mentioned
> earlier, I was able to reproduce the problem on another machine with a
> significantly different hardware configuration. The only thing they had
> in common was that both were running Debian with a 32-bit kernel. Also, I
> used the same binaries on both machines. I could not reproduce the
> problem in a 64-bit environment.
>
> I'm attaching the stack trace and the log file. Please let me know if 
> I can provide any more info.
>
> On 09/03/2011 14:51, Geoff Simmons wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> On 03/ 9/11 03:17 PM, Dmitry Panov wrote:
>>> Just a heads up, I'm getting assertion failures when running a rather
>>> simple testcase: using a local apache that serves /usr/share/doc as the
>>> backend and running wget -r http://localhost:6802/doc. Shortly after that
>>> the following errors start to appear:
>>>
>>> Child (11125) Panic message: Assert error in http_Write(), cache_http.c
>>> line 1181:
>>>    Condition((hp->hd[HTTP_HDR_STATUS].b) != 0) not true.
>>> thread = (cache-worker)
>> Thanks for the heads up. Can you send over the whole stack trace?
>>
>>> I have been able to reproduce it on 2 different machines with very
>>> different hardware configurations which makes hardware problem quite
>>> unlikely. Also
>>>
>>> httperf --server localhost --port 6802 --uri /  --num-conns 1
>>> --num-calls 4000
>>>
>>> runs without a problem.
>>>
>>> These 2 machines both run 32bit linux kernel. I haven't been able to
>>> reproduce the problem in a 64bit environment.
>> Could be running out of workspace. I fixed a similar error during the
>> course of development, which had to do with the fact that sufficient
>> workspace has to be allocated for both the backend response *and* the
>> stale object; you might have found something related. Also, I've only
>> been testing with 64 bit; looks like I better test 32 bit as well.
>>
>> Is there any way you can send the request & response that are being
>> processed when the error happens?
>>
>> And what if you set --num-conns high and --num-calls low, say 400
>> connections and 10 calls per connection? Or keep setting --num-conns
>> higher, to see if you can provoke the error? I've been running httperf
>> with 25,000 connections and 1000 calls per connection, found a memory
>> leak that way.
>>
>>> Unfortunately I haven't got time to try the unpatched trunk (I tried it
>>> with revisions 3 and 4 of the patch) or do any further experiments but
>>> I'll try to do so in the next couple of days and then post more details.
>> It's a good idea to test on the unpatched trunk as well, to make sure
>> that the bug really comes from the patch.
>>
>> Thanks very much for the feedback!
>>
>>
>

Best regards,

-- 
Dmitry Panov
