[Varnish] #482: Polling broken on Solaris?

Varnish varnish-bugs at projects.linpro.no
Sat Apr 4 01:52:21 CEST 2009


#482: Polling broken on Solaris?
----------------------+-----------------------------------------------------
 Reporter:  whocares  |        Type:  defect
   Status:  new       |    Priority:  normal
Milestone:            |   Component:  build 
  Version:  trunk     |    Severity:  normal
 Keywords:            |  
----------------------+-----------------------------------------------------
 I don't know how to best describe this so if you need more data, just let
 me know.

 OS: Solaris 10, latest patches applied as of March 02, 2009
 Compiler: Sun Studio 12

 The problem is that after installing varnish 2.0.4 we saw a massive
 increase in 503 errors. After changing the code to return different error
 codes for the places where varnishd sets the 503 by force internally I
 could track the problem down to the function `cnt_fetch` in
 `bin/varnishd/cache_center.c`.

 Further investigation led me to the function `Fetch` in
 `bin/varnishd/cache_fetch.c`. Our problem is always triggered at line 384
 of that file.

 I suspect the real source is in `HTC_Rx`, but neither my programming skills
 nor my debugging skills allowed me to get any further.

 While trying to reproduce the problem so I could present you with a solid
 test case, I found that the tests provided with `varnishtest` will
 sometimes run through without any problems and sometimes fail with:

 {{{
 Assert error in http_rxchar(), vtc_http.c line 343:
   Condition(i > 0) not true.
 Abort (core dumped)
 }}}
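
 For reference, the assertion above fires when the receive helper gets a
 non-positive result for a byte it expected. The following is a minimal
 sketch, not the actual `vtc_http.c` code, of a poll-then-read helper that
 classifies those outcomes instead of asserting on them. The names
 `rx_byte` and `enum rx_result` are hypothetical, and whether the Solaris
 failures are timeouts, EOFs or errors is not known from this report.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes, for illustration only. */
enum rx_result { RX_OK, RX_EOF, RX_TIMEOUT, RX_ERROR };

/*
 * Hypothetical helper: wait up to timeout_ms for one byte on fd and
 * report why no byte arrived instead of asserting.
 */
enum rx_result
rx_byte(int fd, int timeout_ms, char *out)
{
    struct pollfd pfd;
    ssize_t n;
    int i;

    assert(out != NULL);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if a signal interrupted the wait (the timeout restarts in full). */
    do {
        i = poll(&pfd, 1, timeout_ms);
    } while (i < 0 && errno == EINTR);
    if (i < 0)
        return (RX_ERROR);
    if (i == 0)
        return (RX_TIMEOUT);

    /* poll() reported an event; read() tells data, EOF and error apart. */
    do {
        n = read(fd, out, 1);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return (RX_ERROR);
    if (n == 0)
        return (RX_EOF);
    return (RX_OK);
}
```

 Logging which of the three failure codes shows up on Solaris might help
 narrow down where the polling goes wrong.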

 A very reliable way to reproduce this here is to run `r00345.vtc` a couple
 of times. I never needed more than three runs to get to the core dump.

 I also found that the behaviour is not related to any specific URL. On the
 live website I'd get a 503 once, but after manually reloading the page in
 the browser it loads just fine.
 So as a temporary fix I changed the code in `cache_center.c` like this
 (starting from line 400):

 {{{
 //      if (i) {
 //              sp->err_code = 503;
 //              sp->step = STP_ERROR;
 //              VBE_free_bereq(&sp->bereq);
 //              HSH_Drop(sp);
 //              AZ(sp->obj);
 //              return (0);
 //      }
 //
 //      RFC2616_cache_policy(sp, sp->obj->http);        /* XXX -> VCL */
 //
 //      sp->err_code = http_GetStatus(sp->obj->http);
 //      VCL_fetch_method(sp);

         if (i) {
                 sp->handling = VCL_RET_RESTART;
         } else {
                 RFC2616_cache_policy(sp, sp->obj->http);
                 sp->err_code = http_GetStatus(sp->obj->http);
                 VCL_fetch_method(sp);
         }
 }}}

 This will restart the request and fixes 99% of our problem. The last 1%
 are cases where we run into "max restarts reached", probably because of
 hitting the same problem more than 4 times in a row.

 As said initially: I don't know how to better describe the problem, if
 you've got any pointers as to how to provide better information, just let
 me know. I can even provide access to the "live thing" and a development
 environment on Solaris if need be.

-- 
Ticket URL: <http://varnish.projects.linpro.no/ticket/482>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator

