[Varnish] #482: Polling broken on Solaris?
Varnish
varnish-bugs at projects.linpro.no
Sat Apr 4 01:52:21 CEST 2009
#482: Polling broken on Solaris?
----------------------+-----------------------------------------------------
Reporter: whocares | Type: defect
Status: new | Priority: normal
Milestone: | Component: build
Version: trunk | Severity: normal
Keywords: |
----------------------+-----------------------------------------------------
I don't know how to best describe this so if you need more data, just let
me know.
OS: Solaris 10, latest patches applied as of March 02, 2009
Compiler: Sun Studio 12
The problem is that after installing varnish 2.0.4 we saw a massive
increase in 503 errors. After changing the code to return a different error
code at each place where varnishd forces a 503 internally, I could track
the problem down to the function `cnt_fetch` in
`bin/varnishd/cache_center.c`.
Further investigation led me to the function `Fetch` in
`bin/varnishd/cache_fetch.c`. Our problem is always triggered in line 384
thereof.
I suspect the real source is in `HTC_Rx`, but neither my programming skills
nor my debugging skills allowed me to get any further.
While trying to reproduce the problem so I could present you with a solid
test case, I found that the tests provided with `varnishtest` sometimes
run through without any problems and sometimes fail with
{{{
Assert error in http_rxchar(), vtc_http.c line 343:
Condition(i > 0) not true.
Abort (core dumped)
}}}
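The assertion only says that some call returned zero or a negative value. As a minimal sketch, assuming `i` is the return value of a `poll()` call (the report does not confirm this), the following shows how a wait helper could tell a timeout, an interrupted call and a real error apart instead of asserting on all three. The names `wait_readable` and `enum wait_result` are made up for illustration and are not part of Varnish or `varnishtest`.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes; not taken from the Varnish source. */
enum wait_result { WAIT_READY, WAIT_TIMEOUT, WAIT_ERROR };

/*
 * Wait until fd is readable or timeout_ms has passed.
 * A poll() interrupted by a signal (EINTR) is retried; note that the
 * retry starts again with the full timeout.
 */
static enum wait_result
wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int i;

    pfd.fd = fd;
    pfd.events = POLLIN;
    do {
        pfd.revents = 0;
        i = poll(&pfd, 1, timeout_ms);
    } while (i < 0 && errno == EINTR);

    if (i == 0)
        return (WAIT_TIMEOUT);  /* nothing arrived in time */
    if (i < 0)
        return (WAIT_ERROR);    /* real failure, see errno */
    /* Data, EOF or a socket error is pending; read() tells which. */
    return (WAIT_READY);
}
```

With a helper like this, logging which of the three cases occurs on Solaris would narrow down whether the failures are timeouts, interrupted calls or genuine errors.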
A very reliable way to reproduce this here is to run `r00345.vtc` a couple
of times. I never needed more than three runs to get to the core dump.
I also found that the behaviour is not related to any specific URL. On the
live website I'd get a 503 once, but after manually reloading the page in
the browser it loads just fine.
So as a temporary fix I changed the code in `cache_center.c` like this
(starting from line 400):
{{{
//  if (i) {
//      sp->err_code = 503;
//      sp->step = STP_ERROR;
//      VBE_free_bereq(&sp->bereq);
//      HSH_Drop(sp);
//      AZ(sp->obj);
//      return (0);
//  }
//
//  RFC2616_cache_policy(sp, sp->obj->http); /* XXX -> VCL */
//
//  sp->err_code = http_GetStatus(sp->obj->http);
//  VCL_fetch_method(sp);
    if (i) {
        sp->handling = VCL_RET_RESTART;
    } else {
        RFC2616_cache_policy(sp, sp->obj->http);
        sp->err_code = http_GetStatus(sp->obj->http);
        VCL_fetch_method(sp);
    }
}}}
This restarts the request and fixes 99% of our problems. The last 1%
are cases where we run into "max restarts reached", probably because of
hitting the same problem more than 4 times in a row.
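To illustrate why the remaining 1% still ends in an error, here is a minimal standalone sketch of a restart loop with a fixed budget. It is not Varnish code: `fetch_with_restarts`, `flaky_fetch`, `enum fetch_result` and `MAX_RESTARTS` are hypothetical names, and the limit of 4 is assumed from the "more than 4 times in a row" wording above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of this is Varnish API. */
#define MAX_RESTARTS 4  /* assumed limit, taken from the report's wording */

enum fetch_result { FETCH_OK = 0, FETCH_FAILED = 1 };

/* Callback that performs one backend fetch attempt. */
typedef enum fetch_result (*fetch_fn)(void *ctx);

/*
 * Try the fetch once, then restart up to MAX_RESTARTS times.
 * Returns 200 on success and 503 once the restart budget is used up.
 * *attempts receives the number of fetch calls that were made.
 */
static int
fetch_with_restarts(fetch_fn fetch, void *ctx, int *attempts)
{
    int restarts = 0;

    assert(fetch != NULL);
    assert(attempts != NULL);
    *attempts = 0;
    for (;;) {
        (*attempts)++;
        if (fetch(ctx) == FETCH_OK)
            return (200);
        if (restarts == MAX_RESTARTS)
            return (503);   /* "max restarts reached" */
        restarts++;
    }
}

/* Stand-in backend: fails the first *ctx calls, then succeeds. */
static enum fetch_result
flaky_fetch(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return (FETCH_FAILED);
    }
    return (FETCH_OK);
}
```

In this sketch, four consecutive failures are still absorbed by the restarts, while a fifth failure in a row exhausts the budget and surfaces as an error again.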
As said initially: I don't know how to better describe the problem, if
you've got any pointers as to how to provide better information, just let
me know. I can even provide access to the "live thing" and a development
environment on Solaris if need be.
--
Ticket URL: <http://varnish.projects.linpro.no/ticket/482>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator