503 errors on POST

Thu Jan 6 16:54:00 CET 2011

On Tue, Dec 21, 2010 at 08:48:07AM +0100, Modesto Alexandre wrote:
> On Tuesday 21 December 2010, Flavio Torres wrote:
> > On 12/20/2010 07:58 AM, Modesto Alexandre wrote:
> > > Here are the errors I have (I have masked the private information):
> > > 
> > > http://demo.ovh.net/view/6ac24fbc5400039bc86fd3444556ef76/0.colored

That link gives me a 404.

> Backends look good :
> 
> varnishadm -T localhost:6082 debug.health
> Backend backend1 is Healthy
> Current states  good: 10 threshold:  8 window: 10
> Average responsetime of good probes: 0.144168
> Oldest                                                    Newest
> ================================================================
> 4444444444444444444444444444444444444444444444444444444444444444 Good IPv4
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Good Xmit
> RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR- Good Recv
> HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH- Happy

The most recent probe failed (note the trailing "-" on the Good Recv and
Happy lines), so not really good.

> Configuration of backend pooling :
> 
> backend backend1 {
> .host = "ip.ip.ip.ip";
> .port = "80";
> .connect_timeout = 300s;

A long connection timeout is pretty much only needed to allow for
geographically distributed servers. Keep in mind that the application
doesn't have to respond for the connection to be established: the TCP
handshake is usually handled by the operating system and is usually VERY
fast.

I did some quick math [1]: In 300 seconds, a packet can travel around the
earth roughly 2000 times, assuming it's using mostly fiber and going around
the equator. Unless your web server is on a different planet (Venus is
possible, but Mars is out of range I'm afraid), your connection timeout
is dangerous.

Rule of thumb: If you are increasing default values by 10 000% or more:
Think twice. Then don't do it.

> .first_byte_timeout = 300s;
> .between_bytes_timeout = 300s;

Those are semi-fine - but still rather long (how slow is the application?).
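
A rough sketch of a more conservative backend definition, reusing the names
from the quoted config. The timeout values are illustrative assumptions, not
numbers measured against this application; the right values depend on how
slow the slowest legitimate response is:

```vcl
backend backend1 {
    .host = "ip.ip.ip.ip";          # masked, as in the original post
    .port = "80";
    # TCP connect only: the OS answers this, not the application.
    # 1s is an assumed value, already generous on a LAN.
    .connect_timeout = 1s;
    # Time allowed for the application to produce the first byte.
    # 60s is an assumed value; lower it if the application allows.
    .first_byte_timeout = 60s;
    # Maximum gap between bytes once the response has started (assumed).
    .between_bytes_timeout = 60s;
}
```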

>  .probe = {
>                 .url = "/url.gif";

I recommend polling something that actually tests more than basic HTTP
functionality. Typically I set up a poll against the application that needs
to be tested and make sure the health check URL exercises any relevant
resources (e.g. a simple database query).
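
As a sketch of such a probe: the URL "/healthcheck" is hypothetical and
stands for a page in the application that runs a cheap query against the
database and returns 200 only if it succeeds. The window and threshold
match the debug.health output quoted above; the interval and timeout are
assumptions:

```vcl
backend backend1 {
    .host = "ip.ip.ip.ip";
    .port = "80";
    .probe = {
        # Hypothetical application URL, not a static file like /url.gif.
        .url = "/healthcheck";
        .interval = 1s;     # assumed: how often to poll
        .timeout = 1s;      # assumed: a slower probe counts as failed
        .window = 10;       # look at the last 10 probes
        .threshold = 8;     # at least 8 of them must be good
    }
}
```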

> any idea ?

Can you post varnishlog and VCL?

Unfortunately, health checks only catch reasonably consistent errors. In
your case, it would take about 10 seconds of consistent errors before the
health checks kick in and Varnish stops using the backend.

For sporadic errors, that doesn't help you much. In this case, we already
saw a sporadic error in the health checks.

You may also want to take a look at the timer values of ReqEnd in
varnishlog to debug this. They show how long each request took, which
gives you an idea of the response times. Looking at the Debug header might
be useful too.

But it will be much easier to analyse this with VCL and varnishlog.

[1] (300 s * 3*10^8 m/s) / 40075160 m = 2245 (roughly)

- Kristian