varnish 2.0.4 backend errors

Lazy lazy404 at gmail.com
Tue Jul 14 11:46:58 CEST 2009


2009/7/14 Kristian Lyngstol <kristian at redpill-linpro.com>:
> On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote:
>> We are having a hard time figuring out what's causing varnish 503 errors,
>> our backend is apache, debian 5 default, os is linux x86_64 2.6.26,
>> everything is running on a single machine
>>
>> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f
>> /usr/local/etc/varnish/default.vcl -s malloc -T localhost:9999 -w
>> 10,6000,300 -u nobody
>
> 6000 threads is too much. Since it's per pool, it'll cause up to 12 000
> threads to start. That's not likely to go over all that well. If you have
> that sort of traffic, you need to scale out. Also, 10 thread minimum is
> pretty low.
>
> I typically recommend setting the minimum thread count to what you expect
> your normal traffic to be at peak hours. It's probably a dedicated
> machine, and idle threads have barely any overhead, while creating new
> threads can take some time.

at first I had 3000 threads set and varnish occasionally dropped
connections, so I doubled it

so what would be the recommended values?
would -w 1024,1024 -p thread_pools=6 be ok?
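
To compare the two settings, here is a rough sketch of the arithmetic. It assumes the -w limits apply per pool, as described above, and that the current setup runs 2 pools (inferred from the "12 000 threads" figure, not checked against the running config). The function name is made up for illustration.

```python
# Sketch: total worker thread counts for a varnishd thread configuration.
# Assumption: the min/max given to -w apply per pool, as stated in the reply above.

def total_threads(per_pool_min, per_pool_max, pools):
    """Return (minimum, maximum) thread counts summed over all pools."""
    return per_pool_min * pools, per_pool_max * pools

# Current setup: -w 10,6000 with 2 pools (pool count is an assumption,
# inferred from the "12 000 threads" figure quoted above).
current = total_threads(10, 6000, pools=2)

# Proposed setup: -w 1024,1024 -p thread_pools=6
proposed = total_threads(1024, 1024, pools=6)

print("current  min/max:", current)
print("proposed min/max:", proposed)
```

If the per-pool reading is right, the proposed values would pin varnish at 6 * 1024 threads all the time.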

the site is usually not so busy, but it sometimes has spikes of static
traffic (about 50Mbps), that's why I upped the thread limit, 3000 was
too low

is it safe to change thread_pools at runtime?

>
>> running with a single backend
>> .connect_timeout = 1s; added to the backend definition
>
> Any particular reason for adding that?

originally it wasn't there, I added it trying to work around the issue

>
>> I added
>>
>> sub vcl_error {
>>     if (req.restarts < 10) {
>>         restart;
>>     }
>> }
>>
>> (is it possible to add a pause before doing restart ?)
>
> No. This is also a dirty workaround for a fundamental problem.
>
>> In about 0.1% of requests we get
>>
>>    10 TxRequest    b POST
>>    10 TxURL        b /php
>>    10 TxProtocol   b HTTP/1.1
>>    10 TxHeader     b x-requested-with: XMLHttpRequest
>>    10 TxHeader     b Accept-Language: pl
>>    10 TxHeader     b Referer: http://www.xxxxx/php
>>    10 TxHeader     b Accept: text/html, */*
>>    10 TxHeader     b Content-Type: application/x-www-form-urlencoded
>>    10 TxHeader     b UA-CPU: x86
>>    10 TxHeader     b Accept-Encoding: gzip, deflate
>>    10 TxHeader     b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
>> Windows NT 5.1)
>>    10 TxHeader     b Content-Length: 8
>>    10 TxHeader     b Cookie: _.1
>>    10 TxHeader     b X-NovINet: v1.2
>>    10 TxHeader     b X-Varnish: 603437812
>>    10 TxHeader     b X-Forwarded-For: 79.162.xxx
>>    10 BackendClose b default
>>    31 VCL_call     c error
>>    31 VCL_return   c deliver
>>    31 Length       c 465
>>    31 VCL_call     c deliver
>>    31 VCL_return   c deliver
>>    31 TxProtocol   c HTTP/1.1
>>    31 TxStatus     c 503
>>
>> machine is not overloaded, there are 150 apache processes running, 80% of them idle
>>
>> what does
>> 31 VCL_call     c error mean? a connection error, or apache returned an
>> invalid response?
>
> No, it just means that vcl_error is called. BackendClose notes that the
> connection to the backend was closed.
>
>> can I get some more information about this error using syslog in
>> vcl_error or maybe in some other way?
>
> Possibly, but using syslog in vcl is the last thing I'd recommend.
>
> Does your syslog say anything meaningful? Like assert-errors...
no, only info about admin commands

> (...)
>>        60064  Backend connections failures
>> this is old and it's not changing now
>
> Did the error-rate go down once you solved this? What was causing these
> problems?
it was related to load testing, in production it went away when I
upped MaxClients on apache


>
>>           20  N worker threads
>>         4152  N worker threads created
>>            0  N worker threads not created
>>            0  N worker threads limited
>>            0  N queued work requests
>>       226847  N overflowed work requests
>
> This is what I mean with -w 10,6000 being wrong. After the initial startup,
> overflowed work requests shouldn't grow much, and you're currently running
> at only 20 threads (the minimum), which will cause overflows very fast
> (consider how many connections a single client will use to fetch a front
> page... You can easily imagine overflowing with just 3-4 concurrent
> clients.)
>
> But that's not really causing any 503s. Just delays while threads are
> created (and removed).
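
To put a number on the "3-4 concurrent clients" remark, a small sketch. It assumes one worker thread per open client connection and 6 parallel connections per browser; both are illustrative guesses, not measured on this site.

```python
# Sketch: how few clients it takes to need more threads than are running.
# Assumptions (not from measurements): one worker thread per open client
# connection, and 6 parallel connections per browser.

def clients_until_overflow(idle_threads, conns_per_client):
    """Smallest number of concurrent clients that need more than idle_threads."""
    return idle_threads // conns_per_client + 1

needed = clients_until_overflow(idle_threads=20, conns_per_client=6)
print("clients needed to exceed 20 threads:", needed)
```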

tcpdump of another 503 (apache is running on port 88),

11:09:50.187842 IP x.x.x.x.50780 > x.x.x.x.88: S 88526893:88526893(0)
win 32792 <mss 16396,sackOK,timestamp 532825309 0,nop,wscale 7>
11:09:50.187851 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532825309
532825309,nop,wscale 7>
11:09:50.187867 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257
<nop,nop,timestamp 532825309 532825309>
11:09:53.187730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532826059
532825309,nop,wscale 7>
11:09:53.187740 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257
<nop,nop,timestamp 532826059 532826059,nop,nop,sack 1 {0:1}>
11:09:59.191730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532827559
532826059,nop,wscale 7>
11:09:59.191744 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257
<nop,nop,timestamp 532827559 532827559,nop,nop,sack 1 {0:1}>
11:10:05.187748 IP x.x.x.x.50780 > x.x.x.x.88: P 1:918(917) ack 1 win
257 <nop,nop,timestamp 532829059 532827559>
11:10:05.187766 IP x.x.x.x.88 > x.x.x.x.50780: . ack 918 win 271
<nop,nop,timestamp 532829059 532829059>
11:10:05.187799 IP x.x.x.x.50780 > x.x.x.x.88: F 918:918(0) ack 1 win
257 <nop,nop,timestamp 532829059 532829059>
11:10:05.190887 IP x.x.x.x.88 > x.x.x.x.50780: P 1:2968(2967) ack 919
win 271 <nop,nop,timestamp 532829059 532829059>
11:10:05.190909 IP x.x.x.x.50780 > x.x.x.x.88: R 88527812:88527812(0) win 0


x.x.x.x is a local address bound to eth0
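
The timing in the capture above can be worked out with a short sketch like this. The timestamps are copied from the capture; the helper name is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the capture above.
SYN = "11:09:50.187842"            # first SYN towards port 88
SYN_ACKS = ["11:09:50.187851",     # SYN-ACKs sent from port 88
            "11:09:53.187730",
            "11:09:59.191730"]
FIRST_DATA = "11:10:05.187748"     # first segment carrying the request

def seconds_between(start, end):
    """Elapsed seconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

# Gaps between consecutive SYN-ACKs.
synack_gaps = [seconds_between(a, b) for a, b in zip(SYN_ACKS, SYN_ACKS[1:])]

# Delay between the first SYN and the request being sent.
request_delay = seconds_between(SYN, FIRST_DATA)

print("SYN-ACK gaps (s):", [round(g, 3) for g in synack_gaps])
print("SYN to request (s):", round(request_delay, 3))
```

The repeated SYN-ACKs come roughly 3 s and 6 s apart, and the request only goes out about 15 s after the SYN. The doubling gap looks like ordinary retransmission backoff to me, but that is only my reading of the capture, I have not confirmed it.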


Thank you for your help.

-- 
Michal Grzedzicki


