varnish 2.0.4 backend errors
Lazy
lazy404 at gmail.com
Tue Jul 14 11:46:58 CEST 2009
2009/7/14 Kristian Lyngstol <kristian at redpill-linpro.com>:
> On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote:
>> We are having a hard time figuring out what's causing varnish 503 errors,
>> our backend is apache, the debian 5 default, os is linux x86_64 2.6.26,
>> everything is running on a single machine
>>
>> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f
>> /usr/local/etc/varnish/default.vcl -s malloc -T localhost:9999 -w
>> 10,6000,300 -u nobody
>
> 6000 threads is too much. Since it's per pool, it'll cause up to 12 000
> threads to start. That's not likely to go over all that well. If you have
> that sort of traffic, you need to scale out. Also, 10 thread minimum is
> pretty low.
>
> I typically recommend setting the minimum thread count to what you expect
> your normal traffic to be at peak hours. It's probably a dedicated
> machine, and idle threads have barely any overhead, while creating new
> threads can take some time.
At first I had 3000 threads set and varnish occasionally dropped
connections, so I doubled it.
So what would the recommended values be?
Would -w 1024,1024 -p thread_pools=6 be OK?
The site is usually not so busy, but it sometimes has spikes of static
traffic (about 50Mbps); that's why I upped the thread limit, 3000 was
too low.
Is it safe to change thread_pools at runtime?
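
To make the numbers concrete, here is a rough sketch of the arithmetic. It assumes the -w limits apply per pool, as the reply quoted above says, and that the current setup runs 2 pools (inferred from "up to 12 000" for a 6000 limit and from 20 worker threads at a minimum of 10). The function name is only for illustration.

```python
def thread_bounds(per_pool_min, per_pool_max, pools):
    """Return (lowest, highest) total worker thread count.

    Assumption: both limits are applied to each pool separately.
    """
    return per_pool_min * pools, per_pool_max * pools


# Settings discussed in this thread: (min, max, number of pools).
settings = {
    "-w 10,6000 with 2 pools (assumed)": (10, 6000, 2),
    "-w 1024,1024 -p thread_pools=6": (1024, 1024, 6),
}

for label, args in settings.items():
    low, high = thread_bounds(*args)
    print(f"{label}: between {low} and {high} threads")
```

Under those assumptions the current settings allow between 20 and 12000 threads, and the proposed ones would pin the count at 6144.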
>
>> running with a single backend
>> .connect_timeout = 1s; added to the backend definition
>
> Any particular reason for adding that?
Originally it wasn't there; I added it while trying to work around the issue.
>
>> I added
>>
>> sub vcl_error {
>> if (req.restarts < 10) {
>> restart;
>> }
>> }
>>
>> (is it possible to add a pause before doing restart ?)
>
> No. This is also a dirty workaround for a fundamental problem.
>
>> In about 0.1% of requests we get
>>
>> 10 TxRequest b POST
>> 10 TxURL b /php
>> 10 TxProtocol b HTTP/1.1
>> 10 TxHeader b x-requested-with: XMLHttpRequest
>> 10 TxHeader b Accept-Language: pl
>> 10 TxHeader b Referer: http://www.xxxxx/php
>> 10 TxHeader b Accept: text/html, */*
>> 10 TxHeader b Content-Type: application/x-www-form-urlencoded
>> 10 TxHeader b UA-CPU: x86
>> 10 TxHeader b Accept-Encoding: gzip, deflate
>> 10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
>> Windows NT 5.1)
>> 10 TxHeader b Content-Length: 8
>> 10 TxHeader b Cookie: _.1
>> 10 TxHeader b X-NovINet: v1.2
>> 10 TxHeader b X-Varnish: 603437812
>> 10 TxHeader b X-Forwarded-For: 79.162.xxx
>> 10 BackendClose b default
>> 31 VCL_call c error
>> 31 VCL_return c deliver
>> 31 Length c 465
>> 31 VCL_call c deliver
>> 31 VCL_return c deliver
>> 31 TxProtocol c HTTP/1.1
>> 31 TxStatus c 503
>>
>> machine is not overloaded, there are 150 apache processes running, 80% of them are idle
>>
>> what does
>> 31 VCL_call c error mean: a connection error, or apache returned an
>> invalid response ?
>
> No, it just means that vcl_error is called. BackendClose notes that the
> connection to the backend was closed.
>
>> can I get some more information about this error using some syslog in
>> vcl_error or maybe in some other way ?
>
> Possibly, but using syslog in vcl is the last thing I'd recommend.
>
> Does your syslog say anything meaningful? Like assert-errors...
no, only info about admin commands
> (...)
>> 60064 Backend connections failures
>> this is old and it's not changing now
>
> Did the error-rate go down once you solved this? What was causing these
> problems?
It was related to load testing; in production it went away when I
upped MaxClients on apache.
>
>> 20 N worker threads
>> 4152 N worker threads created
>> 0 N worker threads not created
>> 0 N worker threads limited
>> 0 N queued work requests
>> 226847 N overflowed work requests
>
> This is what I mean with -w 10,6000 being wrong. After the initial startup,
> overflowed work requests shouldn't grow much, and you're currently running
> at only 20 threads (the minimum), which will cause overflows very fast
> (consider how many connections a single client will use to fetch a front
> page... You can easily imagine overflowing with just 3-4 concurrent
> clients.)
>
> But that's not really causing any 503s. Just delays while threads are
> created (and removed).
Here is a tcpdump of another 503 (apache is running on port 88):
11:09:50.187842 IP x.x.x.x.50780 > x.x.x.x.88: S 88526893:88526893(0)
win 32792 <mss 16396,sackOK,timestamp 532825309 0,nop,wscale 7>
11:09:50.187851 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532825309
532825309,nop,wscale 7>
11:09:50.187867 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257
<nop,nop,timestamp 532825309 532825309>
11:09:53.187730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532826059
532825309,nop,wscale 7>
11:09:53.187740 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257
<nop,nop,timestamp 532826059 532826059,nop,nop,sack 1 {0:1}>
11:09:59.191730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 <mss 16396,sackOK,timestamp 532827559
532826059,nop,wscale 7>
11:09:59.191744 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257
<nop,nop,timestamp 532827559 532827559,nop,nop,sack 1 {0:1}>
11:10:05.187748 IP x.x.x.x.50780 > x.x.x.x.88: P 1:918(917) ack 1 win
257 <nop,nop,timestamp 532829059 532827559>
11:10:05.187766 IP x.x.x.x.88 > x.x.x.x.50780: . ack 918 win 271
<nop,nop,timestamp 532829059 532829059>
11:10:05.187799 IP x.x.x.x.50780 > x.x.x.x.88: F 918:918(0) ack 1 win
257 <nop,nop,timestamp 532829059 532829059>
11:10:05.190887 IP x.x.x.x.88 > x.x.x.x.50780: P 1:2968(2967) ack 919
win 271 <nop,nop,timestamp 532829059 532829059>
11:10:05.190909 IP x.x.x.x.50780 > x.x.x.x.88: R 88527812:88527812(0) win 0
x.x.x.x is a local address bound to eth0
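
To make the timing easier to read, here is a small sketch that turns the timestamps above into offsets from the first SYN. The timestamps, flags and byte counts are copied from the trace; the direction labels are my own shorthand and assume that the side connecting from port 50780 is varnish.

```python
from datetime import datetime

# (timestamp, direction, packet) copied from the tcpdump output above.
# "v->a": from port 50780 (assumed to be varnish) to port 88 (apache).
# "a->v": from port 88 back to port 50780.
PACKETS = [
    ("11:09:50.187842", "v->a", "SYN"),
    ("11:09:50.187851", "a->v", "SYN-ACK"),
    ("11:09:50.187867", "v->a", "ACK"),
    ("11:09:53.187730", "a->v", "SYN-ACK again, same sequence number"),
    ("11:09:53.187740", "v->a", "ACK"),
    ("11:09:59.191730", "a->v", "SYN-ACK again, same sequence number"),
    ("11:09:59.191744", "v->a", "ACK"),
    ("11:10:05.187748", "v->a", "PSH, 917 bytes"),
    ("11:10:05.187766", "a->v", "ACK"),
    ("11:10:05.187799", "v->a", "FIN"),
    ("11:10:05.190887", "a->v", "PSH, 2967 bytes"),
    ("11:10:05.190909", "v->a", "RST"),
]


def parse(ts):
    """Parse an HH:MM:SS.ffffff tcpdump timestamp (date is irrelevant here)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def timeline(packets):
    """Return (seconds since first packet, direction, packet) tuples."""
    start = parse(packets[0][0])
    return [
        ((parse(ts) - start).total_seconds(), direction, what)
        for ts, direction, what in packets
    ]


for offset, direction, what in timeline(PACKETS):
    print(f"{offset:10.6f}s  {direction}  {what}")
```

On this trace the SYN-ACK is sent again about 3 s and 9 s after the first SYN, and the 917-byte request only goes out after about 15 s, immediately followed by the FIN.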
Thank you for your help.
--
Michal Grzedzicki