thread pool issues
Dennis Hendriksen
dennis.hendriksen at kalooga.com
Tue Jun 14 17:20:36 CEST 2011
Hi Kristian,
Thank you for your suggestions. We've upgraded Varnish to 2.1.5, which
decreases the default thread_pool_add_delay from 20ms to 2ms. I've
included a varnishstat listing below. The numbers reflect live testing
(our experience with synthetic tests is that it is very hard to imitate
real-life behavior).
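For reference, we apply runtime parameter changes roughly like this
(assuming the management interface listens on localhost:6082; add -S
<secretfile> if your setup uses one):

  varnishadm -T localhost:6082 param.set thread_pool_add_delay 2
  varnishadm -T localhost:6082 param.show thread_pool_add_delay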
> I would typically recommend something closer to minimum 500, pools 2 and
> max 5000.
Currently we use 8 pools because the server has 2x4 CPU cores. Is there
an advantage to using fewer pools than the number of CPU cores? When we
increase the number of threads, the problem with "N worker threads
limited" is solved! :-)
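For anyone following along, the startup flags for Kristian's suggested
values would look roughly like this (the listen address and storage
size are placeholders, not our real settings):

  varnishd -a :80 -T localhost:6082 -s malloc,4G \
    -p thread_pools=2 \
    -p thread_pool_min=500 \
    -p thread_pool_max=5000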
> How many connections (not requests) are you doing during these tests?
ls -1 /proc/<varnish pid>/fd | wc -l gives us ~1300 (single load) and
~2600 (double load) file descriptors (=connections?).
> Do you use keep-alive and long-lasting connections? You may want to see
> if reducing session_linger helps.
Requests mostly arrive from web browsers.
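We haven't experimented with session_linger yet; if we do, I expect a
runtime change along these lines (the 10ms value is only an
illustration, not a tested setting):

  varnishadm -T localhost:6082 param.set session_linger 10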
netstat -tna | wc -l
~12000 tcp connections (single load)
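To separate established keep-alive connections from e.g. TIME_WAIT
buildup, a per-state breakdown like this helps:

  netstat -tan | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn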
Unfortunately, after facing double load, Varnish now becomes very
unresponsive after a while. Client requests are not answered by Varnish,
resulting in long waiting times (10+ seconds) or timeouts. We do not
have bandwidth issues.
Is it possible that in our use case we've reached the limit of what
Varnish can handle?
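While reproducing the hang, we plan to watch just the thread-related
counters; assuming your varnishstat build supports field selection
(-f), something like this should do:

  varnishstat -f n_wrk,n_wrk_max,n_wrk_queue,n_wrk_drop,client_drop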
Greetings and thanks for the help so far!
Dennis
varnishstat -1
client_conn 696307 177.40 Client connections accepted
client_drop 0 0.00 Connection dropped, no sess/wrk
client_req 965174 245.90 Client requests received
cache_hit 925943 235.91 Cache hits
cache_hitpass 5 0.00 Cache hits for pass
cache_miss 39125 9.97 Cache misses
backend_conn 4568 1.16 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 3 0.00 Backend conn. failures
backend_reuse 34683 8.84 Backend conn. reuses
backend_toolate 79 0.02 Backend conn. was closed
backend_recycle 34768 8.86 Backend conn. recycles
backend_unused 0 0.00 Backend conn. unused
fetch_head 0 0.00 Fetch head
fetch_length 24818 6.32 Fetch with Length
fetch_chunked 14426 3.68 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 1 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
n_sess_mem 2235 . N struct sess_mem
n_sess 1787 . N struct sess
n_object 34379 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 34516 . N struct objectcore
n_objecthead 22424 . N struct objecthead
n_smf 0 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 0 . N large free smf
n_vbe_conn 6 . N struct vbe_conn
n_wrk 280 . N worker threads
n_wrk_create 280 0.07 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 9693 2.47 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 0 0.00 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 4 . N backends
n_expired 385 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 370058 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 815230 207.70 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 696245 177.39 Total Sessions
s_req 965174 245.90 Total Requests
s_pipe 4 0.00 Total pipe
s_pass 120 0.03 Total pass
s_fetch 39245 10.00 Total fetch
s_hdrbytes 285675067 72783.46 Total header bytes
s_bodybytes 10667879292 2717931.03 Total body bytes
sess_closed 30597 7.80 Session Closed
sess_pipeline 1238 0.32 Session Pipeline
sess_readahead 537 0.14 Session Read Ahead
sess_linger 955973 243.56 Session Linger
sess_herd 891554 227.15 Session herd
shm_records 39223429 9993.23 SHM records
shm_writes 4022999 1024.97 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 1578 0.40 SHM MTX contention
shm_cycles 15 0.00 SHM cycles through buffer
sm_nreq 0 0.00 allocator requests
sm_nobj 0 . outstanding allocations
sm_balloc 0 . bytes allocated
sm_bfree 0 . bytes free
sma_nreq 71633 18.25 SMA allocator requests
sma_nobj 66455 . SMA outstanding allocations
sma_nbytes 608883602 . SMA outstanding bytes
sma_balloc 2206748168 . SMA bytes allocated
sma_bfree 1597864566 . SMA bytes free
sms_nreq 0 0.00 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 0 . SMS bytes allocated
sms_bfree 0 . SMS bytes freed
backend_req 39247 10.00 Backend requests made
n_vcl 2 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 1 0.00 N vcl discarded
n_purge 1 . N total active purges
n_purge_add 1 0.00 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 0 0.00 N objects tested
n_purge_re_test 0 0.00 N regexps tested against
n_purge_dups 0 0.00 N duplicate purges removed
hcb_nolock 0 0.00 HCB Lookups without lock
hcb_lock 0 0.00 HCB Lookups with lock
hcb_insert 0 0.00 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
accept_fail 0 0.00 Accept failures
client_drop_late 0 0.00 Connection dropped late
uptime 3925 1.00 Client uptime
backend_retry 2 0.00 Backend conn. retry
dir_dns_lookups 0 0.00 DNS director lookups
dir_dns_failed 0 0.00 DNS director failed lookups
dir_dns_hit 0 0.00 DNS director cached lookups hit
dir_dns_cache_full 0 0.00 DNS director full dnscache
fetch_1xx 0 0.00 Fetch no body (1xx)
fetch_204 0 0.00 Fetch no body (204)
fetch_304 0 0.00 Fetch no body (304)
On Fri, 2011-06-10 at 16:29 +0200, Kristian Lyngstol wrote:
> Greetings,
>
> On Fri, Jun 10, 2011 at 08:32:11AM +0200, Dennis Hendriksen wrote:
> > We're running Varnish 2.0.6 on a dual quad core server which is doing
> > about 500 req/s with a 97% hit ratio, serving mostly images. When we
> > increase the load to about 800 req/s, we encounter two problems that
> > seem to be related to the thread pool increase.
>
> You really should see if you can't move to at least Varnish 2.1.5.
>
> > When we double the Varnish load, the "N worker threads limited"
> > counter increases rapidly (100k+) while "N worker threads created"
> > does not increase (8 pools, min pool size 25, max pool size 1000).
> > Varnish is unresponsive and client connections hang.
>
> That'll give you 200 threads at startup.
>
> I would typically recommend something closer to minimum 500, pools 2 and
> max 5000.
>
> You also want to reduce thread_pool_add_delay from the (2.0.6)
> default of 20ms to, for instance, 2ms. The delay limits the rate at
> which new threads are started, and 20ms is often way too slow.
>
> How many connections (not requests) are you doing during these tests?
>
> > At other times we see the number of worker threads increasing but again
> > connections 'hang' while Varnish doesn't show any dropped connections
> > (only overflows).
>
> Do you use keep-alive and long-lasting connections? You may want to see
> if reducing session_linger helps.
>
> Are you testing with real traffic or synthetic tests?
>
> If possible, varnishstat -1 output would be useful.
>
> - Kristian
>