Varnish 2.1 + kswapd0 freak out

Augusto Becciu augusto at jadedpixel.com
Fri May 21 00:45:25 CEST 2010


Hey guys,

I'm running Varnish 2.1 on two m2.xlarge EC2 instances (17G of RAM,
Linux kernel 2.6.21.7-2.fc8xen-ec2-v1.0). Both servers have been
running for 2 months now with almost no trouble, but every once in a
while I've noticed big spikes in CPU usage, mostly in kernel space.
A few days ago I saw kswapd0 consuming 100% of one CPU core and varnishd
consuming 100% of the other. I was able to strace varnishd for a few
seconds and everything looked normal, but then it crashed and left a
zombie process eating most of the CPU, so I had to restart the server.
Today exactly the same thing happened on the other server, and this
is starting to scare me.

We're running varnish with the following params:

varnishd -P /var/run/varnishd.pid -a 0.0.0.0:2000 -T 127.0.0.1:6082 \
    -w 200,2000 -s malloc,12G -p lru_interval=20 -f /etc/varnish/varnish.vcl

We don't have swap enabled on these servers.
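To put the -s malloc,12G setting in perspective on a 17G machine with no swap, here is a small Python sketch of the arithmetic. It assumes the "G" suffix means 2^30 bytes and treats the instance's 17G as 17 GiB; it only shows how much memory is left outside the storage limit, not how varnishd actually uses that memory.

```python
# Rough memory-budget arithmetic for the varnishd flags above.
# Assumptions: "17G of RAM" is treated as 17 GiB, and the "G" suffix in
# -s malloc,12G is treated as binary (12 * 2**30 bytes).
GIB = 1024 ** 3


def memory_budget(ram_gib: float, malloc_gib: float) -> tuple[float, float]:
    """Return (GiB left outside malloc storage, fraction of RAM given to storage)."""
    headroom_gib = ram_gib - malloc_gib
    return headroom_gib, malloc_gib / ram_gib


headroom_gib, storage_share = memory_budget(17, 12)
malloc_limit_bytes = 12 * GIB  # 12884901888 bytes

print(f"malloc storage limit: {malloc_limit_bytes} bytes")
print(f"left for everything else: {headroom_gib:.1f} GiB "
      f"({storage_share:.1%} of RAM reserved for object storage)")
```

Assuming the malloc limit covers object storage only, per-object bookkeeping, worker thread stacks, the shared memory log and the kernel itself all have to fit in the remaining ~5 GiB, with no swap to fall back on.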

Here's the output of varnishstat -1 taken while varnish was freaking out:

client_conn           9592723         7.82 Client connections accepted
client_drop                 0         0.00 Connection dropped, no sess/wrk
client_req           67302765        54.84 Client requests received
cache_hit            50571130        41.20 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss           16050808        13.08 Cache misses
backend_conn         16029200        13.06 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                0         0.00 Backend conn. too many
backend_fail            20649         0.02 Backend conn. failures
backend_reuse           12352         0.01 Backend conn. reuses
backend_toolate             0         0.00 Backend conn. was closed
backend_recycle         12352         0.01 Backend conn. recycles
backend_unused              0         0.00 Backend conn. unused
fetch_head                  0         0.00 Fetch head
fetch_length         12764170        10.40 Fetch with Length
fetch_chunked         3272791         2.67 Fetch chunked
fetch_eof                   0         0.00 Fetch EOF
fetch_bad                   0         0.00 Fetch had bad headers
fetch_close                49         0.00 Fetch wanted close
fetch_oldhttp               0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                  0         0.00 Fetch zero len
fetch_failed          3272895         2.67 Fetch failed
n_sess_mem                587          .   N struct sess_mem
n_sess                    465          .   N struct sess
n_object               659083          .   N struct object
n_vampireobject             0          .   N unresurrected objects
n_objectcore           659439          .   N struct objectcore
n_objecthead           907405          .   N struct objecthead
n_smf                       0          .   N struct smf
n_smf_frag                  0          .   N small free smf
n_smf_large                 0          .   N large free smf
n_vbe_conn                277          .   N struct vbe_conn
n_wrk                     400          .   N worker threads
n_wrk_create              458         0.00 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max                   0         0.00 N worker threads limited
n_wrk_queue                 0         0.00 N queued work requests
n_wrk_overflow            874         0.00 N overflowed work requests
n_wrk_drop                  0         0.00 N dropped work requests
n_backend                   2          .   N backends
n_expired              112662          .   N expired objects
n_lru_nuked          11954429          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved          46618517          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                     2         0.00 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite           60192420        49.04 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                9592577         7.82 Total Sessions
s_req                67302765        54.84 Total Requests
s_pipe                    110         0.00 Total pipe
s_pass                   1691         0.00 Total pass
s_fetch              12764115        10.40 Total fetch
s_hdrbytes        21558035591     17564.97 Total header bytes
s_bodybytes      1162454990977    947140.58 Total body bytes
sess_closed           7687689         6.26 Session Closed
sess_pipeline               0         0.00 Session Pipeline
sess_readahead              0         0.00 Session Read Ahead
sess_linger          61236267        49.89 Session Linger
sess_herd            16659649        13.57 Session herd
shm_records        3395953253      2766.94 SHM records
shm_writes          131371160       107.04 SHM writes
shm_flushes               661         0.00 SHM flushes due to overflow
shm_cont               114836         0.09 SHM MTX contention
shm_cycles               1378         0.00 SHM cycles through buffer
sm_nreq                     0         0.00 allocator requests
sm_nobj                     0          .   outstanding allocations
sm_balloc                   0          .   bytes allocated
sm_bfree                    0          .   bytes free
sma_nreq             37442974        30.51 SMA allocator requests
sma_nobj              1318091          .   SMA outstanding allocations
sma_nbytes        12884892751          .   SMA outstanding bytes
sma_balloc       250925494011          .   SMA bytes allocated
sma_bfree        238040601260          .   SMA bytes free
sms_nreq              3967048         3.23 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes       18446744073709527064          .   SMS outstanding bytes
sms_balloc         1895595320          .   SMS bytes allocated
sms_bfree          1895619352          .   SMS bytes freed
backend_req          16043889        13.07 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                 26155          .   N total active purges
n_purge_add            678663         0.55 N new purges added
n_purge_retire         652508         0.53 N old purges deleted
n_purge_obj_test     47484518        38.69 N objects tested
n_purge_re_test   41413683761     33742.88 N regexps tested against
n_purge_dups           485656         0.40 N duplicate purges removed
hcb_nolock           50605455        41.23 HCB Lookups without lock
hcb_lock                  566         0.00 HCB Lookups with lock
hcb_insert           16016509        13.05 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)
accept_fail                 0         0.00 Accept failures
client_drop_late            0         0.00 Connection dropped late
uptime                1227331         1.00 Client uptime
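
A few of these counters are easier to read together. The Python sketch below parses `varnishstat -1`-style lines (a short excerpt copied from the output above) and computes the cache hit ratio, how full the 12G malloc storage is (assuming 12G means 12 x 2^30 bytes), and the `sms_nbytes` value reinterpreted as a signed 64-bit integer. That counter is just under 2^64, so it looks like a small negative number that wrapped around; whether this indicates an accounting bug is a guess, not something the output proves.

```python
# Parse a few `varnishstat -1` lines and derive some ratios from them.
GIB = 1024 ** 3
U64 = 2 ** 64

# Excerpt copied from the varnishstat -1 output above.
SAMPLE = """\
client_req           67302765        54.84 Client requests received
cache_hit            50571130        41.20 Cache hits
n_lru_nuked          11954429          .   N LRU nuked objects
sma_nbytes        12884892751          .   SMA outstanding bytes
sms_nbytes       18446744073709527064          .   SMS outstanding bytes
"""


def parse_counters(text: str) -> dict[str, int]:
    """Map counter name -> integer value for each `name value ...` line."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit counter as a two's-complement signed value."""
    return value - U64 if value >= 2 ** 63 else value


c = parse_counters(SAMPLE)
hit_ratio = c["cache_hit"] / c["client_req"]
sma_fill = c["sma_nbytes"] / (12 * GIB)  # limit from -s malloc,12G (binary suffix assumed)

print(f"hit ratio: {hit_ratio:.1%}")
print(f"malloc storage fill: {sma_fill:.4%}")
print(f"sms_nbytes as signed 64-bit: {as_signed64(c['sms_nbytes'])}")
```

With these numbers the hit ratio works out to roughly 75%, and sma_nbytes sits within about 9 KB of 12 GiB, which is consistent with the storage being full and the very large n_lru_nuked count.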


Has anyone experienced something similar?

Thanks,
Augusto




More information about the varnish-misc mailing list