Release schedule for saint mode.
Ken Brownfield
kb+varnish at slide.com
Sat Jan 30 22:22:32 CET 2010
I started a test yesterday with trunk at 4478, and after running for about 12 hours under pretty good load, the machine started swapping heavily and we had to restart Varnish. After coming back up, it didn't have the 2.8M objects it held when it was shut down:
uptime 464 . Child uptime
client_conn 1951 4.20 Client connections accepted
client_drop 0 0.00 Connection dropped, no sess
client_req 79366 171.05 Client requests received
cache_hit 19669 42.39 Cache hits
cache_hitpass 0 0.00 Cache hits for pass
cache_miss 59265 127.73 Cache misses
backend_conn 288 0.62 Backend conn. success
backend_unhealthy 0 0.00 Backend conn. not attempted
backend_busy 0 0.00 Backend conn. too many
backend_fail 0 0.00 Backend conn. failures
backend_reuse 59118 127.41 Backend conn. reuses
backend_toolate 0 0.00 Backend conn. was closed
backend_recycle 59128 127.43 Backend conn. recycles
backend_unused 76 0.16 Backend conn. unused
fetch_head 0 0.00 Fetch head
fetch_length 59076 127.32 Fetch with Length
fetch_chunked 0 0.00 Fetch chunked
fetch_eof 0 0.00 Fetch EOF
fetch_bad 0 0.00 Fetch had bad headers
fetch_close 0 0.00 Fetch wanted close
fetch_oldhttp 0 0.00 Fetch pre HTTP/1.1 closed
fetch_zero 0 0.00 Fetch zero len
fetch_failed 0 0.00 Fetch failed
n_sess_mem 249 . N struct sess_mem
n_sess 186 . N struct sess
n_object 58989 . N struct object
n_vampireobject 0 . N unresurrected objects
n_objectcore 59018 . N struct objectcore
n_objecthead 59018 . N struct objecthead
n_smf 0 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 0 . N large free smf
n_vbe_conn 119 . N struct vbe_conn
n_wrk 256 . N worker threads
n_wrk_create 256 0.55 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 0 0.00 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_backend 1 . N backends
n_expired 0 . N expired objects
n_lru_nuked 0 . N LRU nuked objects
n_lru_saved 0 . N LRU saved objects
n_lru_moved 0 . N LRU moved objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 78656 169.52 Objects sent with write
n_objoverflow 0 0.00 Objects overflowing workspace
s_sess 1945 4.19 Total Sessions
s_req 79366 171.05 Total Requests
s_pipe 0 0.00 Total pipe
s_pass 56 0.12 Total pass
s_fetch 59074 127.31 Total fetch
s_hdrbytes 31974718 68911.03 Total header bytes
s_bodybytes 2381590875 5132738.95 Total body bytes
sess_closed 965 2.08 Session Closed
sess_pipeline 0 0.00 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_linger 78420 169.01 Session Linger
sess_herd 55547 119.71 Session herd
shm_records 5479085 11808.37 SHM records
shm_writes 201347 433.94 SHM writes
shm_flushes 0 0.00 SHM flushes due to overflow
shm_cont 453 0.98 SHM MTX contention
shm_cycles 2 0.00 SHM cycles through buffer
sm_nreq 0 0.00 allocator requests
sm_nobj 0 . outstanding allocations
sm_balloc 0 . bytes allocated
sm_bfree 0 . bytes free
sma_nreq 0 0.00 SMA allocator requests
sma_nobj 0 . SMA outstanding allocations
sma_nbytes 0 . SMA outstanding bytes
sma_balloc 0 . SMA bytes allocated
sma_bfree 0 . SMA bytes free
sms_nreq 622 1.34 SMS allocator requests
sms_nobj 0 . SMS outstanding allocations
sms_nbytes 0 . SMS outstanding bytes
sms_balloc 327230 . SMS bytes allocated
sms_bfree 327230 . SMS bytes freed
backend_req 59406 128.03 Backend requests made
n_vcl 1 0.00 N vcl total
n_vcl_avail 1 0.00 N vcl available
n_vcl_discard 0 0.00 N vcl discarded
n_purge 1 . N total active purges
n_purge_add 1 0.00 N new purges added
n_purge_retire 0 0.00 N old purges deleted
n_purge_obj_test 0 0.00 N objects tested
n_purge_re_test 0 0.00 N regexps tested against
n_purge_dups 0 0.00 N duplicate purges removed
hcb_nolock 0 0.00 HCB Lookups without lock
hcb_lock 0 0.00 HCB Lookups with lock
hcb_insert 0 0.00 HCB Inserts
esi_parse 0 0.00 Objects ESI parsed (unlock)
esi_errors 0 0.00 ESI parse errors (unlock)
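For what it's worth, the counters above imply roughly a 25% hit ratio since the restart (cache_hit vs. cache_miss), consistent with the cache coming back mostly cold. A rough sketch of extracting that from a varnishstat -1 dump; the heredoc just reuses the two relevant lines from the output above, but pointing awk at a saved dump file works the same way:

```shell
# Compute the hit ratio implied by the counters above. The trailing
# space in /^cache_hit / keeps the pattern from also matching
# cache_hitpass.
awk '/^cache_hit /{h=$2} /^cache_miss /{m=$2}
     END { printf "hit ratio: %.4f\n", h / (h + m) }' <<'EOF'
cache_hit                  19669         42.39 Cache hits
cache_miss                 59265        127.73 Cache misses
EOF
```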
Nothing abnormal in the logs, just normal stopping/starting messages.
Jan 30 05:30:48 v0 varnishd[16001]: Manager got SIGINT
Jan 30 05:30:48 v0 varnishd[16001]: Stopping Child
Jan 30 12:52:28 v0 varnishd[27107]: child (27108) Started
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said Closed fds: 15 16 17 20 21 23 24
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said Child starts
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said Silo completely loaded
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said Probe("GET /r/1/0/dl/gAO0cAU8rz-UR3ZNecmCwC1R0_hIR_QR HTTP/1.1
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said Host: deco.slide.com
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said Connection: close
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said
Jan 30 12:52:30 v0 varnishd[27107]: Child (27108) said ", 0.8, 3)
Cmdline:
varnishd \
-a 0.0.0.0:3228 \
-f config.vcl \
-g nogroup \
-h classic,524287 \
-P varnish.pid \
-p ban_lurker_sleep=60s \
-p between_bytes_timeout=15s \
-p cache_vbe_conns=on \
-p cli_buffer=32768 \
-p client_http11=on \
-p default_grace=0s \
-p default_ttl=0s \
-p err_ttl=0s \
-p first_byte_timeout=15s \
-p listen_depth=8192 \
-p lru_interval=3600 \
-p max_restarts=2 \
-p obj_workspace=2048 \
-p pipe_timeout=15s \
-p purge_dups=on \
-p rush_exponent=32 \
-p sess_timeout=15s \
-p thread_pools=8 \
-p thread_pool_min=32 \
-p thread_pool_max=8192 \
-p thread_pool_stack=262144 \
-s persistent,/cache/sdm/varnish,120G \
-s persistent,/cache/sdn/varnish,120G \
-s persistent,/cache/sdo/varnish,120G \
-s persistent,/cache/sdp/varnish,120G \
-s persistent,/cache/sdq/varnish,120G \
-s persistent,/cache/sdr/varnish,120G \
-s persistent,/cache/sds/varnish,120G \
-s persistent,/cache/sdt/varnish,120G \
-s persistent,/cache/sdu/varnish,120G \
-s persistent,/cache/sdv/varnish,120G \
-s persistent,/cache/sdw/varnish,120G \
-s persistent,/cache/sdx/varnish,120G \
-T 0.0.0.0:6666 \
-u nobody
OT: why does Varnish swap when the working set no longer fits in RAM -- and when dirty pages are only in the 10-100MB range (on a 64GB box)? I suspect that turning off swap entirely would eventually cause the kernel to lock up, so I'll have to see whether there's a way to tune this (swappiness is already at 1).
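A minimal sketch of inspecting the relevant VM knobs on Linux; paths are the standard procfs/sysctl locations, and the commented sysctl line is just one option to try, not a recommendation:

```shell
# Current swappiness (the message above says this is already 1).
cat /proc/sys/vm/swappiness

# Current volume of dirty and writeback pages, to confirm the
# 10-100MB observation.
grep -E '^(Dirty|Writeback):' /proc/meminfo

# To bias the kernel further away from swapping anonymous pages
# without disabling swap outright (requires root):
#   sysctl -w vm.swappiness=0
```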
Thanks,
--
Ken
On Jan 29, 2010, at 11:16 AM, Ken Brownfield wrote:
> On Jan 29, 2010, at 3:54 AM, Tollef Fog Heen wrote:
>> It should be. You'll lose the last storage silo (since that's not
>> closed yet), but older objects should be available.
>
> This might be the source of the confusion. How often are silos closed? My testing was simply "hit the cache for a single object a few times to increase obj.hits, bounce varnish, next request was a miss". If I can get a handle on how often silos are closed, I can more accurately check the behavior under production load.
>
>> Can you see if you can reproduce and post a varnishlog -1 when the child
>> has restarted as well as after having restarted the parent?
>
> Will do, thanks.
> --
> Ken
>
>
>> | Maybe there's a graceful, undocumented method of stopping the parent that I'm not aware of?
>>
>> Shouldn't matter.
>>
>> --
>> Tollef Fog Heen
>> Redpill Linpro -- Changing the game!
>> t: +47 21 54 41 73
>