Varnish unresponsive on our server

Uday Kumar uday.polu at indiamart.com
Thu Jun 20 20:44:22 UTC 2024


Hi Guillaume,

pz_list and recv_queue are not (to my knowledge) Varnish counters, where
are you seeing them?
Yes, could not find them in varnish counters

for pz_list: we checked count of processes list during issue period [9th
June]
Ex: sar -q -f /var/log/sa/sa09 -s 21:15:00 -e 21:30:00

for recv_queue, we get this metric using netstat command [But here, we
cannot get old logs now]
Ex: sudo netstat -nltp

Could you share a "varnishstat -1" of the impacted machine?
I doubt, if we can get varnishstat of machine during the issue period -
correct me if I'm wrong

Below are the stats taken now --> 21st June -  02:00
MGT.uptime             9960545         1.00 Management process uptime
MGT.child_start              1         0.00 Child process started
MGT.child_exit               0         0.00 Child process normal exit
MGT.child_stop               0         0.00 Child process unexpected exit
MGT.child_died               0         0.00 Child process died (signal)
MGT.child_dump               0         0.00 Child process core dumped
MGT.child_panic              0         0.00 Child process panic
MAIN.summs           999145669       100.31 stat summ operations
MAIN.uptime            9960546         1.00 Child process uptime
MAIN.sess_conn       230308639        23.12 Sessions accepted
MAIN.sess_drop               0         0.00 Sessions dropped
MAIN.sess_fail               0         0.00 Session accept failures
MAIN.client_req_400         8578         0.00 Client requests received,
subject to 400 errors
MAIN.client_req_417            0         0.00 Client requests received,
subject to 417 errors
MAIN.client_req        272378619        27.35 Good client requests received
MAIN.cache_hit          61997297         6.22 Cache hits
MAIN.cache_hitpass             0         0.00 Cache hits for pass.
MAIN.cache_hitmiss        147993         0.01 Cache hits for miss.
MAIN.cache_miss        210380895        21.12 Cache misses
MAIN.backend_conn        5153783         0.52 Backend conn. success
MAIN.backend_unhealthy            0         0.00 Backend conn. not attempted
MAIN.backend_busy                 0         0.00 Backend conn. too many
MAIN.backend_fail                42         0.00 Backend conn. failures
MAIN.backend_reuse        206881432        20.77 Backend conn. reuses
MAIN.backend_recycle      208365094        20.92 Backend conn. recycles
MAIN.backend_retry             3347         0.00 Backend conn. retry
MAIN.fetch_head                   0         0.00 Fetch no body (HEAD)
MAIN.fetch_length           2073051         0.21 Fetch with Length
MAIN.fetch_chunked        209964885        21.08 Fetch chunked
MAIN.fetch_eof                    9         0.00 Fetch EOF
MAIN.fetch_bad                    0         0.00 Fetch bad T-E
MAIN.fetch_none                  40         0.00 Fetch no body
MAIN.fetch_1xx                    0         0.00 Fetch no body (1xx)
MAIN.fetch_204                    0         0.00 Fetch no body (204)
MAIN.fetch_304                    0         0.00 Fetch no body (304)
MAIN.fetch_failed            415941         0.04 Fetch failed (all causes)
MAIN.fetch_no_thread              0         0.00 Fetch failed (no thread)
MAIN.pools                        2          .   Number of thread pools
MAIN.threads                     31          .   Total number of threads
MAIN.threads_limited             75         0.00 Threads hit max
MAIN.threads_created         290240         0.03 Threads created
MAIN.threads_destroyed       290209         0.03 Threads destroyed
MAIN.threads_failed               0         0.00 Thread creation failed
MAIN.thread_queue_len             0          .   Length of session queue
MAIN.busy_sleep             1000325         0.10 Number of requests sent to
sleep on busy objhdr
MAIN.busy_wakeup            1000325         0.10 Number of requests woken
after sleep on busy objhdr
MAIN.busy_killed                  0         0.00 Number of requests killed
after sleep on busy objhdr
MAIN.sess_queued             324645         0.03 Sessions queued for thread
MAIN.sess_dropped                 0         0.00 Sessions dropped for thread
MAIN.req_dropped                  0         0.00 Requests dropped
MAIN.n_object               1363649          .   object structs made
MAIN.n_vampireobject              0          .   unresurrected objects
MAIN.n_objectcore           1363658          .   objectcore structs made
MAIN.n_objecthead           1365235          .   objecthead structs made
MAIN.n_backend                   18          .   Number of backends
MAIN.n_expired             18708716          .   Number of expired objects
MAIN.n_lru_nuked          190240122          .   Number of LRU nuked objects
MAIN.n_lru_moved           48881176          .   Number of LRU moved objects
MAIN.losthdr                    194         0.00 HTTP header overflows
MAIN.s_sess               230308639        23.12 Total sessions seen
MAIN.s_pipe                       0         0.00 Total pipe sessions seen
MAIN.s_pass                     221         0.00 Total pass-ed requests seen
MAIN.s_fetch              210381116        21.12 Total backend fetches
initiated
MAIN.s_synth                  15291         0.00 Total synthethic responses
made
MAIN.s_req_hdrbytes    152588868591     15319.33 Request header bytes
MAIN.s_req_bodybytes    47945048076      4813.50 Request body bytes
MAIN.s_resp_hdrbytes    61429100826      6167.24 Response header bytes
MAIN.s_resp_bodybytes  31545695399316   3167064.88 Response body bytes
MAIN.s_pipe_hdrbytes              0         0.00 Pipe request header bytes
MAIN.s_pipe_in                    0         0.00 Piped bytes from client
MAIN.s_pipe_out                   0         0.00 Piped bytes to client
MAIN.sess_closed          118139254        11.86 Session Closed
MAIN.sess_closed_err       11257806         1.13 Session Closed with error
MAIN.sess_readahead               0         0.00 Session Read Ahead
MAIN.sess_herd             65082138         6.53 Session herd
MAIN.sc_rem_close         100948777        10.13 Session OK  REM_CLOSE
MAIN.sc_req_close         116459448        11.69 Session OK  REQ_CLOSE
MAIN.sc_req_http10             1044         0.00 Session Err REQ_HTTP10
MAIN.sc_rx_bad                    0         0.00 Session Err RX_BAD
MAIN.sc_rx_body                   0         0.00 Session Err RX_BODY
MAIN.sc_rx_junk                8578         0.00 Session Err RX_JUNK
MAIN.sc_rx_overflow               0         0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout         11248199         1.13 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe                   0         0.00 Session OK  TX_PIPE
MAIN.sc_tx_error                  0         0.00 Session Err TX_ERROR
MAIN.sc_tx_eof              1617225         0.16 Session OK  TX_EOF
MAIN.sc_resp_close                0         0.00 Session OK  RESP_CLOSE
MAIN.sc_overload                  0         0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow             0         0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short               0         0.00 Session Err RANGE_SHORT
MAIN.sc_req_http20                0         0.00 Session Err REQ_HTTP20
MAIN.sc_vcl_failure               4         0.00 Session Err VCL_FAILURE
MAIN.shm_records        34703350941      3484.08 SHM records
MAIN.shm_writes          2492186358       250.21 SHM writes
MAIN.shm_flushes          413619708        41.53 SHM flushes due to overflow
MAIN.shm_cont              16129035         1.62 SHM MTX contention
MAIN.shm_cycles               31683         0.00 SHM cycles through buffer
MAIN.backend_req          212039555        21.29 Backend requests made
MAIN.n_vcl                        9          .   Number of loaded VCLs in
total
MAIN.n_vcl_avail                  9          .   Number of VCLs available
MAIN.n_vcl_discard                0          .   Number of discarded VCLs
MAIN.vcl_fail                     4         0.00 VCL failures
MAIN.bans                         1          .   Count of bans
MAIN.bans_completed               1          .   Number of bans marked
'completed'
MAIN.bans_obj                     0          .   Number of bans using obj.*
MAIN.bans_req                     0          .   Number of bans using req.*
MAIN.bans_added                   1         0.00 Bans added
MAIN.bans_deleted                 0         0.00 Bans deleted
MAIN.bans_tested                  0         0.00 Bans tested against
objects (lookup)
MAIN.bans_obj_killed              0         0.00 Objects killed by bans
(lookup)
MAIN.bans_lurker_tested            0         0.00 Bans tested against
objects (lurker)
MAIN.bans_tests_tested             0         0.00 Ban tests tested against
objects (lookup)
MAIN.bans_lurker_tests_tested            0         0.00 Ban tests tested
against objects (lurker)
MAIN.bans_lurker_obj_killed              0         0.00 Objects killed by
bans (lurker)
MAIN.bans_lurker_obj_killed_cutoff            0         0.00 Objects killed
by bans for cutoff (lurker)
MAIN.bans_dups                                0         0.00 Bans
superseded by other bans
MAIN.bans_lurker_contention                   0         0.00 Lurker gave
way for lookup
MAIN.bans_persisted_bytes                    16          .   Bytes used by
the persisted ban lists
MAIN.bans_persisted_fragmentation             0          .   Extra bytes in
persisted ban lists due to fragmentation
MAIN.n_purges                                 0          .   Number of
purge operations executed
MAIN.n_obj_purged                             0          .   Number of
purged objects
MAIN.exp_mailed                       400657705        40.22 Number of
objects mailed to expiry thread
MAIN.exp_received                     400655888        40.22 Number of
objects received by expiry thread
MAIN.hcb_nolock                       272378417        27.35 HCB Lookups
without lock
MAIN.hcb_lock                         210148272        21.10 HCB Lookups
with lock
MAIN.hcb_insert                       210147129        21.10 HCB Inserts
MAIN.esi_errors                               0         0.00 ESI parse
errors (unlock)
MAIN.esi_warnings                             0         0.00 ESI parse
warnings (unlock)
MAIN.vmods                                    2          .   Loaded VMODs
MAIN.n_gzip                                   0         0.00 Gzip operations
MAIN.n_gunzip                                 0         0.00 Gunzip
operations
MAIN.n_test_gunzip                    125951749        12.65 Test gunzip
operations
LCK.backend.creat                            20         0.00 Created locks
LCK.backend.destroy                           0         0.00 Destroyed locks
LCK.backend.locks                     436028686        43.78 Lock Operations
LCK.backend_tcp.creat                         2         0.00 Created locks
LCK.backend_tcp.destroy                       0         0.00 Destroyed locks
LCK.backend_tcp.locks                 839324465        84.26 Lock Operations
LCK.ban.creat                                 1         0.00 Created locks
LCK.ban.destroy                               0         0.00 Destroyed locks
LCK.ban.locks                         485319875        48.72 Lock Operations
LCK.busyobj.creat                     275769226        27.69 Created locks
LCK.busyobj.destroy                   275772300        27.69 Destroyed locks
LCK.busyobj.locks                   45764668121      4594.59 Lock Operations
LCK.cli.creat                                 1         0.00 Created locks
LCK.cli.destroy                               0         0.00 Destroyed locks
LCK.cli.locks                           3319775         0.33 Lock Operations
LCK.exp.creat                                 1         0.00 Created locks
LCK.exp.destroy                               0         0.00 Destroyed locks
LCK.exp.locks                        1601937269       160.83 Lock Operations
LCK.hcb.creat                                 1         0.00 Created locks
LCK.hcb.destroy                               0         0.00 Destroyed locks
LCK.hcb.locks                         418989369        42.06 Lock Operations
LCK.lru.creat                                 2         0.00 Created locks
LCK.lru.destroy                               0         0.00 Destroyed locks
LCK.lru.locks                         658492325        66.11 Lock Operations
LCK.mempool.creat                             5         0.00 Created locks
LCK.mempool.destroy                           0         0.00 Destroyed locks
LCK.mempool.locks                    1539542397       154.56 Lock Operations
LCK.objhdr.creat                      210205399        21.10 Created locks
LCK.objhdr.destroy                    208843468        20.97 Destroyed locks
LCK.objhdr.locks                     4042638826       405.87 Lock Operations
LCK.pipestat.creat                            1         0.00 Created locks
LCK.pipestat.destroy                          0         0.00 Destroyed locks
LCK.pipestat.locks                            0         0.00 Lock Operations
LCK.sess.creat                        230305038        23.12 Created locks
LCK.sess.destroy                      230306744        23.12 Destroyed locks
LCK.sess.locks                        716377977        71.92 Lock Operations
LCK.vbe.creat                                 1         0.00 Created locks
LCK.vbe.destroy                               0         0.00 Destroyed locks
LCK.vbe.locks                           7305124         0.73 Lock Operations
LCK.vcapace.creat                             1         0.00 Created locks
LCK.vcapace.destroy                           0         0.00 Destroyed locks
LCK.vcapace.locks                             0         0.00 Lock Operations
LCK.vcl.creat                                 1         0.00 Created locks
LCK.vcl.destroy                               0         0.00 Destroyed locks
LCK.vcl.locks                         424850700        42.65 Lock Operations
LCK.vxid.creat                                1         0.00 Created locks
LCK.vxid.destroy                              0         0.00 Destroyed locks
LCK.vxid.locks                           265345         0.03 Lock Operations
LCK.waiter.creat                              2         0.00 Created locks
LCK.waiter.destroy                            0         0.00 Destroyed locks
LCK.waiter.locks                      820192788        82.34 Lock Operations
LCK.wq.creat                                  3         0.00 Created locks
LCK.wq.destroy                                0         0.00 Destroyed locks
LCK.wq.locks                         1514678352       152.07 Lock Operations
LCK.wstat.creat                               1         0.00 Created locks
LCK.wstat.destroy                             0         0.00 Destroyed locks
LCK.wstat.locks                       889992298        89.35 Lock Operations
MEMPOOL.busyobj.live                          4          .   In use
MEMPOOL.busyobj.pool                         10          .   In Pool
MEMPOOL.busyobj.sz_wanted                 65536          .   Size requested
MEMPOOL.busyobj.sz_actual                 65504          .   Size allocated
MEMPOOL.busyobj.allocs                210381121        21.12 Allocations
MEMPOOL.busyobj.frees                 210381117        21.12 Frees
MEMPOOL.busyobj.recycle               210069371        21.09 Recycled from
pool
MEMPOOL.busyobj.timeout                 6923652         0.70 Timed out from
pool
MEMPOOL.busyobj.toosmall                      0         0.00 Too small to
recycle
MEMPOOL.busyobj.surplus                   11360         0.00 Too many for
pool
MEMPOOL.busyobj.randry                   311750         0.03 Pool ran dry
MEMPOOL.req0.live                             3          .   In use
MEMPOOL.req0.pool                            11          .   In Pool
MEMPOOL.req0.sz_wanted                    65536          .   Size requested
MEMPOOL.req0.sz_actual                    65504          .   Size allocated
MEMPOOL.req0.allocs                   142052697        14.26 Allocations
MEMPOOL.req0.frees                    142052694        14.26 Frees
MEMPOOL.req0.recycle                  142028490        14.26 Recycled from
pool
MEMPOOL.req0.timeout                    6143066         0.62 Timed out from
pool
MEMPOOL.req0.toosmall                         0         0.00 Too small to
recycle
MEMPOOL.req0.surplus                       1959         0.00 Too many for
pool
MEMPOOL.req0.randry                       24207         0.00 Pool ran dry
MEMPOOL.sess0.live                            8          .   In use
MEMPOOL.sess0.pool                           10          .   In Pool
MEMPOOL.sess0.sz_wanted                     512          .   Size requested
MEMPOOL.sess0.sz_actual                     480          .   Size allocated
MEMPOOL.sess0.allocs                  115153369        11.56 Allocations
MEMPOOL.sess0.frees                   115153361        11.56 Frees
MEMPOOL.sess0.recycle                 115143094        11.56 Recycled from
pool
MEMPOOL.sess0.timeout                   6113104         0.61 Timed out from
pool
MEMPOOL.sess0.toosmall                        0         0.00 Too small to
recycle
MEMPOOL.sess0.surplus                      2787         0.00 Too many for
pool
MEMPOOL.sess0.randry                      10275         0.00 Pool ran dry
LCK.smf.creat                                 1         0.00 Created locks
LCK.smf.destroy                               0         0.00 Destroyed locks
LCK.smf.locks                        3920420601       393.59 Lock Operations
SMF.s0.c_req                         2064438640       207.26 Allocator
requests
SMF.s0.c_fail                         195848371        19.66 Allocator
failures
SMF.s0.c_bytes                     26626288971776   2673175.64 Bytes
allocated
SMF.s0.c_freed                     26445825654784   2655057.83 Bytes freed
SMF.s0.g_alloc                         12608309          .   Allocations
outstanding
SMF.s0.g_bytes                     180463316992          .   Bytes
outstanding
SMF.s0.g_space                       2072793088          .   Bytes available
SMF.s0.g_smf                           12977499          .   N struct smf
SMF.s0.g_smf_frag                        369190          .   N small free
smf
SMF.s0.g_smf_large                            0          .   N large free
smf
LCK.sma.creat                                 1         0.00 Created locks
LCK.sma.destroy                               0         0.00 Destroyed locks
LCK.sma.locks                         263980168        26.50 Lock Operations
SMA.Transient.c_req                   131990084        13.25 Allocator
requests
SMA.Transient.c_fail                          0         0.00 Allocator
failures
SMA.Transient.c_bytes               59316019413      5955.10 Bytes allocated
SMA.Transient.c_freed               59316019413      5955.10 Bytes freed
SMA.Transient.g_alloc                         0          .   Allocations
outstanding
SMA.Transient.g_bytes                         0          .   Bytes
outstanding
SMA.Transient.g_space                         0          .   Bytes available
MEMPOOL.req1.live                             1          .   In use
MEMPOOL.req1.pool                            10          .   In Pool
MEMPOOL.req1.sz_wanted                    65536          .   Size requested
MEMPOOL.req1.sz_actual                    65504          .   Size allocated
MEMPOOL.req1.allocs                   142089847        14.27 Allocations
MEMPOOL.req1.frees                    142089846        14.27 Frees
MEMPOOL.req1.recycle                  142065460        14.26 Recycled from
pool
MEMPOOL.req1.timeout                    6145611         0.62 Timed out from
pool
MEMPOOL.req1.toosmall                         0         0.00 Too small to
recycle
MEMPOOL.req1.surplus                       2350         0.00 Too many for
pool
MEMPOOL.req1.randry                       24387         0.00 Pool ran dry
MEMPOOL.sess1.live                            7          .   In use
MEMPOOL.sess1.pool                           11          .   In Pool
MEMPOOL.sess1.sz_wanted                     512          .   Size requested
MEMPOOL.sess1.sz_actual                     480          .   Size allocated
MEMPOOL.sess1.allocs                  115155274        11.56 Allocations
MEMPOOL.sess1.frees                   115155267        11.56 Frees
MEMPOOL.sess1.recycle                 115144847        11.56 Recycled from
pool
MEMPOOL.sess1.timeout                   6114536         0.61 Timed out from
pool
MEMPOOL.sess1.toosmall                        0         0.00 Too small to
recycle
MEMPOOL.sess1.surplus                      2953         0.00 Too many for
pool
MEMPOOL.sess1.randry                      10427         0.00 Pool ran dry
VBE.reload_2024-06-20T181903.node66.happy 18446744073709551615          .
Happy health probes
VBE.reload_2024-06-20T181903.node66.bereq_hdrbytes    193871874
 19.46 Request header bytes
VBE.reload_2024-06-20T181903.node66.bereq_bodybytes            0
0.00 Request body bytes
VBE.reload_2024-06-20T181903.node66.beresp_hdrbytes     65332553
6.56 Response header bytes
VBE.reload_2024-06-20T181903.node66.beresp_bodybytes  40260910590
 4042.04 Response body bytes
VBE.reload_2024-06-20T181903.node66.pipe_hdrbytes               0
0.00 Pipe request header bytes
VBE.reload_2024-06-20T181903.node66.pipe_out                    0
0.00 Piped bytes to backend
VBE.reload_2024-06-20T181903.node66.pipe_in                     0
0.00 Piped bytes from backend
VBE.reload_2024-06-20T181903.node66.conn                        1
 .   Concurrent connections to backend
VBE.reload_2024-06-20T181903.node66.req                    247959
0.02 Backend requests sent
VBE.reload_2024-06-20T181903.node67.happy            18446744073709551615
       .   Happy health probes
VBE.reload_2024-06-20T181903.node67.bereq_hdrbytes      193960668
 19.47 Request header bytes
VBE.reload_2024-06-20T181903.node67.bereq_bodybytes             0
0.00 Request body bytes
VBE.reload_2024-06-20T181903.node67.beresp_hdrbytes      65315238
6.56 Response header bytes
VBE.reload_2024-06-20T181903.node67.beresp_bodybytes  40142940116
 4030.19 Response body bytes
VBE.reload_2024-06-20T181903.node67.pipe_hdrbytes               0
0.00 Pipe request header bytes
VBE.reload_2024-06-20T181903.node67.pipe_out                    0
0.00 Piped bytes to backend
VBE.reload_2024-06-20T181903.node67.pipe_in                     0
0.00 Piped bytes from backend
VBE.reload_2024-06-20T181903.node67.conn                        3
 .   Concurrent connections to backend
VBE.reload_2024-06-20T181903.node67.req                    247956
0.02 Backend requests sent

*Thanks & Regards,*
*Uday Kumar*


On Thu, Jun 20, 2024 at 11:36 PM Guillaume Quintard <
guillaume.quintard at gmail.com> wrote:

> Hi Uday,
>
> pz_list and recv_queue are not (to my knowledge) Varnish counters, where
> are you seeing them?
>
> I doubt Varnish is actually replying with 0, so that probably is your
> client faking a response code to have something to show. But that's a
> detail, as the unresponsiveness is real.
>
> Could you share a "varnishstat -1" of the impacted machine?
>
> --
> Guillaume Quintard
>
>
> On Thu, Jun 20, 2024 at 9:30 AM Uday Kumar <uday.polu at indiamart.com>
> wrote:
>
>> Hello all,
>>
>> We are facing frequent issues of varnish unresponsiveness for sometime on
>> our production server.
>>
>> During this time we have seen that pz_list is being increased to ~3000
>> and recv_queue is increased to ~130
>> Also, varnish is responding with response code '0' for sometime, which
>> meant unresponsive.
>>
>> This is causing multiple 5xx on front ends.
>>
>> FYR:
>> User request count during this time is normal.
>>
>> Note:
>> During this time, we have confirmed that our backend servers are healthy
>> without any issues.
>>
>>
>> May I know what could be the reason for this behaviour at varnish?
>>
>> Please give me the direction on how to debug this issue.
>> _______________________________________________
>> varnish-misc mailing list
>> varnish-misc at varnish-cache.org
>> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.varnish-cache.org/lists/pipermail/varnish-misc/attachments/20240621/7c9a4c2a/attachment-0001.html>


More information about the varnish-misc mailing list