[master] 8dbb9ab23 some advise on working with core files and stack overflows
Nils Goroll
nils.goroll at uplex.de
Thu Dec 19 11:47:07 UTC 2019
commit 8dbb9ab2352dd47e634d88b505254602ee874616
Author: Nils Goroll <nils.goroll at uplex.de>
Date: Thu Dec 19 12:44:36 2019 +0100
some advise on working with core files and stack overflows
as promised in #3161
diff --git a/doc/sphinx/users-guide/troubleshooting.rst b/doc/sphinx/users-guide/troubleshooting.rst
index 92ed87ced..46e5459fd 100644
--- a/doc/sphinx/users-guide/troubleshooting.rst
+++ b/doc/sphinx/users-guide/troubleshooting.rst
@@ -81,6 +81,24 @@ The crash might be due to misconfiguration or a bug. If you suspect it
is a bug you can use the output in a bug report, see the "Trouble
Tickets" section in the Introduction chapter above.
+Varnish is crashing - stack overflows
+-------------------------------------
+
+Bugs aside, the most likely cause of crashes is a stack overflow,
+which is why a heuristic adds a note to the panic message when a crash
+looks like it was caused by one. In this case, the panic message
+contains something like this::
+
+ Signal 11 (Segmentation fault) received at 0x7f631f1b2f98 si_code 1
+ THIS PROBABLY IS A STACK OVERFLOW - check thread_pool_stack parameter
+
+As a first measure, please follow this advice and check whether crashes
+still occur when you add 128k to the current value of the
+``thread_pool_stack`` parameter and restart varnish.
+
+If varnish stops crashing with a larger ``thread_pool_stack``
+parameter, the crash was most likely not caused by a bug.
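+
+A minimal sketch of that first measure, assuming the current value
+turns out to be 56k (check yours first; the 184k below is only
+56k + 128k, and ``[...]`` stands for your existing varnishd
+arguments)::
+
+ $ varnishadm param.show thread_pool_stack
+ $ varnishd [...] -p thread_pool_stack=184k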
+
Varnish is crashing - segfaults
-------------------------------
@@ -93,12 +111,66 @@ debug a segfault the developers need you to provide a fair bit of
data.
* Make sure you have Varnish installed with debugging symbols.
- * Make sure core dumps are allowed in the parent shell::
-
- ulimit -c unlimited
+ * Check where your operating system writes core files and ensure that
+ you actually get them. For example, on Linux, read about
+ ``/proc/sys/kernel/core_pattern`` in the `core(5)` manpage.
+ * Make sure core dumps are allowed in the parent shell from which
+ varnishd is started. In a shell, this would be::
+
+ ulimit -c unlimited
+
+ If varnish is started from an init script, that script needs to be
+ adjusted instead; with systemd, set ``LimitCORE=infinity`` in the
+ ``[Service]`` section of the unit file.
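+
+With systemd, a drop-in file is one way to set the limit without
+editing the packaged unit. A minimal sketch, assuming the unit is named
+``varnish.service`` (the unit name and the drop-in file name
+``core.conf`` are assumptions, adjust them to your system)::
+
+ # /etc/systemd/system/varnish.service.d/core.conf
+ [Service]
+ LimitCORE=infinity
+
+Run ``systemctl daemon-reload`` and restart the service afterwards.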
+
+Once you have the core, ``cd`` into varnish's working directory (as
+given by the ``-n`` parameter, whose default is
+``$PREFIX/var/varnish/$HOSTNAME`` with ``$PREFIX`` being the
+installation prefix, usually ``/usr/local``), open the core with
+``gdb`` and issue the command ``bt`` to get a stack trace of the
+thread that caused the segfault.
+
+A basic debug session for varnish installed under ``/usr/local`` could look
+like this::
+
+ $ cd /usr/local/var/varnish/`uname -n`/
+ $ gdb /usr/local/sbin/varnishd core
+ GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
+ Copyright (C) 2016 Free Software Foundation, Inc.
+ [...]
+ Core was generated by `/usr/local/sbin/varnishd -a 127.0.0.1:8080 -b 127.0.0.1:8080'.
+ Program terminated with signal SIGABRT, Aborted.
+ #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
+ 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
+ [Current thread is 1 (Thread 0x7f7749ea3700 (LWP 31258))]
+
+ (gdb) bt
+ #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
+ #1 0x00007f775132342a in __GI_abort () at abort.c:89
+ #2 0x000000000045939f in pan_ic (func=0x7f77439fb811 "VCL", file=0x7f77439fb74c "", line=0,
+ cond=0x7f7740098130 "PANIC: deliberately!", kind=VAS_VCL) at cache/cache_panic.c:839
+ #3 0x0000000000518cb1 in VAS_Fail (func=0x7f77439fb811 "VCL", file=0x7f77439fb74c "", line=0,
+ cond=0x7f7740098130 "PANIC: deliberately!", kind=VAS_VCL) at vas.c:51
+ #4 0x00007f77439fa6e9 in vmod_panic (ctx=0x7f7749ea2068, str=0x7f7749ea2018) at vmod_vtc.c:109
+ #5 0x00007f77449fa5b8 in VGC_function_vcl_recv (ctx=0x7f7749ea2068) at vgc.c:1957
+ #6 0x0000000000491261 in vcl_call_method (wrk=0x7f7749ea2dd0, req=0x7f7740096020, bo=0x0,
+ specific=0x0, method=2, func=0x7f77449fa550 <VGC_function_vcl_recv>) at cache/cache_vrt_vcl.c:462
+ #7 0x0000000000493025 in VCL_recv_method (vcl=0x7f775083f340, wrk=0x7f7749ea2dd0, req=0x7f7740096020,
+ bo=0x0, specific=0x0) at ../../include/tbl/vcl_returns.h:192
+ #8 0x0000000000462979 in cnt_recv (wrk=0x7f7749ea2dd0, req=0x7f7740096020) at cache/cache_req_fsm.c:880
+ #9 0x0000000000461553 in CNT_Request (req=0x7f7740096020) at ../../include/tbl/steps.h:36
+ #10 0x00000000004a7fc6 in HTTP1_Session (wrk=0x7f7749ea2dd0, req=0x7f7740096020)
+ at http1/cache_http1_fsm.c:417
+ #11 0x00000000004a72c3 in http1_req (wrk=0x7f7749ea2dd0, arg=0x7f7740096020)
+ at http1/cache_http1_fsm.c:86
+ #12 0x0000000000496bb6 in Pool_Work_Thread (pp=0x7f774980e140, wrk=0x7f7749ea2dd0)
+ at cache/cache_wrk.c:406
+ #13 0x00000000004963e3 in WRK_Thread (qp=0x7f774980e140, stacksize=57344, thread_workspace=2048)
+ at cache/cache_wrk.c:144
+ #14 0x000000000049610b in pool_thread (priv=0x7f774880ec80) at cache/cache_wrk.c:439
+ #15 0x00007f77516954a4 in start_thread (arg=0x7f7749ea3700) at pthread_create.c:456
+ #16 0x00007f77513d7d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
-Once you have the core you open it with `gdb` and issue the command ``bt``
-to get a stack trace of the thread that caused the segfault.
Varnish gives me Guru meditation