r4702 - trunk/varnish-cache/doc/sphinx/installation
phk at varnish-cache.org
phk at varnish-cache.org
Wed Apr 21 11:01:56 CEST 2010
Author: phk
Date: 2010-04-21 11:01:56 +0200 (Wed, 21 Apr 2010)
New Revision: 4702
Added:
trunk/varnish-cache/doc/sphinx/installation/bugs.rst
Modified:
trunk/varnish-cache/doc/sphinx/installation/index.rst
Log:
More doc
Added: trunk/varnish-cache/doc/sphinx/installation/bugs.rst
===================================================================
--- trunk/varnish-cache/doc/sphinx/installation/bugs.rst (rev 0)
+++ trunk/varnish-cache/doc/sphinx/installation/bugs.rst 2010-04-21 09:01:56 UTC (rev 4702)
@@ -0,0 +1,124 @@
+%%%%%%%%%%%%%%
+Reporting bugs
+%%%%%%%%%%%%%%
+
+Varnish can be a tricky beast to debug, having potentially thousands
+of threads crowding into a few data structures makes for *interesting*
+core dumps.
+
+Actually, let me rephrase that without irony: You tire of the "no,
+not thread 438 either, lets look at 439 then..." routine really fast.
+
+So if you run into a bug, it is important that you spend a little bit
+of time collecting the right information, to help us fix the bug.
+
+Roughly we have three clases of bugs with Varnish.
+
+Varnish crashes
+===============
+
+Plain and simple: **boom**
+
+Varnish is split over two processes, the manager and the child. The child
+does all the work, and the manager hangs around to resurect it, if it
+crashes.
+
+Therefore, the first thing to do if you see a varnish crash, is to examine
+your syslogs, to see if it has happened before. (One site is rumoured
+to have had varnish restarting every 10 minutes and *still* provide better
+service than their CMS system.)
+
+When it crashes, if at all possible, Varnish will spew out a crash dump
+that looks something like::
+
+ Child (32619) died signal=6 (core dumped)
+ Child (32619) Panic message: Assert error in ccf_panic(), cache_cli.c line 153:
+ Condition(!strcmp("", "You asked for it")) not true.
+ errno = 9 (Bad file descriptor)
+ thread = (cache-main)
+ ident = FreeBSD,9.0-CURRENT,amd64,-sfile,-hcritbit,kqueue
+ Backtrace:
+ 0x42bce1: pan_ic+171
+ 0x4196af: ccf_panic+4f
+ 0x8006b3ef2: _end+80013339a
+ 0x8006b4307: _end+8001337af
+ 0x8006b8b76: _end+80013801e
+ 0x8006b8d84: _end+80013822c
+ 0x8006b51c1: _end+800134669
+ 0x4193f6: CLI_Run+86
+ 0x429f8b: child_main+14b
+ 0x43ef68: start_child+3f8
+ [...]
+
+If you can get that information to us, we are usually able to
+see exactly where things went haywire, and that speeds up bugfixing
+a lot.
+
+There will be a lot more information than this, and before sending
+it all to us, you should obscure any sensitive/secret
+data/cookies/passwords/ip# etc. Please make sure to keep context
+when you do so, ie: do not change all the IP# to "X.X.X.X", but
+change each IP# to something unique, otherwise we are likely to be
+more confused than informed.
+
+The most important line is the "Panic Message", which comes in two
+general forms:
+
+"Missing errorhandling code in ..."
+ This is a place where we can conceive ending up, and have not
+ (yet) written the padded-box error handling code for.
+
+ The most likely cause here, is that you need a larger workspace
+ for HTTP headers and Cookies. (XXX: which params to tweak)
+
+ Please try that before reporting a bug.
+
+"Assert error in ..."
+ This is something bad that should never happen, and a bug
+ report is almost certainly in order. As always, if in doubt
+ ask us on IRC before opening the ticket.
+
+In your syslog it may all be joined into one single line, but if you
+can reproduce the crash, do so while running varnishd manually:
+
+ ``varnishd -d <your other arguments> |& tee /tmp/_catch_bug``
+
+That will get you the entire panic message into a file.
+
+(Remember to type ``start`` to launch the worker process, that is not
+automatic when ``-d`` is used.)
+
+Varnish goes on vacation
+========================
+
+This kind of bug is nasty to debug, because usually people tend to
+kill the process and send us an email saying "Varnish hung, I
+restarted it" which gives us only about 1.01 bit of usable debug
+information to work with.
+
+What we need here is all the information can you squeeze out of
+your operating system **before** you kill the Varnish process.
+
+One of the most valuable bits of information, is if all Varnish'
+threads are waiting for something or if one of them is spinning
+furiously on some futile condition.
+
+Commands like ``top -H`` or ``ps -Haxlw`` or ``ps -efH`` should be
+able to figure that out.
+
+If one or more threads are spinning, use ``strace`` or ``ktrace`` or ``truss``
+(or whatever else your OS provides) to get a trace of which system calls
+the varnish process issues. Be aware that this may generate a lot
+of very repetitive data, usually one second worth is more than enough.
+
+Also, run ``varnishlog`` for a second, and collect the output
+for us, and if ``varnishstat`` shows any activity, capture that also.
+
+When you have done this, kill the Varnish *child* process, and let
+the *master* process restart it. Remember to tell us if that does
+or does not work. If it does not, kill all Varnish processes, and
+start from scratch. If that does not work either, tell us, that
+means that we have wedged your kernel.
+
+
+
Modified: trunk/varnish-cache/doc/sphinx/installation/index.rst
===================================================================
--- trunk/varnish-cache/doc/sphinx/installation/index.rst 2010-04-21 07:48:24 UTC (rev 4701)
+++ trunk/varnish-cache/doc/sphinx/installation/index.rst 2010-04-21 09:01:56 UTC (rev 4702)
@@ -11,16 +11,17 @@
install.rst
help.rst
+ bugs.rst
.. todo::
[V] on this os, pull this package
[V] .. that ..//..
[V] to compile from source
- how to get help
- - mailing list
- - IRC
- - varnish-software.com
- - other listed consultants
+ [V] how to get help
+ [V]- mailing list
+ [V] - IRC
+ [V] - varnish-software.com
+ [V] - other listed consultants
reporting bugs
- using varnishtest to reproduce
- what data do we need
More information about the varnish-commit
mailing list