Child panics on OpenSolaris

Paul Wright wrighty+varnishmisc at gmail.com
Thu Mar 4 12:53:10 CET 2010


On 22 February 2010 18:02, Paul Wright <wrighty+varnishmisc at gmail.com> wrote:
...
> For anyone else following along I've now had varnish running for over
> 5 hours without issue, here are the things I found out:
>
> * add the Range unsetting code to ensure that such requests don't make
> it through to the back end
> * remove the TCP_Assert() that wraps the setsockopt() call on line 184
> of bin/varnishd/cache_acceptor.c
> * compile with gcc, not Sun Studio (there's still some sort of
> funniness with TCP_(non)blocking() )
>
> CC=/usr/bin/gcc CFLAGS="-O3 -L/lib/amd64 -pthreads -m64
> -fomit-frame-pointer" LDFLAGS="-lumem -pthreads" ./configure
> --prefix=/opt
>
> * pass the right flags through to gcc when launching varnishd
>
> newtask -p highfile /opt/sbin/varnishd -f /opt/etc/varnish/firebox.vcl -F \
> -p 'cc_command=/usr/bin/gcc -fpic -shared -m64 -o %o %s' \
> -T 127.0.0.1:9001 \
> -s malloc,2G \
> -p sess_timeout=5s \
> -p max_restarts=12 \
> -p waiter=poll \
> -p connect_timeout=0s \
> -p sess_workspace=65536
>
> * keep checking http://letsgetdugg.com/2009/12/04/varnish-on-solaris/
> for hints and suggestions

Latest update: we're now seeing "Connection refused" panics like the following:

Child (1955) died signal=6
Child (1955) Panic message: Assert error in TCP_blocking(), tcp.c line 164:
  Condition(TCP_Check(j)) not true.
errno = 146 (Connection refused)
thread = (cache-worker)
ident = -smalloc,-hcritbit,poll
Backtrace:
  42fb51: /opt/sbin/varnishd'pan_ic+0xb1 [0x42fb51]
  2f: [0x2f]
sp = 13570018 {
  fd = 47, id = 47, xid = 0,
  client = ?.?.?.?:?,
  step = STP_FIRST,
  handling = deliver,
  restarts = 0, esis = 0
  ws = 13570088 {
    id = "sess",
    {s,f,r,e} = {13570d90,13570d90,0,+65536},
  },
  http[req] = {
    ws = 13570088[sess]
      "",
      "/i/nl/all/0.gif",
      "HTTP/1.1",
      "Host: media.firebox.com",
      "User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.21.8 (KHTML, like Gecko)",
      "Accept: */*",
      "Accept-Language: en-us",
      "If-Modified-Since: Tue, 22 Sep 2009 15:34:33 GMT",
      "If-None-Match: "16a8070-2b-4742c55c1f440"",
      "Connection: keep-alive",
      "X-Forwarded-For: 85.189.102.193",
  },
  worker = fffffd7ff7bf1d80 {
    ws = fffffd7ff7bf1ec8 {
      id = "wrk",
      {s,f,r,e} = {fffffd7ff7bdfcb0,fffffd7ff7bdfcb0,0,+65536},
    },
    },
},

Two things to note. First, we're confident that this request is a
cache hit (handling = deliver), which rules out the backend.  Second,
the client address appears to have been mangled:

  client = ?.?.?.?:?,

Would this cause varnish to attempt opening a connection which is then refused?

As a workaround, would it be advisable to add a clause to TCP_Check (in
include/libvarnish.h) to skip over errno 146 (ECONNREFUSED, "Connection
refused") along with the existing ECONNRESET and ENOTCONN clauses?
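
To make the question concrete, here is a minimal sketch of the kind of
check I have in mind. It is written as a standalone function rather than
as a patch to the real macro: the name tcp_check_relaxed is hypothetical,
and I'm assuming from the existing clauses that TCP_Check accepts a zero
return value or an errno of ECONNRESET or ENOTCONN. The actual definition
in include/libvarnish.h may differ.

```c
#include <errno.h>

/*
 * Hypothetical relaxed check (not the real TCP_Check macro).
 *
 * ret: return value of the socket call being checked
 * err: errno captured immediately after that call
 *
 * Returns non-zero if the call succeeded, or if it failed only because
 * the peer went away or refused the connection.
 */
static int
tcp_check_relaxed(int ret, int err)
{
	return (ret == 0 ||
	    err == ECONNRESET ||
	    err == ENOTCONN ||
	    err == ECONNREFUSED);	/* the proposed extra clause */
}
```

With something like this in place, the assert in TCP_blocking() would
presumably let the ECONNREFUSED case through instead of killing the
child, and any other errno would still trip it.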

Cheers,

Paul.


