only 503s after a few hours (trunk-4384)

Fernando Papa fernando.papa at liveinteractive.net
Wed Dec 9 14:51:04 CET 2009


Happened again.  This time I left varnishlog running, and got...

  989 TxStatus     c 503
  989 TxResponse   c Service Unavailable
  989 TxHeader     c Server: Varnish
  989 TxHeader     c Date: Wed, 09 Dec 2009 00:38:07 GMT
  989 TxHeader     c Age: 0
  989 TxHeader     c Connection: close
  989 ReqEnd       c 1933950257 1260319087.004358053 1260319087.004553556 
0.000067711 0.000121117 0.000074387
  989 SessionClose c error
  989 StatSess     c 189.165.32.95 1480 0 1 1 0 0 0 117 0
    5 Debug        - "Too many open files when accept(2)ing. Sleeping."
<EOF>

That "Too many..." debug message appears 709 times.  First one:

<line number 14292239>
 1023 TxProtocol   c HTTP/1.1
 1023 TxStatus     c 200
 1023 TxResponse   c OK
 1023 TxHeader     c Server: xxx
 1023 TxHeader     c Cache-Control: public, max-age=3600
 1023 TxHeader     c Content-Type: application/x-shockwave-flash
 1023 TxHeader     c Content-Length: 659
 1023 TxHeader     c Date: Wed, 09 Dec 2009 00:10:05 GMT
 1023 TxHeader     c Age: 972
 1023 TxHeader     c Connection: keep-alive
 1023 ReqEnd       c 1933931697 1260317405.392798424 1260317405.392909527 
0.000062466 0.000048637 0.000062466
  706 Debug        c "herding"
  695 Debug        c "herding"
 1007 Debug        c "herding"
 1023 Debug        c "herding"
    5 Debug        - "Too many open files when accept(2)ing. Sleeping."
  937 ReqStart     c 190.13.3.173 51259 1933931698
  937 RxRequest    c GET
  937 RxURL        c xxx
  937 RxProtocol   c HTTP/1.1
  937 RxHeader     c Accept: */*
  937 RxHeader     c Referer: xxx
  937 RxHeader     c Accept-Language: es-co
  937 RxHeader     c UA-CPU: x86
  937 RxHeader     c Accept-Encoding: gzip, deflate
  937 RxHeader     c User-Agent: xxx
  937 RxHeader     c Host: xxx
  937 RxHeader     c Connection: Keep-Alive
  937 RxHeader     c Cookie: xxx=xxx
  937 VCL_call     c recv
  937 VCL_return   c lookup
  937 VCL_call     c hash
  937 VCL_return   c hash
  937 VCL_call     c miss
  937 VCL_return   c fetch
  937 FetchError   c no backend connection
  937 VCL_call     c error
  937 VCL_return   c deliver

Varnish account does not have limits (ulimit returns unlimited).  Memory
usage (top): 403m/89m/81m.

# ps -AF | grep varnish
root     19223     1  0 21419   928   2 Dec08 ?        00:00:00 varnishd -a 
*:80 -f xxx.vcl -p obj_workspace 4096 -s malloc,100M -p thread_pools 4
nobody   19224 19223  0 103378 91244  3 Dec08 ?        00:06:52 varnishd -a 
*:80 -f xxx.vcl -p obj_workspace 4096 -s malloc,100M -p thread_pools 4

# kill 19224

# ps -AF | grep varnish
root     19223     1  0 21419   960   2 Dec08 ?        00:00:00 varnishd -a 
*:80 -f xxx.vcl -p obj_workspace 4096 -s malloc,100M -p thread_pools 4
nobody   31319 19223  1 94673  3532   3 12:33 ?        00:00:00 varnishd -a 
*:80 -f xxx.vcl -p obj_workspace 4096 -s malloc,100M -p thread_pools 4

That made it work again --as in processing requests--, except for varnishlog 
that continues to show nothing.

----- Original Message ----- 
From: "Fernando Papa" <fernando.papa at liveinteractive.net>
To: <varnish-misc at projects.linpro.no>
Sent: Tuesday, December 08, 2009 2:01 PM
Subject: only 503s after a few hours (trunk-4384)


> Hello,
>
> After a few hours, Varnish trunk-4384 started replying with 503.
> New work-load, first day of use.  Previously used Varnish only with 
> internal
> traffic (eg no crazy headers, just carefully crafted requests).
>
> Facts:
>
> * Xeon 2.4 GHz (quad-core, 512KBx4 cache) server.  CentOS 5.4.
> * Worked fine for ~9 hours, sending to backend 150+ K requests plus 
> Varnish
> pings.
> * Backend is an application server.  Payload is <1 KB plus minimal 
> headers.
> * I can still access backend bypassing Varnish.
> * Varnish continues pinging backend, and receiving a timely 200 OK every
> time (backend logs).
> * req.backend.healthy = true.  Tested with a codepath along the lines
> (req.request == "GET") && (req.url == "/") && (req.backend.healthy), that
> does an 'error 200 "OK"'.
> * varnishlog does not output anything at all.  Not even the backend pings.
> * Tested every path within vcl_recv.  Exiting with "error" works, but
> "lookup" results in a 503.
> * Very low CPU usage.  Sorry, didn't check memory usage before restarting
> it.
>
> CFLAGS='-O2 -fomit-frame-pointer -march=pentium4 -mfpmath=sse'
> ./configure --prefix=$HOME/local --disable-largefile
>
> $HOME/local/sbin/varnishd -a *:80 -f $HOME/local/etc/varnish/xxx.vcl -p
> obj_workspace=4096 -s malloc,100M -p thread_pools=`grep -c 'model name'
> /proc/cpuinfo`
>
> backend fd1
> {
>   .host="localhost";
>   .port="85";
>   .connect_timeout = 1s;
>   .first_byte_timeout = 20s;
>   .between_bytes_timeout =2s;
>   .probe = {
>      .request =
>         "GET /xxx HTTP/1.1"
>         "Host: yyy"
>         "Connection: close";
>      .timeout = 0.5 s;
>      .window = 8;
>      .threshold = 3;
>   }
> }
>
> director fdcluster round-robin
> {
>   { .backend = fd1; }
> }
>
> Restarted Varnish, now it works again.
>
> Any idea?  Anything wrong in the command line?
>
> Thanks,
> Fernando.
>




More information about the varnish-misc mailing list