Ticket #650 (closed defect: worksforme)

Opened 5 months ago

Last modified 5 months ago

Varnish probes and nginx + FastCGI don't get on together.

Reported by: t0m Owned by: phk
Priority: normal Milestone:
Component: varnishd Version: trunk
Severity: major Keywords:
Cc:

Description

It seems to be well known (though somewhat obscure; it took some effort to coax out of Google) that Varnish health checks don't play well with nginx when it is proxying content or serving FastCGI:

E.g.

 http://www.docunext.com/wiki/Varnish#Working_VCL_Failover_Example

 http://sys-notes.com/bin/view/Main/NginxProblems

This happens because Varnish closes the request socket before nginx starts sending a response, so nginx reports a 499 error code.

I could use the workaround given in the first example; however, that would make the Varnish health checking useless to me, since nginx being up means nothing unless the FCGI application it is serving is also available.

If it is felt that this behavior is incorrect on the part of nginx, then this should be documented to stop people falling into this trap. Otherwise, the connection Varnish makes should be kept open until data is received from the polled backend web server, or until the timeout is reached.
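
To illustrate the behaviour requested above, here is a minimal sketch in Python (not Varnish's actual probe code) of a health check that keeps its connection open until the backend starts responding or a timeout expires. The function name probe, its parameters, and the default timeout are assumptions made purely for illustration.

```python
import socket


def probe(host, port, path="/", timeout=2.0):
    """Return True if the backend answers the probe with HTTP 200.

    Hypothetical helper: `timeout` applies to each socket operation
    (connect, send, recv), not to the probe as a whole.
    """
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            # Deliberately no shutdown() or close() at this point: the socket
            # stays open while we wait for the backend to send a status line.
            data = b""
            while b"\r\n" not in data:
                chunk = sock.recv(4096)
                if not chunk:  # backend closed without sending a full line
                    break
                data += chunk
    except OSError:  # covers timeouts and refused connections
        return False
    status_line = data.split(b"\r\n", 1)[0].split()
    return len(status_line) >= 2 and status_line[1] == b"200"
```

A call such as probe("127.0.0.1", 80, "/health") (address and path are placeholders) would then report the backend as healthy only if the application behind nginx actually produced a 200 response, which is the property the ticket asks for.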

Attachments

varnishstat-button-2010022301 (10.8 KB) - added by t0m 5 months ago.
varnishstat -x on my server using 100% CPU with trunk varnish (r4585)
button-test.vcl (0.8 KB) - added by t0m 5 months ago.
Part 1 of my varnish config causing 100% CPU
generic.vcl (5.1 KB) - added by t0m 5 months ago.
Part 2 of my varnish config causing 100% CPU

Change History

Changed 5 months ago by phk

Which Varnish version is this? I was under the impression this was fixed in -trunk.

Changed 5 months ago by t0m

I just tried trunk; however, I didn't get as far as testing this, because trunk currently uses 100% CPU when idle with my configuration/build.

I'll follow up with some info about what's actually going on once I've done some more investigation.

Changed 5 months ago by t0m

strace doesn't show anything interesting when varnish is consuming 100% of CPU:

button varnish $ sudo strace -p 713
Process 713 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 1
read(10, "ping\n", 8191)                = 5
time(NULL)                              = 1266953031
writev(13, [{"200 19      \n", 13}, {"PONG 1266953031 1.0", 19}, {"\n", 1}], 3) = 33
poll([{fd=10, events=POLLIN, revents=POLLIN}], 1, -1) = 1
read(10, "ping\n", 8191)                = 5
time(NULL)                              = 1266953034
writev(13, [{"200 19      \n", 13}, {"PONG 1266953034 1.0", 19}, {"\n", 1}], 3) = 33
poll([{fd=10, events=POLLIN, revents=POLLIN}], 1, -1) = 1
read(10, "ping\n", 8191)                = 5
time(NULL)                              = 1266953037
writev(13, [{"200 19      \n", 13}, {"PONG 1266953037 1.0", 19}, {"\n", 1}], 3) = 33
poll( <unfinished ...>
Process 713 detached

I'll attach a varnishstat and my config.

Changed 5 months ago by t0m

varnishstat -x on my server using 100% CPU with trunk varnish (r4585)

Changed 5 months ago by t0m

Part 1 of my varnish config causing 100% CPU

Changed 5 months ago by t0m

Part 2 of my varnish config causing 100% CPU

Changed 5 months ago by t0m

Attached. I'm trying r4585 of varnish, built by making slight tweaks to the debian/ directory included via svn:externals. The varnish process appears to perform as expected, despite the massive CPU consumption.

However, I haven't yet got as far as testing whether the originally described issue is fixed.

Let me know if I can provide more information, as this issue is trivially reproducible on a non-production system.

Changed 5 months ago by phk

  • status changed from new to closed
  • resolution set to worksforme

The CPU usage problem is probably because you hit a half-fix for #644.

And as I read it, -trunk does the right thing for your probes.

2.1 should work great for you when we release it this month :-)
