We are also seeing either this exact issue or one very like it.  
Unfortunately, we only see the problem in production and, unlike you, 
have been unable to reproduce it in a test environment.  I've been 
meaning to post, but have yet to capture varnishlog output as detailed 
as yours.  If we cannot resolve these problems soon, we will very 
reluctantly have to migrate off Varnish.

Varnish works great for the first few hours of production use with our 
application: 2, 3, 6, or 12 hours, seemingly at random.  When the 
problem starts, all requests appear to "hang".  At that point you can 
restart varnish manually to get back to normal operation, or wait: 
after roughly 5-15 minutes (it varies) varnish restarts the child 
process on its own and service returns to normal.
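
Since we cannot reproduce this outside production, a small probe run 
from cron might at least timestamp when the hang begins.  The following 
is only a rough sketch, not something we run today: the function name 
probe_http, the 5-second timeout, and the four result labels are all my 
own assumptions.  It sends a bare GET and reports whether any bytes come 
back before the timeout.

```python
import socket


def probe_http(host, port=80, timeout=5.0):
    """Send one GET and classify the outcome.

    Returns "ok" if any response bytes arrive, "hang" if the connection
    or the read times out, "closed" if the server closes the connection
    without sending data, and "down" for any other socket error.
    The labels and the default timeout are illustrative assumptions.
    """
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            first_bytes = sock.recv(1024)
    except socket.timeout:
        # Connection accepted (or pending) but nothing came back in time.
        return "hang"
    except OSError:
        # Refused, reset, unreachable, and similar failures.
        return "down"
    return "ok" if first_bytes else "closed"
```

Logging the result of probe_http("127.0.0.1") once a minute would show 
how long each hang lasts before the child is restarted.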

We see this behavior in both 2.0.1 and 2.0.2.

Sorry I don't have more information to add, but I figured I would drop a 
"me too" reply since we understand how frustrating this problem can be. 

I am also very willing to hire a varnish developer to diagnose and fix 
this problem, giving them full access to the machines in question.  I 
was actually going to make a post mentioning this, but was going to wait 
until I could collect proper troubleshooting information.

Other than this (admittedly show-stopper) issue, varnish is absolutely 
wonderful.  Total time to learn about it, download, compile, and create 
our custom VCL was maybe 30 minutes.



Grasmo, Johan wrote:
> Hi,
> We're tired of squid and its quirks and want to advance into the information age with varnish.
> I've been testing varnish for the last couple of days and it behaved perfectly until last night. While I was stress-testing the varnish server, it suddenly stopped handling requests. Varnish returns to normal operation after a restart.
> Symptom:
> Accessing a page on the varnish server results in a timeout in the browser
> I am able to telnet to the varnish server on port 80 but a GET doesn't result in anything ("hangs").
> I am able to telnet to port 6082 and run commands:
> [snip]
> status
> 200 22
> Child in state running
> [/snip]
> Varnishlog returns:
> [snip]
>     0 CLI          - Rd ping
>     0 CLI          - Wr 0 200 PONG 1228307253 1.0
>     0 CLI          - Rd ping
>     0 CLI          - Wr 0 200 PONG 1228307256 1.0
>     0 CLI          - Rd ping
> [/snip]
> Varnishncsa doesn't return anything.
> I've verified with tcpdump that varnish ACKs requests.
> So "everything seems normal" even though clients time out. FF returned "Connection Interrupted The document contains no data". If you need any logs/configs please let me know :)
> I haven't exhausted the search function for the mailing list, so sorry if this topic has come up before.
> I'm running varnish 2.0 on an Ubuntu 8.04 distro.
> Cheers and thanks for replies,
> Johan Grasmo
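
One thing the quoted transcript shows is that the management port keeps 
answering "Child in state running" during the hang, so a status check 
alone will not catch it.  For anyone scripting that check anyway, here 
is a minimal sketch of parsing such a reply.  It assumes the reply is a 
status code and a body length on the first line followed by the body, 
which is what the "200 22" line above looks like; the function names are 
made up for illustration.

```python
def parse_cli_reply(raw):
    """Split a management-port reply into (status, body).

    Assumes the first line holds "<status> <body length>", as in the
    "200 22" line of the quoted transcript. The length is treated as a
    character count, which matches the byte count for ASCII replies.
    """
    header, _, rest = raw.partition("\n")
    status_text, length_text = header.split()
    body = rest[: int(length_text)]
    return int(status_text), body


def child_is_running(raw):
    """True if the reply is a 200 whose body reports a running child."""
    status, body = parse_cli_reply(raw)
    return status == 200 and body.strip() == "Child in state running"
```

Note that child_is_running returns True for the reply quoted above even 
though clients were timing out, so it is best paired with a real HTTP 
request against port 80.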
