[Varnish] #51: crash in cache_acceptor_epoll.c
Varnish
varnish-bugs at projects.linpro.no
Thu Oct 5 22:24:50 CEST 2006
#51: crash in cache_acceptor_epoll.c
----------------------+-----------------------------------------------------
Reporter: airmax | Owner: des
Type: defect | Status: assigned
Priority: high | Milestone:
Component: varnishd | Version: 1.0
Severity: blocker | Resolution:
Keywords: |
----------------------+-----------------------------------------------------
Comment (by airmax):
I tried strace. Got some errors messages, then it hangs, I had to kill it
with a SIGKILL. Retried, same thing. I suppose that it's because strace
and threads aren't friends.
But I think I got our murderer : Foundry Iron load balancer and its
probes. The crash occurs while varnishd is serving its probe.
The iron LB check connect to port 80, and sends "HEAD
http://www.diabloteen.com/ HTTP/1.1" to the server. And then, euhm,
foundry knows. But I suspect them of closing the connection in an ugly way
at some point .
Interesting part of the strace is :
{{{
15440 read(12, "HEAD http://www.diabloteen.com/ "..., 4095) = 44
15440 clock_gettime(CLOCK_REALTIME, {1160078181, 3230000}) = 0
15440 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 15
15440 connect(15, {sa_family=AF_INET, sin_port=htons(80),
sin_addr=inet_addr("192.168.4.230")}, 16) = 0
15440 getsockname(15, {sa_family=AF_INET, sin_port=htons(36177),
sin_addr=inet_addr("192.168.4.46")}, [16]) = 0
15440 writev(15, [{"GET", 3}, {" ", 1}, {"http://www.diabloteen.com/",
26}, {" ", 1}, {"HTTP/1.1", 8}, {"\r\n", 2}, {"X-Varnish: 974115635", 20},
{"\r\n", 2}, {"Host: 192.168.4.230", 19}, {"\r\n", 2}, {"\r\n", 2}], 1
1) = 86
15440 read(15, "HTTP/1.1 200 OK\r\nDate: Thu, 05 O"..., 4095) = 1445
15440 time(NULL) = 1160078181
15440 open("/etc/localtime", O_RDONLY) = 16
15440 fstat64(16, {st_mode=S_IFREG|0644, st_size=1082, ...}) = 0
15440 mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x83a1b000
15440 read(16,
"TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0\f\0"..., 131072) =
1082
15440 close(16) = 0
15440 munmap(0x83a1b000, 131072) = 0
15440 fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR)
15440 fcntl64(15, F_SETFL, O_RDWR) = 0
15440 close(15) = 0
15440 clock_gettime(CLOCK_REALTIME, {1160078181, 13725000}) = 0
15440 writev(12, [{"HTTP/1.1", 8}, {" ", 1}, {"200", 3}, {" ", 1}, {"OK",
2}, {"\r\n", 2}, {"Date: Thu, 05 Oct 2006 19:56:19 "..., 35}, {"\r\n", 2},
{"Server: Apache/1.3.28 (Unix) Aut"..., 62}, {"\r\n", 2}, {"pics-la
bel: (pics-1.1 \"http://ww"..., 220}, {"\r\n", 2}, {"Last-Modified: Fri,
03 Mar 2006 "..., 44}, {"\r\n", 2}, {"ETag: \"26eb95d-3ac-44085a19\"",
28}, {"\r\n", 2}, {"Content-Type: text/html", 23}, {"\r\n", 2}, {"Conten
t-Length: 940", 19}, {"\r\n", 2}, {"X-Varnish: 974115635", 20}, {"\r\n",
2}, {"Age: 0", 6}, {"\r\n", 2}, {"Via: 1.1 varnish", 16}, {"\r\n", 2},
{"\r\n", 2}], 27) = 512
15440 clock_gettime(CLOCK_REALTIME, {1160078181, 13924000}) = 0
15440 write(10, "\304\34\10\10", 4 <unfinished ...>
15438 <... epoll_wait resumed> {{EPOLLIN, {u32=134680252,
u64=134680252}}}, 1, 100) = 1
15440 <... write resumed> ) = 4
15438 read(7, "\304\34\10\10", 4) = 4
15438 epoll_ctl(11, EPOLL_CTL_ADD, 12, {EPOLLIN|EPOLLPRI, {u32=134749380,
u64=134749380}}) = 0
15438 clock_gettime(CLOCK_REALTIME, {1160078181, 14120000}) = 0
15438 epoll_wait(11, <unfinished ...>
15440 read(13, <unfinished ...>
15438 <... epoll_wait resumed> {{EPOLLIN|EPOLLERR|EPOLLHUP,
{u32=134749380, u64=134749380}}}, 1, 100) = 1
15438 read(12, 0x8081f44, 4095) = -1 ECONNRESET (Connection reset
by peer)
15438 close(12) = 0
15438 epoll_ctl(11, EPOLL_CTL_DEL, -1, NULL) = -1 EFAULT (Bad address)
15438 write(2, "Assert error in vca_del(), cache"..., 134 <unfinished ...>
15426 <... poll resumed> [{fd=0, events=POLLIN}, {fd=5, events=POLLIN},
{fd=11, events=POLLIN, revents=POLLIN}], 3, 2225) = 1
15438 <... write resumed> ) = 134
15426 read(11, <unfinished ...>
15438 rt_sigprocmask(SIG_UNBLOCK, [ABRT], <unfinished ...>
15426 <... read resumed> "Assert error in vca_del(), cache"..., 8191) =
134
15438 <... rt_sigprocmask resumed> NULL, 8) = 0
15426 write(2, "Child said (2, 15434): <<Assert "..., 162 <unfinished ...>
15438 tgkill(15434, 15438, SIGABRT <unfinished ...>
15426 <... write resumed> ) = 162
15438 <... tgkill resumed> ) = 0
15426 clock_gettime(CLOCK_MONOTONIC, <unfinished ...>
15438 --- SIGABRT (Aborted) @ 0 (0) ---
}}}
Kernel on this test machine 2.6.8.
{{{
Linux tdell4 2.6.8-2-686 #1 Thu May 19 17:53:30 JST 2005 i686 GNU/Linux
}}}
This kernel is a basic Debian Sarge kernel (kernel-image-2.6.8-2-686
package, version 2.6.8-16).
--
Ticket URL: <http://varnish.projects.linpro.no/ticket/51>
Varnish <http://varnish.projects.linpro.no/>
The Varnish HTTP Accelerator
More information about the varnish-bugs
mailing list