[Varnish] #1823: vcl in discarded mode does not clear up
Varnish
varnish-bugs at varnish-cache.org
Mon Dec 7 16:14:52 CET 2015
#1823: vcl in discarded mode does not clear up
-----------------------------+--------------------
Reporter: hamed.gholamian | Owner:
Type: defect | Status: new
Priority: normal | Milestone:
Component: varnishd | Version: 4.1.0
Severity: normal | Resolution:
Keywords: |
-----------------------------+--------------------
Comment (by lochii):
{{{
[12:37] <ruben_varnish:#varnish-hacking> right
[12:37] <martin:#varnish-hacking> phk: to me the cause of #1823 isn't
mapped yet. I do not believe it is the SES_Reschedule_Req() that fails and
stops the propagation
[12:37] -beb:#varnish-hacking- #1823: vcl in discarded mode does not clear
up [new] https://www.varnish-cache.org/trac/ticket/1823
[12:40] <martin:#varnish-hacking> and the single DerefObjcore on any
objcore of an objhead should be sufficient to cause the entire list to be
rescheduled
[12:40] <phk:#varnish-hacking> martin, not if we have a "leak" somewhere
that stops the chain.
[12:41] <martin:#varnish-hacking> phk: exactly, so i still believe there
must be a leak somewhere that we haven't spotted
[12:41] <martin:#varnish-hacking> the patch as it is would prob just mask
it
[12:41] <phk:#varnish-hacking> martin, agree. But now we know that to be
the case.
[12:42] <martin:#varnish-hacking> yes, so that is definitive progress
[12:42] <phk:#varnish-hacking> agreed too, I don't think that's the proper
solution. It may still be a good idea to speed up draining the waiting
list though, but it's not a proper fix for the bug.
[12:43] <martin:#varnish-hacking> my hunch of friday that I was following
was the single OC ref held by the busyobj, and somehow the initiating
client worker and the fetch worker ref of the busyobj
got screwed up
[12:44] <martin:#varnish-hacking> causing that ref to sit idle
[12:44] <phk:#varnish-hacking> martin, Well, that would be quick to test
if we have a VTC.
[12:45] <martin:#varnish-hacking> yeah
[12:45] <phk:#varnish-hacking> I think we should focus on writing the
VTC...
[12:45] <lochii:#varnish-hacking> martin: re: 'single DerefObjcore' , how
should that work?
[12:45] <lochii:#varnish-hacking> ultimately, it just calls a normal
hsh_rush()
[12:45] <lochii:#varnish-hacking> it isn't doing anything special
[12:45] <martin:#varnish-hacking> which in turn would wake up some thread
somewhere
[12:45] <lochii:#varnish-hacking> yes, it does
[12:45] <lochii:#varnish-hacking> which calls another hsh_rush()
[12:45] <martin:#varnish-hacking> which would have a ref of some objcore
on the same objhead
[12:46] <lochii:#varnish-hacking> but not enough times
[12:46] <lochii:#varnish-hacking> if the wl is big enough
[12:46] <phk:#varnish-hacking> lochii, the point is that "big enough"
shouldn't matter.
[12:46] <martin:#varnish-hacking> which would in turn cause hsh_rush to be
called on the WL of the objhead when it's dereferenced
[12:46] <martin:#varnish-hacking> so the chain is propagated
[12:47] <lochii:#varnish-hacking> right, but the hsh_rush only removes
$rush_exponent items
[12:47] <lochii:#varnish-hacking> from the wl
[12:47] <lochii:#varnish-hacking> in this case, 3
[12:47] <martin:#varnish-hacking> true, but for each of those
$rush_exponent more should be woken
[12:48] <phk:#varnish-hacking> lochii, but each of those should also
remove 3
[12:48] <phk:#varnish-hacking> lochii, it's supposed to be a cascade, and
somewhere it isn't
[12:48] <lochii:#varnish-hacking> hsh_rush() does
wrk->stats->busy_wakeup++;
[12:48] <lochii:#varnish-hacking> but that's about it
[12:50] <lochii:#varnish-hacking> how does (or should) this wakeup work?
[12:50] <martin:#varnish-hacking> lochii: it calls SES_Reschedule_Req()
[12:50] <lochii:#varnish-hacking> yes
[12:50] <martin:#varnish-hacking> that causes a req to be processed that
is already waiting on this objhead
[12:51] <lochii:#varnish-hacking> ok, so that puts a SES_Proto_Req on the
task list
[12:51] <martin:#varnish-hacking> so it must get an objcore ref on the
same objhead
[12:51] <martin:#varnish-hacking> which when deref'ed would call hsh_rush
again
[12:51] <martin:#varnish-hacking> so if those statements are all true, the
list would not stall
[12:52] <lochii:#varnish-hacking> I only ever see 25 SES_Reschedule_Req()s
being called
[12:52] <lochii:#varnish-hacking> for 1000 requests
[12:52] <martin:#varnish-hacking> yeah, and that is wrong
[12:52] <lochii:#varnish-hacking> sorry, 300
[12:52] <phk:#varnish-hacking> are there any failures to get worker
threads ?
[12:53] <phk:#varnish-hacking> that might break the chain
[12:53] <martin:#varnish-hacking> the stats are negative on that
[12:53] <lochii:#varnish-hacking> well, if the reschedule fails, it is
supposed to ditch the wl
[12:54] <lochii:#varnish-hacking> since you cascade the return of
pool_task
[12:57] <lochii:#varnish-hacking> when it calls SES_Proto_Req ->
HTTP1_Session , it ends up in S_STP_H1BUSY
[12:58] <lochii:#varnish-hacking> and this does in turn HSH_DerefObjHead()
and Req_Cleanup()
[12:58] <martin:#varnish-hacking> ah
[12:58] <martin:#varnish-hacking> there you have it
[12:58] <martin:#varnish-hacking> hsh_derefobjhead wouldn't call hsh_rush
[12:59] <lochii:#varnish-hacking> yes
[12:59] <lochii:#varnish-hacking> it doesn't
[12:59] <lochii:#varnish-hacking> I don't know why
[12:59] <lochii:#varnish-hacking> but I've just been guessing by staring
at this code for so long, as to how it is supposed to work
[13:00] <martin:#varnish-hacking> right - so that's the broken link
[13:00] <lochii:#varnish-hacking> ok
[13:00] <martin:#varnish-hacking> the VTCP_check_hup notices the client
has gone away while on the waitinglist
[13:00] <lochii:#varnish-hacking> yes, I saw that
[13:00] <martin:#varnish-hacking> which causes a quick escape of the
handling
[13:00] <lochii:#varnish-hacking> wondering why when epoll waiter is
enabled that we don't add epoll monitoring for these sockets
[13:03] <lochii:#varnish-hacking> ok, testing now with hsh_rush() in the
HSH_DerefObjHead() path
[13:06] <phk:#varnish-hacking> lochii, the waiting list predates the
general waiter
[13:06] <phk:#varnish-hacking> lochii, it's also not obvious that it would
be cheaper.
[13:10] <lochii:#varnish-hacking> ok, seems to be working , albeit slowly
[13:11] <lochii:#varnish-hacking> will keep an eye on it
[13:11] <phk:#varnish-hacking> lochii, slowly in what way ?
[13:11] <lochii:#varnish-hacking> as in, the refcnt is draining
[13:11] <lochii:#varnish-hacking> as the cascade is happening
[13:11] <lochii:#varnish-hacking> and threads are being woken up
[13:11] <lochii:#varnish-hacking> just happening really slowly
[13:12] <phk:#varnish-hacking> lochii so the cascade is not widening ?
[13:12] <lochii:#varnish-hacking> no
[13:12] <lochii:#varnish-hacking> ok, it's stuck again
[13:13] <lochii:#varnish-hacking> ah
[13:13] <lochii:#varnish-hacking> objcore refcnt is now zero
[13:13] <lochii:#varnish-hacking> so we are no longer deref objcore
[13:14] <lochii:#varnish-hacking> which means we stopped deref objhead
[13:14] <lochii:#varnish-hacking> but I've still got the oh with refcnt of
800+
[13:14] <lochii:#varnish-hacking> so it's stalled here now
[13:16] <phk:#varnish-hacking> that sounds strange...
[14:50] <lochii:#varnish-hacking> ok, looks like I've got it working
[14:51] <lochii:#varnish-hacking> if hsh_rush() is called properly in
HSH_DerefObjHead() it does indeed work
[14:51] <lochii:#varnish-hacking> I need to test some stuff with the
private oh
[14:52] <lochii:#varnish-hacking> but generally it works
[14:52] <lochii:#varnish-hacking> so let me re-make the patch
[14:52] <lochii:#varnish-hacking> (though it would be nice to see ditch
functionality, I'll keep it out of this if it isn't needed)
another patch follows.
}}}
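The cascade argument in the log can be sketched as a toy model. This is not Varnish code: toy_drain() and its parameters are hypothetical names made up for illustration, and the model only assumes what was said above, namely that one rush takes at most rush_exponent requests off the waiting list, and that each woken request should in turn cause another rush when it drops its reference. The rush_on_bailout flag models whether a request that finds its client gone (the HSH_DerefObjHead() path in the discussion) still triggers a rush.

```c
#include <assert.h>

/*
 * Toy model of a waiting-list cascade. NOT Varnish code; all names
 * here are hypothetical.
 *
 * waiting          requests parked on the waiting list
 * rush_exponent    max requests taken off the list per rush
 * gone_every       every Nth woken request finds its client gone
 *                  (0 = no client ever goes away)
 * rush_on_bailout  non-zero if a bailing-out request still rushes
 *
 * Returns the number of requests left stuck on the list.
 */
unsigned
toy_drain(unsigned waiting, unsigned rush_exponent,
    unsigned gone_every, int rush_on_bailout)
{
	unsigned pending_rushes = 1;	/* the initial deref rushes once */
	unsigned woken = 0;

	assert(rush_exponent > 0);
	while (pending_rushes > 0 && waiting > 0) {
		pending_rushes--;
		for (unsigned i = 0; i < rush_exponent && waiting > 0; i++) {
			waiting--;
			woken++;
			int client_gone =
			    (gone_every != 0 && woken % gone_every == 0);
			/* A woken request keeps the chain alive only if
			 * its exit path ends in another rush. */
			if (!client_gone || rush_on_bailout)
				pending_rushes++;
		}
	}
	return (waiting);
}
```

In this model, with 1000 waiters, a rush_exponent of 3 and every client gone, toy_drain(1000, 3, 1, 0) stops after the first rush with 997 requests still parked, while toy_drain(1000, 3, 1, 1) drains the list. It says nothing about how fast the real cascade widens or about the remaining objhead refcnt seen at 13:14.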
--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1823#comment:12>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator