Updated persistent patches

Poul-Henning Kamp phk at phk.freebsd.dk
Wed Jun 6 08:40:18 CEST 2012


In message <CANTn4cr-Fq0zuWPpyQvF=Z-=Y5=K-Amk3+CvsEtBL0Qhw5yvQg at mail.gmail.com>
, Martin Blix Grydeland writes:

----------------------
Subject: [PATCH 01/21] Add a EXP_NukeLRU() function to nuke an entire LRU

+       struct objcore *oc_array[10];
[...]
+               while (n < 10) {

That's a no-go, #define or sizeof please.

You should place both the BINHEAP_NOIDX asserts in the first loop,
(see EXP_NukeOne) it does nothing good outside the lock in the
second loop.

----------------------
Subject: [PATCH 02/21] Add smp_copy_sign() and smp_reapply_sign() utility functions

I can clearly see the utility of these functions, but it bothers
me a lot that encapsulating this functionality further obscures the
fact that when we use signatures as data-containers we have no idea
if we overwrite the allocation the represent.

Adding a "max_length" field to struct smp_sign is undesirable, as
we don't want that included in the signature (it's only relevant
while creating the signature, not while checking it).

I guess this is really a manifestation of the fact that smp_sign should
not be used as a space-handle in the first place, so what we really
should do is add:

	struct smp_space {
		uint64_t	first;
		uint64_t	len;
		uint64_t	last;
		struct smp_sign	sign;
	}

(This is a runtime structure, it's never stored, so it should
*not* go in "include/persistent.h")

And use that as our generic handle for "stick things into this space"

So we'd have:

	void
	smp_copy_space(struct smp_space *dst, struct smp_space *src)
	{
		assert(dst->first + src->len < dst->last);
		memcpy(dst->first, src->first, src->len);
		dst->len = src->len;
		smp_reset_sign(&dst->sign);
		smp_append_sign(&dst->sign, dst->len)
	}
		
----------------------
Subject: [PATCH 03/21] Sync the complete sign area on smp_sync_sign
Subject: [PATCH 21/21] Round to page sizes on smp_sync_sign()

The issue with msync is that both the pointer an length should be
properly rounded to pages before we call it, otherwise it might fail.

We should wrap this in a "smp_msync" function, rather than
pollute all msync calls with it.

Notice that pagesize is the correct thing to round to, not
sc->granularity.

We should clarify the meaning of the smp_sign->length field
in include/persistent.h:  Does it include struct smp_sign or not ?

----------------------
Subject: [PATCH 04/21] Free the LRU object and set free_offset when dropping empty segments in smp_close_seg()

Looks correct.

(When do we close an empty segment, other than shutdown ?)

----------------------
Subject: [PATCH 05/21] Don't exit on full silo

Are you sure you have all the AN(sc->cur_seg) the change of
semantics requires ?

For clarity, always keep left in sync with cur_seg:

        if (left < extra + min_size) {
                if (sc->cur_seg != NULL)
                        smp_close_seg(sc, sc->cur_seg);
                smp_new_seg(sc);
                if (sc->cur_seg != NULL)
                        left = smp_spaceleft(sc, sc->cur_seg);
>>>             else
>>>                     left = 0;
        }

In both the next two cases, you should consider truncation
of the new segment, it's silly to waste a full segment if
the overflow is just one page:

+       if (smp_segend(&tmpsg) > sc->mediasize)
+               tmpsg.p.offset = sc->ident->stuff[SMP_SPC_STUFF];


+       sg = VTAILQ_FIRST(&sc->segments);
+       if (sg != NULL && sg->p.offset >= tmpsg.p.offset) {
+               if (smp_segend(&tmpsg) > sg->p.offset)
+                       return;
+               assert(smp_segend(&tmpsg) <= sg->p.offset);
        }

(The last if might be clearer if you swapped the direction
of the compare to tmpsg <= sg)

The return should be commented as failure

+       ALLOC_OBJ(sg, SMP_SEG_MAGIC);
+       AN(sg);

Instead of AN, you should just return failure.  We should handle
malloc failures gracefully where we can easily do so.

        sg->p.offset = IRNUP(sc, sg->p.offset);
        sg->p.length = IRNDN(sc, sg->p.length);

Isn't there a risk that the offset is rounded up, but the length
not rounded down, so we overstep by one page ?

I don't think it can happen, but an assert to make sure is probably
a good idea.


----------------------
Subject: [PATCH 06/21] smp_thread() stopping

        printf("Silo completely loaded\n");

All things considered, we should probably VSL these and live with
it.   SLT_PersistentEvent or something.

+               if (sg != NULL && sg != sc->cur_seg && sg->nobj == 0) {
                        smp_save_segs(sc);
                }

Loose {}

+               VTIM_sleep(1);

We should avoid whole-second sleeps, they have a bad tendency to
get things to run in lockstep.  Ideally each silo should sleep
for 1+random-small-delta, but at least make it different from 1.0

+       while (1)
+               VTIM_sleep(1);
+
        NEEDLESS_RETURN(NULL);

You should simply let the tread die, and pthread_join() it from
smp_close().

In either case, it is silly that we close the silos serially, you
should add a "signal_close()" method to stevedore, and call
any which is non-NULL as the first thing in STV_close(), so that
we can sed the SC_STOP on all silos before we wait for the first
one to do so.


----------------------
Subject: [PATCH 11/21] Generalize the ban reporting to the stevedores

I appreciate the generality of STV_NewBan(), but what is the point
of calling BAN_Spec, rather than passing the ban+len as arguments ?

----------------------
Subject: [PATCH 12/21] Also report dropped bans to the stevedores.

Looks ok.

----------------------
Subject: [PATCH 14/21] Add consistency checks between the ban lists at startup

I wonder if it would make more sense to not write here, and instead
wait until everything is loaded and then write new (and possibly shorter)
ban lists to both segments ?  (Ref: recent talk about pruning bans)

----------------------
Subject: [PATCH 15/21] Add the concept of an extra area to the beginning of smp signs.
Subject: [PATCH 16/21] Store the ban timestamp in the smp ban lists
Subject: [PATCH 17/21] Keep an offset on the ban lists (in an extra signed

I'm not quite sure how this should be done, but I'm sure its not this way.

There is really only one bit of info you really need to record, and
that is the timestamp of the oldest non-removed ban, and as far
as I can see, you only need to store it on silo close, and the right
way to store it, might be to rewrite the live parts of the ban-lists
at that time.

There is certainly no need to record the bantimes in the segment
(0016), since they are already there in the ban-spec.


non-mentioned patches are not reviewed.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.



More information about the varnish-dev mailing list