Best practice for caching scenario with different backend servers but same content

Geoff Simmons geoff at uplex.de
Mon Aug 9 12:49:25 UTC 2021


Hello,

The best way to answer these questions is to start with the last one:

On 8/9/21 10:50, Hamidreza Hosseini wrote:
> 
> 6. conceptual question:
> 
> 1. What's the exact difference between the hash and shard directors, and
> when should we use which one?
> The docs say that when the backends change, shard is more consistent than
> hash, but how?

It's not "more consistent", it's "consistent hashing", which is the name
of a hashing algorithm intended for load balancing:

https://en.wikipedia.org/wiki/Consistent_hashing

The hash director uses a more traditional kind of hashing algorithm to
map requests to backends. Consistent hashing, implemented by the shard
director, is intended to mitigate problems that can arise when backends
become unhealthy and then healthy again. The focus is mainly on backends
that do something expensive for new requests, such as their own caching,
but can work faster for requests that they've seen before.

A hash algorithm computes a number h(k) for a request key k (say, the
URL), whose value is in a large range, say 32 bits. Then if you have N
backends, the traditional algorithm indexes them from 0 to N-1, and picks
the backend at h(k) mod N. Varnish's hash director is something like that.
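
For illustration, here is a minimal VCL sketch of the hash director
(the backend names b1/b2 and the director name hdir are just placeholders,
assumed to be defined or chosen elsewhere):

    import directors;

    sub vcl_init {
        # traditional hashing: index the backends and pick one by
        # hashing a request property and taking it mod N
        new hdir = directors.hash();
        hdir.add_backend(b1, 1.0);
        hdir.add_backend(b2, 1.0);
    }

    sub vcl_backend_fetch {
        # hash the URL and map it to one of the N backends
        set bereq.backend = hdir.backend(bereq.url);
    }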

Say you have N=10 backends, so the traditional algorithm picks h(k) mod
10. Then a backend goes unhealthy, so now N=9. Since x mod 10 is unequal
to x mod 9 for most values of x (a key hashing to 42, for example, lands
on index 2 out of ten backends, but on index 6 out of nine), the mapping
of requests to backends shifts almost completely. This can be painful for
backends that benefit from getting mostly the same requests, for example
due to caching.

After a while, the unhealthy backend becomes healthy again, so we go
from N=9 back to N=10. If in the meantime the backends had "gotten used
to" the changed distribution of requests, say by filling their local
caches, then they get the pain all over again.

Consistent hashing attempts to lessen the pain. If a backend drops out,
then the mapping of requests to that backend must change, but the
mapping stays the same for all other backends. So the distribution to
backends changes only as much as it has to.

Disclaimer: the original version of the shard director was developed at
my company. Some lines of code that I contributed are still in there.

> 2. What will happen, when I'm using the shard director based on
> "key=bereq.url", if I add/delete one backend from the backend list? Will
> it change the consistent hashing ring for requests?

That's what consistent hashing is about. The mapping only changes for
requests that map to the backends that were added or deleted; for all
other requests it stays the same.
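
For example, something along these lines (a sketch, using the director
name hls_cluster from your sample config):

    sub vcl_backend_fetch {
        # by=URL hashes bereq.url onto the consistent-hashing ring
        set bereq.backend = hls_cluster.backend(by=URL);

        # the explicit form of the same idea:
        # set bereq.backend = hls_cluster.backend(by=KEY,
        #                         key=hls_cluster.key(bereq.url));
    }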

> 1. First of all, I think the following section is not true. Do we need to
> define both the shard parameter (p) and replicas in the reconfigure call?
> "hls_cluster.reconfigure(p, replicas=25);" or just
> "hls_cluster.reconfigure(replicas=25);"

There is no 2-argument form of reconfigure(). It has one optional
argument (replicas) with a default value.

> 2. What does "replicas=25" mean in the sample configuration?
> 
> Why is this necessary?

The short answer is that if you have to ask, then use the default. Since
the argument is optional, and set to the default if you leave it out,
just leave it out: reconfigure()
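
In vcl_init that would look something like this (a sketch; the backend
names are placeholders):

    import directors;

    sub vcl_init {
        new hls_cluster = directors.shard();
        hls_cluster.add_backend(b1);
        hls_cluster.add_backend(b2);
        hls_cluster.add_backend(b3);
        # no replicas argument: the default is used
        hls_cluster.reconfigure();
    }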

You should NOT set it to 25.

replicas is an internal parameter of the algorithm, and there aren't
many guidelines as to how it should be set, so we wanted to be able to
experiment with it. It's really there so that developers of Varnish and
the director can test it. Most users of Varnish don't need to worry
about it.

It turns out that the value doesn't matter all that much, as long as it
isn't too low. 25 is too low. I advise against setting it lower than the
default (67).

What replicas does and why it's necessary gets into details of the
algorithm. We can say more about that if there's interest, but this
email is getting pretty long as it is. (The Wikipedia article gets into
this in the section "Practical Extensions".)

> 3. In the shard.backend(...) section, about "resolve=LAZY":
> I couldn't understand what LAZY resolve means.

When you set bereq.backend to a director, most of the Varnish directors
do not immediately execute the algorithm to choose a backend; that only
happens when it's time to actually send the backend request. This has
some advantages, for example if the VCL logic after setting bereq.backend
results in not going to that backend after all. resolve=LAZY works this way.

The alternative resolve=NOW is for contexts where you use the return
value of shard.backend() and need to know right away which backend it's
going to be. Then the backend is chosen immediately, and that choice
stays in place when the backend request is sent.
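
A sketch of the difference, with the shard director from above:

    sub vcl_backend_fetch {
        # LAZY: bereq.backend is set to the director itself; the
        # concrete backend is only chosen when the backend request
        # is actually sent
        set bereq.backend = hls_cluster.backend(by=URL, resolve=LAZY);

        # NOW would pick the concrete backend right here, and that
        # choice then stays fixed for this backend request:
        # set bereq.backend = hls_cluster.backend(by=URL, resolve=NOW);
    }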

> 4. For returning a healthy backend, besides defining probes as I have
> configured, should I configure healthy=ALL as follows?

The parameters alt, healthy, rampup and warmup give you some control
over what happens when one or more backends drop out.

Say your request ordinarily maps to b1, but b1 is unhealthy; then there
is a specific backend b2 that is chosen next. If b2 is also unhealthy,
then there is a specific alternative b3, and so on. These are always the
same. In other words, the order of alternatives chosen for unhealthy
backends is constant for a given configuration.

If you set alt=N with N > 0, then the Nth backend in that order is
chosen. This is mainly for retries -- by setting alt=bereq.retries, you
try a different backend on each retry, in the order established by b1,
b2, b3 ... etc.

The healthy parameter controls what happens when the director searches
down the list due to unhealthy backends. healthy=ALL means that it
continues searching, starting at alt=N (from the start when alt=0),
until a healthy backend is found (or the lookup fails if they're all
unhealthy).

healthy=CHOSEN means don't skip ahead due to alt, just search for a
healthy backend starting from the beginning of the list.

healthy=IGNORE means don't consider the health status, and just choose
the backend at alt=N no matter what.
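
Put together, a retry setup might look like this (a sketch, not a
drop-in config):

    sub vcl_backend_fetch {
        # first attempt: alt=0; each retry moves one step further
        # down the fixed order of alternatives, and healthy=ALL
        # skips any that are unhealthy
        set bereq.backend = hls_cluster.backend(by=URL,
                                                alt=bereq.retries,
                                                healthy=ALL);
    }

    sub vcl_backend_response {
        if (beresp.status >= 500) {
            # try the next backend in that order
            return (retry);
        }
    }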

> 5. About rampup and warmup:
> rampup: I understand that if a backend goes down and becomes healthy
> again, and we defined a rampup period for it, it would wait until this
> period has passed and only then will Varnish send requests to that
> backend; for this fraction of time it will return an alternative backend

Not quite. When a backend is added or becomes healthy again and rampup is
set, the probability of choosing that backend increases over the time set
by rampup. Say for rampup=60s, the probability is very low just after the
backend goes healthy, 25% after 15 seconds, 50% after 30 seconds, and so on.

The idea is for backends that do expensive operations such as caching
for "new" requests. Such a backend could be overwhelmed if Varnish sends
all of the new requests at once when the backend becomes healthy, so
rampup increases the load slowly.
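
rampup is set in vcl_init, for example (a sketch, backend names again
placeholders):

    import directors;

    sub vcl_init {
        new hls_cluster = directors.shard();
        hls_cluster.add_backend(b1);
        hls_cluster.add_backend(b2);
        # when a backend becomes healthy (again), ramp its share of
        # requests up over 60 seconds instead of all at once
        hls_cluster.set_rampup(60s);
        hls_cluster.reconfigure();
    }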

> warmup: for a chosen backend for a specific key, it will spread requests
> between two backends (the original backend and its alternative, if we
> define 0.5 for warmup)

Yes. warmup is the probability that the next backend down the line is
chosen, even if the first backend that would have been chosen is healthy.

This is also for backends that would suffer under heavy load if another
backend goes unhealthy, due to new requests that would have gone to the
unhealthy backend. With warmup, the backend gets a portion of those
requests even when the other backend is healthy, so that it's partially
prepared for the additional load if the other backend drops out.
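
warmup can be passed per lookup in .backend(), or set as a default with
set_warmup() in vcl_init; for example, a sketch:

    sub vcl_backend_fetch {
        # send about half of the requests for each key to the next
        # backend in line, even while the primary backend for that
        # key is healthy
        set bereq.backend = hls_cluster.backend(by=URL, warmup=0.5);
    }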


HTH,
Geoff
-- 
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de
