Best practice for caching scenario with different backend servers but same content

Hamidreza Hosseini hrhosseini at hotmail.com
Mon Aug 16 13:34:26 UTC 2021


> In that case, hashing the URL only would prevent you from adding new
> domains through your Varnish server. It won't hurt if you know you
> will only ever have one domain to deal with, but hashing the host will
> also not hurt as long as you normalize it to a unique value.

Hi,
Let me describe my architecture in more detail.
I have several backend servers that serve HLS fragments for live video streaming, e.g.:

```

hls_backend_01
hls_backend_02
hls_backend_03
hls_backend_04
hls_backend_05
hls_backend_06
hls_backend_07
hls_backend_08
hls_backend_09
hls_backend_10

```

All HLS backend servers hold the same content, and there are 5 Varnish servers in front of them for caching.
If I use the round-robin director on the Varnish servers, then because Varnish hashes "req.http.host + req.url", it would cache the same content twice when it comes from different backends! For example:
if the first request for the file "test.ts" goes to the backend server "hls_backend_01", Varnish caches it, and
the next request from another client, because of the round-robin director,
goes to "hls_backend_02", so Varnish would cache the same file again due to the different "req.http.host".
So one solution I have is to use the shard director with "key=req.url" instead of round robin.
Another way is to keep round robin but adjust vcl_hash to something like the following:

```

sub vcl_hash {
    hash_data(req.url);
    return (lookup);
}

```

This way Varnish hashes only "req.url", not "req.http.host".
So Varnish would cache the content based on its uniqueness, not based on which backend served it.
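
For comparison, the shard director alternative mentioned above could be sketched roughly as follows. This is only a minimal sketch under assumptions, not a tested configuration: the backend addresses are placeholders, only two of the ten backends are shown, and the director name "hls" is made up for illustration. It uses by=URL so that the director picks the backend from the request URL.

```vcl
vcl 4.1;

import directors;

# Placeholder addresses (documentation range); replace with the real servers.
backend hls_backend_01 { .host = "192.0.2.11"; .port = "80"; }
backend hls_backend_02 { .host = "192.0.2.12"; .port = "80"; }

sub vcl_init {
    # "hls" is a hypothetical director name.
    new hls = directors.shard();
    hls.add_backend(hls_backend_01);
    hls.add_backend(hls_backend_02);
    # The shard director needs reconfigure() after its backends change.
    hls.reconfigure();
}

sub vcl_recv {
    # Choose the backend by consistent hashing of the URL, so a given
    # URL is normally fetched from the same backend.
    set req.backend_hint = hls.backend(by=URL);
}
```
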
1. First, I asked how I can normalize the host. Is that possible at all, given what I described?
Would you please explain it further with an example?

2. You gave an example about other domains. I do not understand what my case has to do with the domain.

3. Maybe I'm thinking about this the wrong way, because if Varnish hashes the data based on req.url only ('hash_data(req.url)'), it shouldn't cache the same content from different backends again!
For example, my request is:

http://varnish-01:/hls/test.ts
The first request goes to the "hls_backend_01" backend and is cached, and the next request goes to the "hls_backend_02" backend.
So does Varnish cache it again for each request because the backends are different?

Many Thanks, Hamidreza

________________________________
From: varnish-misc <varnish-misc-bounces+hrhosseini=hotmail.com at varnish-cache.org> on behalf of Dridi Boukelmoune <dridi at varni.sh>
Sent: Sunday, August 15, 2021 10:30 PM
To: varnish-misc at varnish-cache.org <varnish-misc at varnish-cache.org>
Subject: Re: Best practice for caching scenario with different backend servers but same content

On Sat, Aug 14, 2021 at 10:54 AM Hamidreza Hosseini
<hrhosseini at hotmail.com> wrote:
>
> Hi,
> Thanks to you and the whole Varnish team for these answers, which helped me a lot.
> I read the default Varnish configuration again:
> https://github.com/varnishcache/varnish-cache/blob/6.0/bin/varnishd/builtin.vcl
> and found vcl_hash as follows:
>
> ```
> sub vcl_hash {
>     hash_data(req.url);
>     if (req.http.host) {
>         hash_data(req.http.host);
>     } else {
>         hash_data(server.ip);
>     }
>     return (lookup);
> }
>
> ```
> So, if I change vcl_hash as follows, would it be enough for my purpose? (I mean caching the same object from different backends just once with the round-robin director.)
>
> ```
>
> sub vcl_hash {
>     hash_data(req.url);
>     return (lookup);
> }
>
> ```
>
> With this config I tell Varnish to cache the content based on 'req.url' only, not 'req.http.host', so the same content from different backends would be cached only once (if I want to use the round-robin director instead of the shard director). Is this true? What bad consequences might this configuration cause in the future?

In this case req.http.host usually refers to the domain end users
resolve to find your varnish server (or other hops in front of it). It
is usually the same for every client, let's take www.myapp.com as an
example. If your varnish server is in front of multiple services, you
should be handling the different host headers explicitly. For example
if you have exactly two domains you should normalize them to some
canonical form. Using the same example domain that could be
www.myapp.com and static.myapp.com for instance.

In that case hashing the URL only would prevent you from adding new
domains through your Varnish server. It won't hurt if you know you
will only ever have one domain to deal with, but hashing the host will
also not hurt as long as you normalize it to a unique value.
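
As an illustration of what normalizing the host could look like, here is a minimal VCL fragment. It is a sketch under assumptions rather than a recommended configuration: the two domains are the example names used above, and mapping the bare "myapp.com" to "www.myapp.com" and rejecting unknown hosts with a 400 are choices made only for this example.

```vcl
import std;

sub vcl_recv {
    # Lowercase the host and drop any port suffix, so that
    # "WWW.MyApp.com:80" and "www.myapp.com" end up identical.
    set req.http.host = std.tolower(regsub(req.http.host, ":[0-9]+$", ""));

    if (req.http.host == "myapp.com" || req.http.host == "www.myapp.com") {
        # Assumption: both spellings serve the same site.
        set req.http.host = "www.myapp.com";
    } else if (req.http.host != "static.myapp.com") {
        # Assumption: any other host is not served by this Varnish.
        return (synth(400, "Unknown host"));
    }
}
```

With a single domain, the same idea would reduce to one unconditional "set req.http.host = ...;" line, after which the built-in vcl_hash sees one host value for every request.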

You are correct that by default hashing the request appropriately will
help the shard director do the right thing out of the box. I remember
however that you only wanted to hash a subset of the URL for video
segments, so hashing the URL as-is won't provide the behavior you are
looking for.
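
Hashing only a subset of the URL could be sketched like this. Which part of the URL should be kept is not stated in this message, so the choice below (the path without the query string) is purely a hypothetical example; the rest mirrors the built-in vcl_hash.

```vcl
sub vcl_hash {
    # Hypothetical subset: the URL with its query string removed.
    # Only valid if the query string never changes the segment content.
    hash_data(regsub(req.url, "\?.*$", ""));
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```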

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc at varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc