Best practice for caching scenario with different backend servers but same content

Wed Oct 6 06:12:09 UTC 2021

On Mon, Aug 16, 2021 at 1:34 PM Hamidreza Hosseini
<hrhosseini at hotmail.com> wrote:
>
> > In that case, hashing the URL only would prevent you from adding new
> > domains through your Varnish server. It won't hurt if you know you
> > will only ever have one domain to deal with, but hashing the host will
> > also not hurt as long as you normalize it to a unique value.
>
> Hi,
> Let me elaborate on my architecture:
> I have several backend servers serving HLS fragments for a live video stream, e.g.:
>
> ```
>
> hls_backend_01
> hls_backend_02
> hls_backend_03
> hls_backend_04
> hls_backend_05
> hls_backend_06
> hls_backend_07
> hls_backend_08
> hls_backend_09
> hls_backend_10
>
> ```
>
> The content is the same on all HLS backend servers, and there are 5 Varnish servers in front of them for caching.
> Now, if I use a round-robin director on the Varnish servers, then because Varnish caches by "req.http.host + req.url", the same content coming from different backends would be cached twice! For example:
> if Varnish sends the first request for the "test.ts" file to the "hls_backend_01" backend server, it caches it, and
> the next request from another client, because of the round-robin director,
> goes to "hls_backend_02", and Varnish caches the same file again due to a different "req.http.host".
> So one solution is to use the shard director with "key=req.url" instead of round-robin;
> another is to keep round-robin but adjust vcl_hash to something like below:
>
> ```
>
> sub vcl_hash {
>     hash_data(req.url);
>     return (lookup);
> }
>
> ```
>
> This way Varnish only hashes "req.url", not "req.http.host",
> so Varnish would cache the content based on its uniqueness, not on which backend served it.
> 1. First, I asked how I can normalize it. Is that possible at all, given what I described?
> Would you please explain it in more detail with an example?

In this case I think you are confusing "req.http.host" (the Host header)
with the backend's host name.

For example, if clients reach one of your 5 Varnish servers via
www.example.com, that is the name they will use and that is what
req.http.host will contain.

Your backends' FQDNs could be something like this:

- hls01.internal.example.com
- hls02.internal.example.com
- hls03.internal.example.com
- ...
- hls10.internal.example.com

As the example suggests, these domains should not be directly reached
by clients if your goal is to proxy them with Varnish. Those internal
FQDNs should have no effect on the cache key populated with
hash_data(...).
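For reference, here is a sketch of what the built-in vcl_hash does in Varnish 4.x through 7.x. You don't need to add it to your VCL; it is shown only to make the cache key explicit. Note that every input comes from the client request, so a backend's .host value never enters the key.

```vcl
sub vcl_hash {
    # The key is built from the client request only: the URL, plus the
    # Host header the client sent (e.g. www.example.com), or the server
    # IP when there is no Host header. Backend FQDNs such as
    # hls01.internal.example.com are never hashed.
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```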

> 2. You gave an example about other domains; in this case I do not understand what it has to do with the domain.

Let's say your clients can reach either example.com or www.example.com
for the same service, or tomorrow you put more than your HLS service
behind Varnish: you may very well receive multiple Host headers.
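Normalizing means collapsing those variants into one canonical value before the hash is computed. A minimal sketch, assuming example.com and www.example.com are your only public names for the HLS service (both are placeholders); it is a fragment to merge into your existing VCL, and VCL concatenates multiple vcl_recv definitions, so it can sit next to your current one.

```vcl
sub vcl_recv {
    # Hypothetical public names: fold example.com and www.example.com,
    # with or without an explicit port and in any letter case, into a
    # single canonical Host value so that both share one cache object
    # per URL while the built-in vcl_hash keeps hashing req.http.host.
    if (req.http.host ~ "(?i)^(www\.)?example\.com(:[0-9]+)?$") {
        set req.http.host = "www.example.com";
    }
}
```

Any host you add later can get its own branch, and its objects will stay separate from the HLS ones because the Host header remains part of the key.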

> 3. Maybe I'm thinking about this the wrong way, because if Varnish hashes the data based on req.url ('hash_data(req.url)'), it shouldn't cache the same content again for a different backend!
> For example, my request is:

In this case you are "hashing" the client request with hash_data(...),
and that has nothing to do with backend selection. The fallback director
will precisely not do any kind of traffic balancing, since its purpose
is to always select the first healthy backend in insertion order.
The shard director may rely on the request hash or other criteria, as
we already covered.
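To make that separation concrete, here is a minimal sketch, assuming Varnish 6.x or 7.x with VCL 4.1, the hypothetical internal FQDNs listed above (replace them with names that resolve) and port 80. Even with a round-robin director, the cache key comes only from the client request, so while /hls/test.ts is still in cache a second request is a hit on the object fetched by the first one, whichever backend the round-robin picked. This assumes all clients send the same Host header and the backends don't send diverging Vary headers.

```vcl
vcl 4.1;

import directors;

# Hypothetical internal backends (placeholder FQDNs, port assumed).
backend hls01 { .host = "hls01.internal.example.com"; .port = "80"; }
backend hls02 { .host = "hls02.internal.example.com"; .port = "80"; }

sub vcl_init {
    new rr = directors.round_robin();
    rr.add_backend(hls01);
    rr.add_backend(hls02);
}

sub vcl_recv {
    # Backend selection: only used when Varnish has to fetch (miss or
    # pass); a cache hit never reaches any backend.
    set req.backend_hint = rr.backend();
}

# No custom vcl_hash: the built-in one hashes req.url and req.http.host,
# so the same URL requested via the same Host header maps to one object.
```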

> http://varnish-01:/hls/test.ts
> for the first request it goes to the "hls_backend_01" backend and caches it, and for the next request it goes to the "hls_backend_02" backend,
> so does it cache it again for each request because the backends are different?

All subsequent requests to http://varnish-01:/hls/test.ts should go to
the same hls_backend_01 backend with the shard director, as long as
there are no criteria other than the ones we already discussed. If you
want consistency across all your Varnish servers, you should configure
your shard directors identically, with the backends added in the same
order (unlike your initial VCL example using the fallback director).
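A minimal sketch of such a shard setup, assuming VCL 4.1, placeholder internal FQDNs on port 80 and only three of the ten backends for brevity; the same file would be loaded unchanged on all five Varnish servers. by=URL matches your "key=req.url" idea.

```vcl
vcl 4.1;

import directors;

# Hypothetical internal backends (placeholder FQDNs, port assumed).
backend hls01 { .host = "hls01.internal.example.com"; .port = "80"; }
backend hls02 { .host = "hls02.internal.example.com"; .port = "80"; }
backend hls03 { .host = "hls03.internal.example.com"; .port = "80"; }

sub vcl_init {
    new hls = directors.shard();
    # Keep the same backends, added in the same order, on every server.
    hls.add_backend(hls01);
    hls.add_backend(hls02);
    hls.add_backend(hls03);
    # Build the consistent hashing ring; required before the director
    # can be used.
    hls.reconfigure();
}

sub vcl_recv {
    # Shard on req.url: /hls/test.ts maps to the same backend (while it
    # stays healthy) whichever Varnish server receives the request.
    set req.backend_hint = hls.backend(by=URL);
}
```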

Best,
Dridi