Caching a low quantity of huge files with Varnish

Thiago Moraes thiago at cmoraes.com
Sat Oct 15 20:50:22 CEST 2011


Hey Kristian,

Thanks for your answer, it was really clarifying.

2011/10/14 Kristian Lyngstol <kristian at varnish-software.com>

> Greetings,
>
> On Thu, Oct 13, 2011 at 02:20:48PM -0300, Thiago Moraes wrote:
> > I have one server that provides access to some hundreds of files. All
> > of them are really big (some have more than 10GB). These files won't
> > change and are read only, but I'll need to provide access to them via
> > a WAN.
> >
> > I want to make the access faster by using a reverse proxy server
> > running near my users. For example:
> >
> > -User X wants to access something on server A.
> > -User X accesses a reverse proxy server on his LAN which causes a cache
> > miss. The file is downloaded to this proxy server.
> > -The next time user X wants the same file, he doesn't need to go get
> > it in my main server.
>
> My first thought is why you need a reverse proxy and not a regular one.
>

I'm not really experienced in this kind of work, so the choice was based
on what I have read about cache servers in the past few weeks. As the server
would only cache data from my application, the reverse proxy alternative
seemed less "intrusive" to me. I don't have the power to change my users'
network topology, so I don't want to redirect all their traffic through my
proxy. There is probably already a proxy server in their networks. Am I
missing something with this approach? Is it necessary?
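
For reference, the miss-then-hit flow from the original question can be
sketched in a few lines of Python. This is only an illustration of what a
caching reverse proxy does; `ReadThroughCache` and `fake_origin` are
hypothetical names, and Varnish itself is configured through VCL rather than
written like this.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy model of a caching reverse proxy (hypothetical, not Varnish code)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many times the main server was contacted

    def get(self, path: str) -> bytes:
        # Cache hit: answer locally, the main server is not involved.
        if path in self._store:
            return self._store[path]
        # Cache miss: fetch over the WAN once, keep a copy for later requests.
        self.origin_requests += 1
        body = self._fetch(path)
        self._store[path] = body
        return body


def fake_origin(path: str) -> bytes:
    # Stand-in for the real server A (assumption for the example).
    return ("contents of " + path).encode()


cache = ReadThroughCache(fake_origin)
first = cache.get("/data/file1")   # miss: goes to the origin
second = cache.get("/data/file1")  # hit: served from the local copy
```

Only the first request reaches the origin; the second one is answered from
the copy kept near the user.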


>
> > I read something about Squid having problems caching files larger than
> > 2GB. Does Varnish face the same kind of problem?
>
> Short answer: No.
>
> From Varnish 3, we do regular tests to see that files bigger than 4GB
> are dealt with sensibly. 2/4GB being the barrier, and beyond that, the
> logical limit is your memory manager/hardware/network/client, etc. 10GB
> /shouldn't/ be a big problem.
>
> But keep in mind:
>
> - You need a 64bit system.
> - Use the latest Varnish version. 3.0.1 or even 3.0.2 when that
>  finishes. 2.1 is not recommended.
>

As the project is new, I'll probably use the current stable version. Are
there any advantages for me in using 3.0.2 right now? How stable is the
development version? Would you recommend that I start using it already?


> - If your users will access these files concurrently, you will really
>  want to try out Martin's streaming branch, as that will ensure that
>  multiple clients get the content as it arrives at the cache. Without
>  Martin's branch, the first user that gets a cache miss will get the
>  content "streamed" while Varnish downloads it, while other users have
>  to wait for the download to finish before they get any data.
>
> - If your data is compressible, make sure you use gzip. Even if your
>  clients don't support it (e.g: scripts?), Varnish does. If it isn't
>  compressible, consider disabling it.
>

I don't really know how to answer that, but I'll ask you something about
this anyway =p.
My application is a server with a REST interface to a bunch of scientific
data files. The interface allows you to slice the data and to get only a
subset of it transparently. As the server was developed by someone else, I
don't know how it handles data compression, but as it is possible to access
data through some really ugly applications, I believe it's not possible to
use compression due to the clients.

How would Varnish compression work? Would the compressed data be sent to the
cache server and then uncompressed before being sent to the clients?
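
A rough sketch of the idea behind Kristian's remark ("Even if your clients
don't support it, Varnish does"), assuming the cache keeps a single
gzip-compressed copy and decompresses on delivery for clients that do not
send `Accept-Encoding: gzip`. The function names are hypothetical, and this
is plain Python, not VCL or Varnish internals.

```python
import gzip
from typing import Dict, Tuple


def store_compressed(body: bytes) -> bytes:
    # The cache keeps only the compressed representation.
    return gzip.compress(body)


def deliver(stored: bytes, request_headers: Dict[str, str]) -> Tuple[Dict[str, str], bytes]:
    # Naive check: a real implementation would also honour q-values
    # such as "gzip;q=0".
    accept = request_headers.get("Accept-Encoding", "")
    if "gzip" in accept.lower():
        # Client understands gzip: send the stored bytes untouched.
        return {"Content-Encoding": "gzip"}, stored
    # Client does not: decompress before sending.
    return {}, gzip.decompress(stored)


original = b"0.0,1.5,3.0\n" * 1000  # made-up, highly compressible sample data
stored = store_compressed(original)

headers_a, body_a = deliver(stored, {"Accept-Encoding": "gzip"})
headers_b, body_b = deliver(stored, {})  # e.g. a script without gzip support
```

In this model the clients never need to know about compression: the ones
that cannot handle gzip simply receive the original bytes.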


> - Make sure the cache is big enough, or that evictions are
>  well-controlled. If you're mixing small and large files, you might end
>  up evicting hundreds of small files to fit room for a single large
>  one.
>

I'll have mostly large files. There are some small ones of just a few MB,
but they're the exception in my case.
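
To make the eviction concern concrete, here is a minimal sketch assuming a
size-bounded cache that evicts the least recently used objects first. The
class `SizeBoundedLRU`, the 10 GB capacity and the file sizes are all
made up for the example; real Varnish storage behaviour depends on the
storage backend and its settings.

```python
from collections import OrderedDict
from typing import List

MB = 1024 * 1024
GB = 1024 * MB


class SizeBoundedLRU:
    """Toy cache that tracks object sizes only (hypothetical, not Varnish code)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self._sizes: "OrderedDict[str, int]" = OrderedDict()

    def insert(self, key: str, size: int) -> List[str]:
        """Insert an object and return the keys evicted to make room."""
        if size > self.capacity:
            raise ValueError("object is larger than the whole cache")
        if key in self._sizes:
            # Replacing an existing object frees its old space first.
            self.used -= self._sizes.pop(key)
        evicted: List[str] = []
        # Drop the oldest entries until the new object fits.
        while self.used + size > self.capacity:
            old_key, old_size = self._sizes.popitem(last=False)
            self.used -= old_size
            evicted.append(old_key)
        self._sizes[key] = size
        self.used += size
        return evicted


cache = SizeBoundedLRU(capacity_bytes=10 * GB)
for i in range(400):
    cache.insert("small-%d" % i, 5 * MB)  # 400 small files, 2000 MB in total

evicted = cache.insert("big-1", 9 * GB)   # one large file arrives
print(len(evicted), "small files evicted to fit one large file")
```

With these numbers a single 9 GB object pushes out 196 of the 400 small
ones, which is the effect Kristian describes.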


>
> With all of that said, I'd like to emphasize that support for large
> files is relatively new. I expect there's room for optimizations.
>
> The less used features are also the less optimized features, as a
> consequence of the lack of feedback.
>
> I would greatly appreciate any feedback you have if/when you test this.
>
> - Kristian
>
>
I'll provide some feedback when I have it. I'm learning how to configure
Varnish right now and figuring out a way to test whether my requirements are
met. By the way, would you recommend some way to test the server? Is there
any automated tool for this kind of test?
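
One simple check that can be scripted, assuming the default `X-Varnish`
response header is left in place: to my knowledge it carries one request ID
on a miss and two IDs on a hit (the current request plus the request that
originally fetched the object). The helper name `looks_like_cache_hit` and
the sample header values are hypothetical.

```python
from typing import Mapping


def looks_like_cache_hit(response_headers: Mapping[str, str]) -> bool:
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    ids = lowered.get("x-varnish", "").split()
    # Assumption: two IDs means the object came from the cache.
    return len(ids) == 2


miss = looks_like_cache_hit({"X-Varnish": "1570394817"})
hit = looks_like_cache_hit({"X-Varnish": "1570394821 1570394817"})
```

Requesting the same URL twice and applying this check to both responses
should show a miss followed by a hit if caching works as intended.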

Thank you very much,

Thiago Moraes - EnC07 - UFSCar