Varnish constantly running into an OOM condition.

Chris Lee chris.lee at cern.ch
Thu Jul 10 08:25:01 UTC 2025


Hi Guillaume

Thanks for the response. Our developers were worried that nobody would reply since we are using V6 :-)

Transient is bounded at 1G (-s malloc,2G -s Transient=malloc,1G).
All the SMA values are below [1], but the process's real memory footprint is much higher: SMA.s0.g_bytes sits at about 1.5 GiB, yet cache-main is resident at 12.5 GiB [2].

From https://varnish-cache.org/docs/trunk/users-guide/storage-backends.html my understanding is that transient storage is used when an object's TTL is below the shortlived setting.
Both default_ttl and shortlived are at their defaults, i.e. 120s and 10s.
Checking the headers in the varnish log, all of these objects arrive with "Cache-Control: max-age=3000",
so I was actually thinking of setting the transient much lower: as you can see in [3], on a new server I started 3 days ago the transient isn't being used at all.
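
For reference, both parameters can be read back from the running instance with the standard varnishadm CLI:

```
# Show the effective TTL parameters (stock defaults: 120s and 10s)
varnishadm param.show default_ttl
varnishadm param.show shortlived
```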

As I mentioned, I could try V7, but with my sysadmin hat on I really want to find out what is going on in the stable release before jumping to the latest one.

Thanks in Advance
Chris


[1]:
```
[atlasfrontiergpn02 ~]# varnishstat -1 -f 'SMA*'
SMA.s0.c_req                  4797047       174.74 Allocator requests
SMA.s0.c_fail                   24725         0.90 Allocator failures
SMA.s0.c_bytes            73606101515   2681265.54 Bytes allocated
SMA.s0.c_freed            72001644367   2622819.63 Bytes freed
SMA.s0.g_alloc                 113811          .   Allocations outstanding
SMA.s0.g_bytes             1604457148          .   Bytes outstanding
SMA.s0.g_space              543026500          .   Bytes available
SMA.Transient.c_req                 0         0.00 Allocator requests
SMA.Transient.c_fail                0         0.00 Allocator failures
SMA.Transient.c_bytes               0         0.00 Bytes allocated
SMA.Transient.c_freed               0         0.00 Bytes freed
SMA.Transient.g_alloc               0          .   Allocations outstanding
SMA.Transient.g_bytes               0          .   Bytes outstanding
SMA.Transient.g_space      1073741824          .   Bytes available
```

[2]:
```
[atlasfrontiergpn02 ~]# top -b -n 1 -p $(pgrep -d, -f varnishd) | egrep "PID|$(pgrep -d"|" -f varnishd)"
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2685175 varnish   20   0   14.1g  12.5g  86272 S   6.7  88.8  30:40.92 cache-main
2685154 varnish   20   0   23748   6148   5376 S   0.0   0.0   0:01.20 varnishd
```

[3]:
[attachment: PastedGraphic-1.png — varnishstat graph from the new server, showing SMA.Transient unused]

On 10 Jul 2025, at 07:17, Guillaume Quintard <guillaume.quintard at gmail.com> wrote:

Hi Chris,

What are the g_bytes counters in varnishstat saying? If you haven't bounded your Transient storage, that could be the reason.
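
If you want them all at a glance, varnishstat takes a glob:

```
varnishstat -1 -f '*.g_bytes'
```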

The other suspect is the newer jemalloc version in the distribution repository. If the problem isn't the Transient storage, I would encourage you to try the packagecloud repository to get a newer version and see if that helps.
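
You can check which jemalloc the binary actually picks up, assuming it is dynamically linked:

```
# Prints the jemalloc shared library varnishd resolves, if any
ldd $(command -v varnishd) | grep -i jemalloc
```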

--
Guillaume Quintard

On Tue, Jul 8, 2025, 04:19 Chris Lee <chris.lee at cern.ch> wrote:
Hi all,

I am trying to install a varnish service for some of our developers, and I’ll admit I know next to nothing about varnish itself.

Things are up and running, but Varnish is killed every 2-3 hours by the OOM killer because the cache-main process uses up all of the system memory [1].
We are running varnish-6.6.2-6.el9_6.1.x86_64 on a VM with 4 cores and 14Gi of RAM, on AlmaLinux release 9.6.
We could try to install V7, but we would prefer to stay with the releases available from the default repositories, which are mirrored locally.

I have tried changing the malloc memory setting, going down in 2G increments from 10G to 2G, where it is now.
This has increased the number of evictions, but extended the uptime a bit.
Adjusting the workspace_client and workspace_backend settings has stretched the OOM interval to about 6-8 hours.
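
To see whether the growth actually tracks the storage counters or lives outside them, my next step is to log both side by side; a rough sketch (assuming the child process is named cache-main, as in [1]):

```
# Log the resident size of the cache-main child next to the storage
# counters once a minute; if RSS keeps growing while g_bytes stays
# flat, the memory is being used outside the configured storages.
while sleep 60; do
  date
  ps -o rss= -C cache-main           # resident set size in KiB
  varnishstat -1 -f 'SMA.*.g_bytes'  # bytes outstanding per storage
done
```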

The service is currently run via systemd as per the command line in [2].

The hit rates shown in [3] are fairly high from what I can tell, and the default.vcl is shown in [4].

In the mailing list archives I found a link to https://info.varnish-software.com/blog/understanding-varnish-cache-memory-usage, but I haven't tried to tune the malloc settings mentioned there yet.
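
If I read that post correctly, the tuning goes through jemalloc's MALLOC_CONF environment variable. A sketch of what I might try, assuming our varnishd is dynamically linked against jemalloc 5.x (the option names differ on older versions), via a systemd drop-in:

```
# Hypothetical drop-in: /etc/systemd/system/varnish.service.d/jemalloc.conf
[Service]
# Fewer arenas and faster release of unused pages back to the kernel;
# the exact effect depends on the jemalloc version actually linked in.
Environment=MALLOC_CONF=narenas:2,dirty_decay_ms:1000,muzzy_decay_ms:1000
```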

But I'm running out of ideas, so I thought I would ask the experts here for some guidance and assistance.

Thanks in Advance
Chris

[1]:
```
[root at frontier-varnish02 ~]# top -p $(pgrep -d, -f varnishd) |egrep "PID|$(pgrep -d"|" -f varnishd)"
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
740517 varnish   20   0   12.6g  11.1g  86528 S  20.0  78.8  27:42.25 cache-main
740496 varnish   20   0   23748   6404   5632 S   0.0   0.0   0:00.52 varnishd
```
[2]:
```
/usr/sbin/varnishd -f /etc/varnish/default.vcl -a http=:6082,HTTP -a proxy=:8443,PROXY -p feature=+http2 -p max_restarts=8 -p workspace_client=512k -p workspace_backend=512k -s malloc,2G -s Transient=malloc,1G
```
[3]:
```
Uptime mgt:   0+03:23:57                        Hitrate n:       10      100      171
Uptime child: 0+03:23:58                           avg(n):   0.9977   0.9940   0.9931
Press <h> to toggle help screen
    NAME                 CURRENT       CHANGE      AVERAGE       AVG_10      AVG_100     AVG_1000
MGT.uptime            0+03:23:57
MAIN.uptime           0+03:23:58
MAIN.sess_conn           2308418       123.91       188.63       139.32       133.20       132.53
MAIN.client_req         21919893      2019.56      1791.13      1356.64      1257.93      1228.81
MAIN.cache_hit          21866126      2016.56      1786.74      1354.38      1256.70      1227.64
MAIN.cache_miss            47442         3.00         3.88         2.26         1.22         1.17
```

[4]:
```
vcl 4.1;
import std;
import directors;

backend frontier_1 {
  .host = "atlasfrontier1-ai.cern.ch";
  .port = "8000";
}
backend frontier_2 {
  .host = "atlasfrontier2-ai.cern.ch";
  .port = "8000";
}
backend frontier_3 {
  .host = "atlasfrontier3-ai.cern.ch";
  .port = "8000";
}
backend frontier_4 {
  .host = "atlasfrontier4-ai.cern.ch";
  .port = "8000";
}

sub vcl_init {
  new vdir = directors.round_robin();
  vdir.add_backend(frontier_1);
  vdir.add_backend(frontier_2);
  vdir.add_backend(frontier_3);
  vdir.add_backend(frontier_4);
}

sub vcl_recv {
  set req.backend_hint = vdir.backend();
  set req.http.X-frontier-id = "varnish";
  if (req.method != "GET" && req.method != "HEAD") {
    return (pipe);
  }
}
```



