Serving and updating files with VMOD file

Generating responses in VCL

We don’t think of Varnish as an HTTP or file server. It’s a caching proxy for HTTP, and it should stay that way. Serving content is better left to the origin servers, from which Varnish fetches and caches responses.

But in VCL we do have the means to synthesize responses, in which case Varnish itself generates a response. That’s the purpose of the VCL subroutine vcl_synth, which generates a response body that is handed off to client delivery. vcl_synth is used most often, by far, to generate error responses, for errors determined at the VCL level, when there is no response from an origin server that could be used. This could be the result of a failed backend fetch with no available cached response, or an invalid client request, or a logic error in VCL, and a myriad of other possibilities.

Error responses are not the only kind of responses that Varnish can synthesize. In fact, there are some (rather exotic) projects for which tasks are executed entirely in VCL, and vcl_synth is used to generate responses that share the results. One might say that Varnish is functioning as an HTTP server after all; but we won’t explore that further here.

Returning an error response is quite easy for the VCL developer – just exit the current flow of execution with return(synth(503)) or return(synth(400)) for a response with status 503 or 400. The response status is set to the value in synth(), a client response is delivered, and the VCL state machine is completed.

There’s a bit of a problem, however, when the standard synthetic response is generated by Varnish’s implementation of vcl_synth in builtin.vcl. The end user sees something like this:

../_images/guru-meditation.png

That’s it, that’s the response. Both users and project managers have been known to be alarmed to see a site looking like that.

Moreover, a look at the code for builtin vcl_synth reveals that maintaing HTML output, even for something as simple as the Guru Meditation, is not an especially elegant task for VCL:

sub vcl_builtin_synth {
      set resp.http.Content-Type = "text/html; charset=utf-8";
      set resp.http.Retry-After = "5";
      set resp.body = {"<!DOCTYPE html>
<html>
  <head>
    <title>"} + resp.status + " " + resp.reason + {"</title>
  </head>
  <body>
    <h1>Error "} + resp.status + " " + resp.reason + {"</h1>
    <p>"} + resp.reason + {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} + req.xid + {"</p>
    <hr>
    <p>Varnish cache server</p>
  </body>
</html>
"};
}

The response body is set by assigning a string to resp.body, which must contain its entire contents. The string here is represented with special {" and "} delimiters, which may contain any content (including quotation marks) up to the closing delimiter. resp.body is relatively new in recent versions of VCL; in older versions, the only way to generate a response was to use the synthetic() function, with the response body as its string argument. (synthetic() is still supported.)

Most sites much prefer to have a custom error response, more in line with the sites’s design, and nicer to look at than the Guru Meditation. Origin servers have no problem with that, but errors encountered in VCL have to be generated in vcl_synth. Better-looking HTML invariably means more HTML, making the maintenance task even worse if it is embedded in VCL.

For these reasons, it is very convenient to generate responses whose content is stored in a file. A “nice-looking” HTML file can be provided by team members who produce the site’s HTML already, and VCL authors wouldn’t have to maintain it, a situation that usually makes both teams much happier.

Reading file contents is not strictly supported by the VCL language, but it is made possible by VMODs, among them the std VMOD included in the standard Varnish distribution, and the third-party VMOD file. VMOD file will be the focus of this tutorial, but first let’s look at how it’s done in VMOD std.

File reads with the “standard” VMOD

Reading file contents with VMOD std is simple enough with the std.fileread() function:

sub vcl_synth {
      # ...
      set resp.body = std.fileread("/path/to/file");
}

That’s all it takes to synthesize a response. std.fileread() opens and reads the file on its first-ever invocation, and if successful, it caches the file’s contents and returns them as a string on every subsequent invocation. That means that the file is only ever read once.

That might seem like the perfect solution, and for many purposes it suffices. It means, however, that no changes in the file will ever be reflected in the returned string. This is true even when VCL is reloaded. The only way to change the contents returned by std.fileread() is to restart the Varnish child process.

This seems very inconvenient, but there are good reasons for it. Varnish is designed so that, if possible, it is never the cause of long latency or poor scalability of HTTP requests and responses. That could be caused by file I/O, so std.fileread() does the absolute minimum file system access that is possible.

For a use case such as custom error pages, it may be true that updates are not necessary very often. Nevertheless, std.fileread() does not make it possible to update contents read from a file at all, without the drastic measure of restarting Varnish.

File reads and updates with VMOD file

VMOD file implements file reads with its reader object, which reads the contents of a file, and periodically chacks it for updates in a background thread:

import file;

sub vcl_init {
      # Check the file for updates once an hour.
      new rdr = file.reader("/path/to/file", ttl=1h);
}

sub vcl_synth {
      # ...
      set resp.body = rdr.get();
}

The .get() method can be used to retrieve the file’s contents as a string in any context. As an alternative for vcl_synth, the .synth() method may used:

sub vcl_synth {
      # ...
      rdr.synth();
}

The .synth() method does not return any value. It generates the synthetic response body, and hence can only be used in vcl_synth.

The ttl parameter in the constructor, for “time to live” (re-applying the familiar concept from Varnish’s cache), defines an interval at which the file is checked for changes. This is one of the ways VMOD file reduces the overhead of file I/O. For the duration of TTL interval, methods such as .get() and .synth() obtain the file contents when it was last read (similar to std.fileread()’s solution of caching file contents). Checks for changes, and potential new file reads, only take place at an interval chosen by the VCL author.

The impact of file I/O is also excluded from request/response transactions entirely, because file reads and update checks are performed in a separate thread. Access methods such as .get() and .synth() get the file’s contents from its most recent read, but never invoke any file I/O while Varnish is processing a request.

Because of the fact that checks for changes and file reads are run in the background, any logging that is emitted while doing so, particularly error logs, does not appear in the default output of varnishlog. The default only shows logs for request/response transactions, but logs for the activity of the background reader only appear in varnishlog’s “raw” grouping. The VMOD’s manual page explains in more detail how to read the VMOD’s log output.

It is possible, however, to determine in VCL if any errors occurred at the last file check, using the .error() and .errmsg() methods:

import std;

if (rdr.error()) {
      std.log("file reader error: " + rdr.errmsg());
}

The .error() method returns true if there was an error at the last check, and .errmsg() returns the error message. If the problem is fixed by the next check, then afterward .error() returns false, and .errmsg() returns a string indicating that there was no error.

It is possible to invoke a synchronous check of the file for updates, for example when a new version of the file is available and you really need to get the new contents right away. The .check() method does the same work that the check does in the background does when the TTL elapses. Except it does so from VCL code, and hence during request processing.

Obviously it’s important to use .check() carefully; running it for every request may have an adverse effect on Varnish’s performance. The VMOD’s manual shows an example of calling .check() in an “admin-only” request with restricted access; this might be part of an automation that updates the file and sends a request to bring about timely updates:

import file;

if (req.url == "/update-files") {
    # Assume that this ACL defines internal admin networks.
    # Return error status 403 Forbidden if the client IP doesn't
    # match.
    if (client.ip !~ admin_network) {
            return (synth(403));
    }
    # Internal admins may run a synchronous file check.
    rdr.check();
}

File deletion and update

VMOD file was designed to minimize the impact of file I/O on Varnish’s performance. As seen in the previous example, it is also designed with the assumption that automated processes common at most production sites are deployed to maintain the file. As we will see, both of these goals play a role in the way the VMOD handles file deletion and updates.

The VMOD’s method of “caching” file contents is driven by the performance goal – the content cache is a memory-mapping of the file (see mmap(2)). Memory mappings of files are an efficient and low resource solution on many platforms; but the rules concerning memory mappings have some nuances worth considering.

One of these is that a file is not unmapped when it is deleted from the file system. After deletion, a file is no longer accessible by name, but the mapping persists until it is explicitly unmapped. Because of this, if a file has been deleted when the VMOD performs its next background check, it is not an error. The file’s contents are still mapped and available to access methods such as .get() and .synth().

It is possible, however, to determine in VCL that this has happened, by using the .deleted() method:

import std;

if (rdr.deleted()) {
      std.log("File has been deleted, cached contents still valid");
}

The rules concerning mmap(2) also affect the guarantees that the VMOD can make about what happens when the file is changed. There are subtle differences on different platforms, and some intuitively “obvious” ways to change a file, such as overwriting it or editing it in place, don’t necessarily have the expected effect in every environment. This subject is discussed in more detail in the VMOD’s manual.

A “portable” guarantee that file updates will work as expected can be made if they are done in two steps:

  • Delete the file.

  • Write the new contents to a new file of the same name (same path location).

Of course an automated update process can carry out these two steps. But it is important to remember to do so if you want to make ad hoc changes, otherwise the results may be unexpected.

Conditional requests for a file

The VMOD’s file object provides methods with information about the file. In this section we will see how some of these can be used to support conditional requests – potential validation with response status 304 Not Modified.

The examples thus far have focused on error responses, which in most cases are not candidates for caching, conditional requests or the 304 response. The following is likely to be useful only if you are using the VMOD for other purposes. If so, then the VMOD supports some functions that are commonly expected for static files.

The reader object has a method .mtime(), which returns its most recent modification time when it was last checked, as a VCL TIME. The method can be used to implement the semantics of the Last-Modified and If-Modified-Since headers:

import std;

sub vcl_recv {
      # Return status 304 if If-Modified-Since is earlier than
      # the modification time returned by .mtime().

      # std.time() parses If-Modified-Since as a TIME. If the parse
      # fails, fall back to now (which is almost certainly not earlier
      # than the mtime).
      if (std.time(req.http.If-Modified-Since, now) < rdr.mtime()) {
              return (synth(304));
      }
      else {
              return (synth(200));
      }
}

sub vcl_synth {
      # Always set Last-Modified to the value returned from .mtime().
      # Converting a TIME to a string results in an HTTP date,
      # which is appropriate for Last-Modified.
      set resp.http.Last-Modified = rdr.mtime();

      # Set the response body to the file contents if the status has
      # been set to 200. For status 304, we generate no body.
      if (resp.status == 200) {
              rdr.synth();
      }
      return(deliver);
}

The VMOD also provides two methods that return digests for the file underlying a file object – .id() and .sha256(). The .id() method is available for every object; its contents are derived from the file’s metadata, and is inexpensive to compute. Because of the potential cost of computing a SHA256 digest, the .sha256() method is available only if the enable_sha256 parameter is set to true in the object constructor:

import file;

sub vcl_init {
      new rdr = file.reader("/path/to/file", enable_sha256=true);
}

SHA256 may be considerably more costly to compute for large files than the digest returned by .id() – see the manual for further discussion about the differences between the two digest methods. Since .id() is entirely suitable for the next example, we will only consider it here.

The values returned by the two digest methods are sutiable for use in the ETag response header, and can be compared with a request’s If-None-Match header. The code is very similar to the implementation of Last-Modified and If-Modified-Since above. The main difference is that the digest methods return the BLOB type, so they need to converted to strings using binary-to-text encodings from the blob VMOD in the standard distribution:

# Use the base64 encoding of .id() for the ETag response header, and
# send a 304 response if the If-None-Match request header matches
# it.

import blob;

sub vcl_recv {
      if (req.http.If-None-Match == blob.encode(BASE64, blob=rdr.id())) {
              return (synth(304));
      }
      else {
              return (synth(200));
      }
}

sub vcl_synth {
      set resp.http.ETag = blob.encode(BASE64, blob=rdr.id());
      if (resp.status == 200) {
              rdr.synth();
      }
      return(deliver);
}

To repeat what was said at the outset – Varnish is a caching proxy for HTTP, not an HTTP server. But if your applications of VCL call for the use of static files for responses (almost as if it were an HTTP server), then VMOD file can play an important role.