Nested ESI + gzip + Squid 2.7.STABLE9 = invalid compressed data--format violated

Andrea Campi andrea.campi at zephirworks.com
Wed Mar 23 16:08:06 CET 2011


Hi,

I am currently working with a client to implement ESI + gzip with
trunk Varnish; since phk asked for help in breaking it, here we are :)

Some background: the customer is a publishing company and we are
working on the website for their daily newspaper, so ease of
integration with their CMS and timely expiration of ESI fragments are
paramount.
Because of this, I'm using the classic technique of having the page
esi:include a document with a very short TTL, which in turn
esi:includes the real fragment (which has a long TTL), with the
last-modification time included in the URL.


So we have something like:

index.shtml -> /includes2010/header.esi/homepage ->
/includes2010/header.shtml/homepage
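
To make the indirection concrete, here is a minimal sketch of what the short-TTL wrapper document could return. The "lm" query parameter, the 5 second max-age and the function name are assumptions for illustration only; the actual URL scheme used on the site is not shown in this post.

```python
def wrapper_response(fragment_url: str, last_modified: int, short_ttl: int = 5):
    """Build headers and body for the short-lived wrapper document.

    The wrapper is cached only briefly, so a new last-modification time
    shows up quickly; the fragment URL changes with it, so the fragment
    itself can be cached for a long time.
    """
    # "lm" is a hypothetical parameter name carrying the last-modification time.
    body = f'<esi:include src="{fragment_url}?lm={last_modified}"/>'
    headers = {"Cache-Control": f"max-age={short_ttl}"}
    return headers, body


headers, body = wrapper_response("/includes2010/header.shtml/homepage", 1300892886)
print(headers)
print(body)
```

When the CMS updates the fragment, only the timestamp in the wrapper changes, and the next request for the wrapper points at a fresh URL.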

This works nicely when I strip the Accept-Encoding header, on both
2.1.5 and trunk.
But it breaks down with gzip compression on: Safari and Chrome give up
at the point where the first ESI include is, Firefox mostly just
errors out; all of them sometimes provide vague errors.
The best info I have is from: "curl | gzip -d"

gzip: out: invalid compressed data--format violated
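
For anyone who wants to narrow down where a saved response goes bad, a rough sketch using Python's zlib module follows. The function name is made up for this example, and the offset it reports is where zlib notices the problem, which can be some bytes after the byte that was actually damaged.

```python
import gzip
import zlib


def first_bad_offset(data: bytes):
    """Return the offset at which zlib rejects the gzip stream, or None."""
    # 16 + MAX_WBITS tells zlib to expect a gzip wrapper.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Feed one byte at a time so the failing offset is known (slow but simple).
    for offset in range(len(data)):
        try:
            d.decompress(data[offset:offset + 1])
        except zlib.error:
            return offset
    return None


good = gzip.compress(b"hello world " * 100)
print(first_bad_offset(good))

# Break the gzip magic number to force an error early in the stream.
broken = b"\x00" + good[1:]
print(first_bad_offset(broken))
```

Running this on the body saved by curl would give a starting point for a hex dump around the failure.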


Unsetting bereq.http.accept-encoding on the first ESI request didn't
help; unsetting it on the second request *did* help, fixing the issue
for all browsers.
Setting TTL=0 for /includes2010/header.shtml/homepage didn't make a
difference, nor did changing vcl_recv to return(pass), so it seems
it's not a matter of what is stored in the cache.


[.... a couple of hours later ....]

Long story short, I finally realized the problem is not with Varnish
per se, but with the office proxy (Squid 2.7.STABLE9); it seems to
corrupt the gzip stream just after each 00 00 FF FF sequence, where
the next byte comes out as 00:

-0004340    5d  90  4a  4e  4e  00  00  00  00  ff  ff  ec  3d  db  72  dc
+0004340    5d  90  4a  4e  4e  00  00  00  00  ff  ff  00  3d  db  72  dc

-0024040    75  21  aa  39  01  00  00  00  ff  ff  d4  59  db  52  23  39
+0024040    75  21  aa  39  01  00  00  00  ff  ff  00  59  db  52  23  39

and so on.
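
The comparison above can be automated with a short script. This is a sketch only: the function name is invented, and the two inputs are simply the "-" and "+" sides of the first hex dump line. For context, 00 00 FF FF is the byte pattern zlib emits at the end of a sync flush (an empty stored deflate block); that this is what marks the fragment boundaries here is my assumption, not something verified against the Varnish source.

```python
import zlib

MARKER = bytes.fromhex("0000ffff")


def diff_after_markers(first: bytes, second: bytes):
    """List (offset, byte_in_first, byte_in_second) for every byte that
    directly follows a 00 00 FF FF marker in `first` and differs in `second`."""
    hits = []
    pos = first.find(MARKER)
    while pos != -1:
        nxt = pos + len(MARKER)
        if nxt < len(first) and nxt < len(second) and first[nxt] != second[nxt]:
            hits.append((nxt, first[nxt], second[nxt]))
        pos = first.find(MARKER, pos + 1)
    return hits


# zlib ends its output with the marker after a sync flush.
comp = zlib.compressobj()
flushed = comp.compress(b"fragment") + comp.flush(zlib.Z_SYNC_FLUSH)
print(flushed.endswith(MARKER))

# The two sides of the first hex dump line above.
minus_side = bytes.fromhex("5d904a4e4e00000000ffffec3ddb72dc")
plus_side = bytes.fromhex("5d904a4e4e00000000ffff003ddb72dc")
print(diff_after_markers(minus_side, plus_side))
```

Fed with the full captures from both sides of the proxy, this lists every marker where the following byte was changed.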


However, what I wrote above is still true: if I only have one level of
ESI include, or if I have two but the inner one is not originally
gzipped, Squid doesn't corrupt the content.


I have a few gzipped files, as well as sample VCL and HTML files (not
that these matter after all); I can send them if that would help.

Andrea



