Proposed fix for JDK-8028804 (Deflater.deflateBytes() may produce corrupted output on Deflater level/strategy change)

Tue Nov 26 13:27:32 UTC 2013

Hello,

My name is Thomas Stuefe, I'm with SAP and working on the SAP JVM.

I'd like to propose a fix for
https://bugs.openjdk.java.net/browse/JDK-8028804 . This may also be related
to https://bugs.openjdk.java.net/browse/JDK-8028216.

The problem:

The following call sequence:

Deflater.deflate()
Deflater.setLevel() / Deflater.setStrategy()
Deflater.deflate()

will cause at C level in Deflater.c:

zlib deflate()
zlib deflateParams()
zlib deflate()
.

Zlib's deflateParams() has the following issue: if the stream has unflushed
data, it will attempt to flush it. It will do this by calling - once, a
single time - deflate(..., Z_BLOCK). After that it will change the
compression parameters for the stream, which may switch the compression
function used for the next call to deflate().
If the output buffer was too small to hold the whole output, output will
not have been completely flushed before compression function is changed. As
a result the compressed block may be corrupted.

As a reprocase, see the jtreg test in the webrev below or the repro case
attached to the bug report:

http://cr.openjdk.java.net/~simonis/webrevs/8028804/

This is a bug we encountered when dealing with Apache's
FlushableGZIPOutputStream (
http://tomcat.apache.org/tomcat-6.0-doc/api/org/apache/coyote/http11/filters/FlushableGZIPOutputStream.html),
which in older versions implements flushing by switching the compression
level.

The real problem is that zlib deflateParams(), unlike zlib deflate(), does
not handle output-buffer-too-small correctly. The workaround is to fully
flush the stream before calling deflateParams(), so that the deflate() call
inside deflateParams() becomes a noop.

This fix consists of several parts:

- Introduces a new function in the zlib: deflateParams2() which is a 1:1
copy of deflateParams() but behaves differently if the output buffer was
too small. In that case it leaves the compression parameters unchanged and
returns a newly introduced return code (Z_UNFINISHED_FLUSH). The caller
would have to repeat the call to deflateParams2() with the same parameters
and more output buffer (similar to how deflate() handles this situation).
  Note: I did not just change deflateParams() because I did not want to
hunt down and change all callers.

- In Deflate.c, Deflater.deflateBytes(), call deflateParams2() and handle
Z_UNFINISHED_FLUSH by *not* resetting the flag Deflater.setParams; that
way, on the next call to Deflater.deflateBytes(), deflateParams2() will be
called again until the flush is completed.

- Additionally, for the call to zlib deflateParams2(), I set the input to 0
(z_stream->avail_in). So I defer handing down any new input to the stream
as long as the old input is not flushed. I do this for two reasons:
  1) This would be the expected behaviour for any caller who calls
"Deflater.setLevel", then "Deflater.deflate" and would expect the new input
data to be compressed with the new level
  2) It prevents "endless flushes" because we provide new input data on
every call

The intent of this fix is to transparently fix the Deflater for users of
FlushableGZIPOutputStream and everyone else who changes the compression
parameters in the middle of an unflushed block.

Note that this is one way to fix it; it could also be fixed differently, by
not changing the zlib at all, but this would make the code more difficult
to read and introduce unnecessary flushes (zlib deflateParams only flushes
if needed based on information we do not hanve in Deflater.c).

I am curious what you think; do you think this is a valid fix?

Kind Regards,

Thomas