RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)
Volker Simonis
volker.simonis at gmail.com
Mon Mar 21 11:45:38 UTC 2022
Ping...
On Thu, Mar 10, 2022 at 8:26 PM Lance Andersen <lance.andersen at oracle.com>
wrote:
> Hi Volker,
>
> Thank you for the reminder
>
> This is on my radar as well but have not had a chance spend any time on
> this as yet.
>
>
>
> On Mar 9, 2022, at 2:33 PM, Volker Simonis <volker.simonis at gmail.com>
> wrote:
>
> @Alan, @Lance,
>
> Sorry for my obtrusiveness, but what's your opinion on this issue?
>
> Thank you and best regards,
> Volker
>
> On Fri, Mar 4, 2022 at 11:04 AM Volker Simonis <volker.simonis at gmail.com>
> wrote:
>
>
> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
> functionality. `Inflater::inflate(byte[] output, int off, int len)`
> currently calls zlib's native `inflate(..)` function and passes the
> address of `output[off]` and `len` to it via JNI.
>
> The specification of zlib's `inflate(..)` function (i.e. the [API
> documentation in the original zlib
> implementation](
> https://urldefense.com/v3/__https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h*L400__;Iw!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0w3u0Q7HA$
> ))
> doesn't give any guarantees with regard to usage of the output buffer.
> It only states that upon completion the function will return the
> number of bytes that have been written (i.e. "inflated") into the
> output buffer.
>
> The original zlib implementation only wrote as many bytes into the
> output buffer as it inflated. However, this is not a hard requirement
> and newer, more performant implementations of the zlib library like
> [zlib-chromium](
> https://urldefense.com/v3/__https://chromium.googlesource.com/chromium/src/third_party/zlib/__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0yNMBtuew$
> )
> or [zlib-cloudflare](
> https://urldefense.com/v3/__https://github.com/cloudflare/zlib__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0y7Uhi8nA$
> ) can use more
> bytes of the output buffer than they actually inflate as a scratch
> buffer. See
> https://urldefense.com/v3/__https://github.com/simonis/zlib-chromium__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0z1qVlYPg$
> for a more
> detailed description of their approach and its performance benefit.
>
> These new zlib versions can still be used transparently from Java
> (e.g. by putting them into the `LD_LIBRARY_PATH` or by using
> `LD_PRELOAD`), because they still fully comply to specification of
> `Inflater::inflate(..)`. However, we might run into problems when
> using the `Inflater` functionality from the `InflaterInputStream`
> class. `InflaterInputStream` is derived from from `InputStream` and as
> such, its `read(byte[] b, int off, int len)` method is quite
> constrained. It specifically specifies that if *k* bytes have been
> read, then "these bytes will be stored in elements `b[off]` through
> `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
> b, int off, int len)` (which is constrained by
> `InputStream::read(..)`'s specification) calls
> `Inflater::inflate(byte[] b, int off, int len)` and directly passes
> its output buffer down to the native zlib `inflate(..)` method which
> is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the
> number of inflated bytes).
>
> From a practical point of view, I don't see this as a big problem,
> because callers of `InflaterInputStream::read(byte[] b, int off, int
> len)` can never know how many bytes will be written into the output
> buffer `b` (and in fact its content can always be completely
> overwritten). It therefore makes no sense to depend on any data there
> being untouched after the call. Also, having used zlib-cloudflare
> productively for about two years, we haven't seen real-world issues
> because of this behavior yet. However, from a specification point of
> view it is easy to artificially construct a program which violates
> `InflaterInputStream::read(..)`'s postcondition if using one of the
> alterantive zlib implementations. A recently integrated JTreg test
> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally"
> fails with zlib-chromium but could be easily fixed to run with
> alternative implementations as well.
>
> What should/can be done to solve this problem? I think we have three
> options:
>
> 1. Relax the specification of `InflaterInputStream::read(..)` and
> specifically note in the API documentation that a call to
> `InflaterInputStream::read(byte[] b, int off, int len)` may write more
> than *k* characters into `b` where *k* is the returned number of
> inflated bytes. Notice that there's a precedence for this approach in
> `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
> method of InputStream, returns -1 instead of zero if the end of the
> stream has been reached and len==0*".
>
> 2. Tighten the specification of `Inflater::read(byte[] b, int off, int
> len)` to explicitly forbid any writes into the output array `b` beyond
> the inflated bytes.
>
> 3. Change the implementation of `InflaterInputStream::read(..)` to
> call `Inflater::read(..)` with a temporary buffer and only copy the
> nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s output
> buffer. I think we all agree that this is only a theoretic option
> because of its unacceptable performance regression.
>
> I personally favor option 1 but I'm interested in your opinions?
>
> Thank you and best regards,
> Volker
>
>
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>
More information about the core-libs-dev
mailing list