RFR: 8282648: Weaken the InflaterInputStream specification in order to allow faster Zip implementations [v10]
Volker Simonis
simonis at openjdk.org
Wed Aug 31 09:06:10 UTC 2022
On Thu, 21 Jul 2022 17:45:39 GMT, Mark Reinhold <mr at openjdk.org> wrote:
>> Volker Simonis has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
>
> First, I’ll observe that there is no specification conflict here. The specifications of `InflaterInputStream::read(byte[] b, int off, int len)` and `Inflater::inflate(byte[] b, int off, int len)` are not in conflict at all. The latter is simply weaker than the former.
>
> What is true is that if you build the JDK in the default manner, without any patches, then what you wind up with is an `InflaterInputStream::read(...)` implementation that satisfies the inherited `InputStream::read(...)` specification, including the constraint that a read operation must not _scribble_ in the caller’s output buffer, _i.e._, modify any byte in that buffer beyond the returned byte count. (That constraint is, so far as we know, satisfied by every `InputStream` subclass defined in the JDK.)
>
> A default JDK build’s `InflaterInputStream::read(...)` method doesn’t scribble because
>
> - The implementation of `InflaterInputStream::read(...)` assumes that the implementation of `Inflater::inflate(...)` doesn’t scribble, and
>
> - The implementation of `Inflater::inflate(...)` satisfies that assumption because it invokes the corresponding function in the [standard native Zip implementation](https://zlib.net/), which itself doesn’t scribble.
>
> So, not only does the specification forbid `InflaterInputStream::read(...)` from scribbling, but in the official Java SE Reference Implementation — which is simply a default JDK build — that method does not scribble. (Perhaps by accident, but that is immaterial here.)
>
> Therefore, if in your own JDK build you substitute an alternative Zip implementation whose `inflate` function does scribble then you wind up with a JDK that both violates the specification and behaves differently than the RI. It is incompatible. If you use such a JDK in production then you risk breaking existing code.
>
> Now, compatibility is one of the key values of the Java Platform, but performance is not unimportant. Is there something we can do to accommodate [alternative, faster, scribbling Zip implementations][faster-zlib]?
>
> @simonis lists three options in the [issue]. I’ll cast the options somewhat differently.
>
> 1. Do not change the specification in any way. The specification text that forbids scribbling dates back to Java 1.1, released over 25 years ago. (Arguably that text over-specifies the `InputStream::read(...)` method, but that is also immaterial.) It’s all too possible that existing code — no matter how odd or poorly written we think it might be — depends upon this constraint.
>
> With this option, a JDK build that uses a scribbling Zip implementation would have to buffer that implementation’s output and copy only the returned number of bytes into the caller’s buffer. This would degrade any performance benefit.
>
> 2. Weaken the specification of `InflaterInputStream` — and, thereby, that of its subclasses `GZipInputStream` and `ZipInputStream` — to allow scribbling. This is essentially @simonis’s initial suggestion, as embodied in [63ae357] earlier in this PR.
>
> This option risks breaking existing code that depends upon the 25-year-old promise that `InflaterInputStream::read(...)` will not scribble. Changing the specification of an indirect subclass of `InputStream`, moreover, is apt to go unnoticed by many developers, who routinely deal with instances of `InputStream` without any knowledge of those instances’ concrete classes. (@jaikiran gives one example above.)
>
> 3. Weaken the specification of the `InputStream::read(...)` method to allow scribbling, as [suggested by @jddarcy in the CSR][jddarcy].
>
> This option is at least as risky to existing code as the second option, though it’s marginally more likely that developers writing new code will notice the specification change.
>
> What’s worse, however, is that this option gives developers writing new subclasses of `InputStream` permission to deviate from both that class’s longstanding specification and the actual behavior of every `InputStream` subclass defined in the JDK. In the long term, that could wreak subtle bugs throughout the ecosystem, even breaking code that does absolutely nothing with Zip-related input streams.
>
> In favor of both the second and third options is @simonis’s statement above that his employer (Amazon) has used JDK builds with a scribbling Zip implementation in production for two years. He has further elaborated (privately) that this includes hundreds of services built with many of the usual popular frameworks and libraries (Spring, Hibernate, ASM, etc.). This suggests that the practical impact, out in the wild, of allowing scribbling Zip implementations is acceptable.
>
> On that basis I am — just barely — comfortable with the second option, _i.e._, weakening the specification of `InflaterInputStream` to allow scribbling, thereby contradicting the specification of its `InputStream` superclass.
>
> Explicitly specifying a subclass in a way that’s inconsistent with the specification of one of its superclasses is generally a really bad idea. It violates the [Liskov Substitution Principle][lsp], which enables instances of a subclass to be substituted freely for instances of one of its superclasses. Thus we do not condone such changes lightly. There are, however, precedents in a handful of special cases in the Java Platform; _e.g._, in the [`java.util.IdentityHashMap`][ihm] and [`java.sql.Timestamp`][ts] classes.
>
> The third option above would not violate the Liskov substitution principle, but in my view could prove more harmful in the long run.
>
> Recommendations:
>
> - Revert this PR back to [63ae357], and make corresponding updates to the [issue] and the [CSR].
>
> - Additionally, find every method in every `java.*` class that’s declared to return an `InputStream` (_e.g._, `ZipFile::getInputStream(...)` but actually returns an instance of `InflaterInputStream` or one of its subclasses. Add an API note to each such method to let developers know that the returned input stream might scribble, per @jaikiran’s suggestion above. These notes can all use the same text, linking to the revised `InflaterInputStream::read(...)` specification.
>
> - Revise the summary of this PR, the issue, and the CSR to reflect the true nature of the proposed change rather than assert a nonexistent conflict. Consider “Weaken the `InflaterInputStream` specification in order to allow faster Zip implementations”.
>
> - Change the issue type from a bug — which it’s not — to an enhancement.
>
> - Create a related release-note issue, as @RogerRiggs suggests above.
>
>
> [faster-zlib]: https://github.com/simonis/zlib-chromium#inflater-improvements-in-zlib-chromium
> [issue]: https://bugs.openjdk.org/browse/JDK-8282648
> [63ae357]: https://github.com/openjdk/jdk/pull/7986/commits/63ae3572da674a0926bb9d0adcea88d89bb56707
> [jddarcy]: https://bugs.openjdk.org/browse/JDK-8283758?focusedCommentId=14492930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492930
> [CSR]: https://bugs.openjdk.org/browse/JDK-8283758
> [lsp]: https://en.wikipedia.org/wiki/Liskov_substitution_principle
> [ihm]: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/IdentityHashMap.html
> [ts]: https://docs.oracle.com/en/java/javase/18/docs/api/java.sql/java/sql/Timestamp.html
@mbreinhold, @LanceAndersen thanks for your reviews. I'm waiting for the CSR approval before pushing.
-------------
PR: https://git.openjdk.org/jdk/pull/7986
More information about the core-libs-dev
mailing list