Request for Discussion: 8253952: Work around wrong usage of ZipOutputStream.putNextEntry() in user code

Volker Simonis volker.simonis at gmail.com
Mon Oct 5 10:04:51 UTC 2020


Hi Alan,

thanks a lot for your feedback. Please find my answers inline below:

On Sat, Oct 3, 2020 at 9:11 PM Alan Bateman <Alan.Bateman at oracle.com> wrote:
>
> On 02/10/2020 20:44, Volker Simonis wrote:
> > Hi,
> >
> > I'd like to get your opinion on a workaround for a common but wrong
> > usage pattern of ZipOutputStream.putNextEntry() in user code. I've
> > created a Draft PR with all the gory details (which I'll include here
> > for your convenience). Please let me know what you think.
> >
> java.util.zip is JDK 1.1 era API and has several usability issues. Yes,

I know and I think it's high time to fix them :)

> ZOS.putNextEntry method can be a hazard. Adding an @apiNote with an
> example showing how to use API correctly shouldn't be an issue.

OK. I'll open a new issue so we can handle that independently.
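For what it's worth, the kind of example I have in mind would look roughly like this (just a sketch of mine, not the exact @apiNote wording; class and file names are made up). It copies entries between archives with a fresh ZipEntry per entry instead of reusing the one read from the source, which is the usage pattern that goes wrong:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipCopy {
    // Copy all entries from one zip to another. The key point: don't pass
    // the ZipEntry read from the source archive to putNextEntry(), because
    // it carries a compressed size that is only valid for the deflater and
    // compression level which originally produced it.
    static void copy(Path from, Path to) throws IOException {
        try (ZipFile in = new ZipFile(from.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(to))) {
            Enumeration<? extends ZipEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                ZipEntry src = entries.nextElement();
                // WRONG: out.putNextEntry(src);
                out.putNextEntry(new ZipEntry(src.getName())); // fresh entry, no stale sizes
                in.getInputStream(src).transferTo(out);
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zipcopy");
        Path a = dir.resolve("a.zip"), b = dir.resolve("b.zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(a))) {
            out.putNextEntry(new ZipEntry("hello.txt"));
            out.write("hello, zip".getBytes());
            out.closeEntry();
        }
        copy(a, b);
        try (ZipFile zf = new ZipFile(b.toFile());
             InputStream is = zf.getInputStream(zf.getEntry("hello.txt"))) {
            System.out.println(new String(is.readAllBytes())); // prints "hello, zip"
        }
    }
}
```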

> I don't
> think the proposal to add a system property to change the behavior is
> feasible, at least not without changing the spec but even then it would
> be very icky.

Why do you think so? The answer to this question is especially
important to me because we are considering doing this in our internal
version independently of upstream OpenJDK (and potentially also in 8 &
11).

The spec doesn't currently say anything about the exact zip format
created by ZipOutputStream. I.e. it doesn't mention if ZipOutputStream
creates Local File Headers with compressed size information or if it
puts the compressed size information into the Data Descriptor.
Currently this depends solely on the compressed size value of the
corresponding ZipEntry. But as I've detailed in my initial mail, using
this value is not correct in general if it was determined implicitly
by reading it from a zip file because there's no way to determine the
compression level and exact deflater implementation which produced
that entry. Using that value can only work by chance if the source zip
entry was created by the exact same zip implementation with the exact
same compression level that is used by the current ZipOutputStream.

Notice that the new property won't change any of the current behaviour
if we call ZipEntry.setCompressedSize() to explicitly set the
compressed size of a ZipEntry object. So if somebody really knows the
compressed size of a ZipEntry (e.g. because they compressed the data
manually beforehand), they can still use setCompressedSize() to trigger
the creation of an LFH with compressed size information. But again,
there's nothing
in the spec of ZipOutputStream that mandates the creation of a
specific LFH or that mandates that ZipOutputStream is honoring the
compressed size information from a ZipEntry.

The spec also doesn't say anything about the nature of a ZipEntry we
get from ZipInputStream or ZipFile, and especially not whether its
compressed size attribute is set. As I've described before, for
ZipInputStream this depends on the format of the zip file and on
whether the corresponding entry has that information in the LFH. For
ZipEntries we
get from ZipFile, the compressed size is always set because ZipFile
uses the Central Directory File Header.
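This difference is easy to observe. Here is a quick probe I put together (a sketch, not from the patch): it writes one entry with ZipOutputStream, which by default streams the sizes into the data descriptor, and then reads it back through both APIs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CompressedSizeProbe {
    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("probe", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("e.txt")); // size unknown up front,
            out.write("some data".getBytes());       // so it goes into the data descriptor
            out.closeEntry();
        }
        // Streaming: the LFH carries no sizes, so right after getNextEntry()
        // the compressed size is still unknown (-1).
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry e = in.getNextEntry();
            System.out.println("stream: " + e.getCompressedSize()); // -1
        }
        // ZipFile reads the Central Directory File Header, which always
        // records the compressed size.
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry e = zf.getEntry("e.txt");
            System.out.println("zipfile: " + e.getCompressedSize()); // a positive value
        }
    }
}
```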

Finally, although that's clearly no proof, I just want to mention
that I've run the corresponding regression and JCK tests without any
issues.

So could you please be more specific about why you think such a
property would require a spec change, and if you still think so, how
and in which part of the spec?

And just as a side note, we also use the "jdk.util.zip.inhibitZip64"
property in ZipOutputStream for a similar reason. I just wonder how
that could pass the CCC process as I couldn't even find any
documentation for it :)

> Have you explored adding a better API instead of a
> workaround? It wouldn't be hard to have ZipEntry define a writeTo or
> transferTo method for example.

I thought about that, but unfortunately it's not trivial, and it
certainly wouldn't solve the problem of existing bad user code, which
is what I want to address with the new system property.

I'm also not sure which semantics you envision for
ZipEntry.transferTo(). If it is only meant to copy a ZipEntry object
from one file into another, that would have the same problems except
that we would have the chance to specify its semantics such that it
would always ignore its compressed size attribute.

If you mean that ZipEntry.transferTo() is intended to copy a
ZipEntry together with the associated, compressed data that will be
much harder if not impossible to implement with the current API. The
first problem is that ZipEntry has no reference to its associated
ZipFile or ZipInputStream. The much bigger problem is that
ZipInputStream is a "streaming" API. We might read from a zip file
which doesn't have the compressed size information in the LFH. In such
a case, it would be impossible to read the compressed data from the
stream because we don't know its length. We can only inflate the data
until inflation stops. Only at that point do we know the compressed
size of the entry, but by then we can no longer access the compressed
data.
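This limitation can be demonstrated directly (again just a sketch of mine): when the sizes live in the data descriptor, ZipInputStream only learns the compressed size after the entry has been fully inflated, at which point the compressed bytes have already been consumed:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamingProblem {
    public static void main(String[] args) throws IOException {
        // Create a zip whose sizes live in the data descriptor, not the
        // LFH (what ZipOutputStream does by default when streaming).
        Path zip = Files.createTempFile("stream", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("e.txt"));
            out.write("some moderately compressible data".getBytes());
            out.closeEntry();
        }
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry e = in.getNextEntry();
            // Before inflating: the LFH carried no sizes.
            System.out.println(e.getCompressedSize()); // -1
            // Inflate (and discard) the entry's data...
            in.transferTo(OutputStream.nullOutputStream());
            // ...only now, once the data descriptor has been read, is the
            // compressed size known -- but the compressed bytes are gone.
            System.out.println(e.getCompressedSize());
        }
    }
}
```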

So instead of trying to fix this old API, maybe it's time for a
completely new one? I've also started to think a little bit about
creating a new compression/decompression library. It could be
implemented as a service with the help of an SPI and could provide
support for additional compression formats as well. What do you think
about such an idea?

Best regards,
Volker

>
> -Alan.


More information about the core-libs-dev mailing list