RFR: 8190753: (zipfs): Accessing a large entry (> 2^31 bytes) leads to a negative initial size for ByteArrayOutputStream [v3]

Jaikiran Pai jpai at openjdk.java.net
Sat Jul 3 16:19:25 UTC 2021


On Wed, 30 Jun 2021 16:05:40 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Can I please get a review for this proposed fix for the issue reported in https://bugs.openjdk.java.net/browse/JDK-8190753?
>> 
>> The commit here checks for the size of the zip entry before trying to create a `ByteArrayOutputStream` for that entry's content. A new jtreg test has been included in this commit to reproduce the issue and verify the fix.
>> 
>> P.S: It's still a bit arguable whether it's a good idea to create the `ByteArrayOutputStream` with an initial size potentially as large as the `MAX_ARRAY_SIZE` or whether it's better to just use some smaller default value. However, I think that can be addressed separately while dealing with https://bugs.openjdk.java.net/browse/JDK-8011146
>
> Jaikiran Pai has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - an initial proposed test for testing compressed size entry greater than 2gb
>  - minor summary update on test

Based on the inputs received, I've now updated this PR to enhance the ZipFS to allow for rolling over the outputstream data, into a temporary file after reaching a threshold. Details follow:

- A new "tempFileThreshold" property has been introduced for ZipFileSystem. This property can be passed like other existing properties when creating the filesystem. The value of this property is expected to be a size in bytes and represents the threshold which will be used to decide whether and when to use a temporary file for outputstreams of zip file entries returned by the ZipFileSystem.

- The "tempFileThreshold" property is optional and if not set to any explicit value will default to 10MB. In other words, this feature is by default enabled, without the user having to do any specific configuration (not even the existing "useTempFile" property is requried to be set).

- To disable the threshold based temp file creation feature, a value of 0 or a negative value can be passed.

- A new (internal) `FileRolloverOutputStream` has been introduced in `ZipFileSystem` and forms the core of this enhancement. It extends the `ByteArrayOutputStream` and its write methods have the additional ability to rollover the current and subsequently written data into a temporary file representing that zip file entry.

- A new `ZipFSOutputStreamTest` has been added with the sole focus of verifying the usage of this new "tempFileThreshold" property.

- The previously added `LargeEntrySizeTest` and the manual `LargeCompressedEntrySizeTest` continue to stay and are mainly there to test the large file sizes and deflated sizes of the zip entries and verify that the original reported JBS issue is fixed by this enhancement. One important thing to note about these tests is that I've now removed the explicit "@requires" (memory) requirements, since after this enhancement (with "tempFileThreshold" by default enabled as in those tests), it no longer should require any specific high memory systems to run these tests.

There are still some decisions to be made:

1. Should we introduce this new property or should we enhance the existing "useTempFile" property to allow a size to be passed? Specifically, can we/should we use that existing property to allow users to set the following values:

	- "true", this would imply the temp file feature is enabled always irrespective of the zip entry size. This is how the value of "true" is currently handled before the patch in this PR. So no change in behaviour.

	- a byte size, represented as a `String` or an integer. This would imply that the user wants to enable the temp file feature, but only when the size or compressed size of the zip entry reaches a threshold specified by this value. This would mean that for sizes lesser than this the ZipFS implementation would use a ByteArrayOutputStream and would only rollover to a temp file when the threshold is reached.

	- "false", this would disable the temp file feature completely and outputstreams for zip entries of the ZipFS instance will always use ByteArrayOutputStream

	- value of "0" or "-1". This would be the same as specifying "false" value.

Using the existing property and just one property to control the temp file creation semantics will help avoid having to deal with 2 different properties ("useTempFile" and the "tempFileThreshold"). Plus it would also prevent any potentially conflicting user specified values for them. For example, we won't have to decide what to do when a user sets useTempFile=false and tempFileThreshold=1234.

If we do decide to introduce this new property then some small amount of additional work needs to be done in this implementation to make sure semantically conflicting values for these 2 properties are handled correctly. I decided to skip that part in this round of the PR till we reached a decision about the properties.

2. Given that this is a new enhancement, I believe this requires a CSR.

3. Should this PR be now linked to https://bugs.openjdk.java.net/browse/JDK-8011146 instead of https://bugs.openjdk.java.net/browse/JDK-8190753?

4. I've never previously created a manual test case. The `LargeCompressedEntrySizeTest` in this PR is expected to be a manual test case (given how long it might take to run on various different systems). The only difference between this test case and other jtreg automated tests is the absence of a `@test` on this one. Is this how manual tests are written or is there some other way?

-------------

PR: https://git.openjdk.java.net/jdk/pull/4607


More information about the core-libs-dev mailing list