RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v3]

Eirik Bjorsnos duke at openjdk.org
Mon Oct 30 12:01:34 UTC 2023


On Sun, 12 Mar 2023 14:12:08 GMT, Lance Andersen <lancea at openjdk.org> wrote:

>> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Use Integer.toString instead of Long.toString
>
>> Here's a wild idea:
>> 
>> We could inject large extra fields on the CEN entries to inflate the size of each CEN entry. This means we get away with much fewer CEN entries which again means much less memory. Also, writing mostly empty extra data seems to be really fast, at least on my Macbook Pro.
>> 
>> For me, the test now run without the `@requires' and completes in less than 5 seconds.
>> 
>> The cost is slightly more complicated code. But perhaps still worth the added complexity?
>> 
>> ```java
>> @BeforeTest
>> public void setup() throws IOException {
>>     hugeZipFile = new File("cen-too-large.zip");
>>     hugeZipFile.deleteOnExit();
>> 
>>     // Length of generated entry names
>>     int nameLength = 10;
>> 
>>     // We use a large extra field to get away with fewer CEN headers
>>     byte[] extra = makeLargeExtra();
>> 
>>     // The size of a single CEN header, including name and extra field
>>     int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length;
>> 
>>     // Determine the number of entries needed to exceed the MAX_CEN_SIZE
>>     int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1;
>> 
>>     ZipEntry[] entries = new ZipEntry[numEntries];
>>     try (ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(hugeZipFile)))) {
>>         // Creating the LocalDataTime once allows faster processing
>>         LocalDateTime time = LocalDateTime.now();
>>         // Add entries until MAX_CEN_SIZE is reached
>>         for (int i = 0; i < numEntries; i++) {
>>             String name = Integer.toString(i);
>>             name = "0".repeat(nameLength -name.length()) + name;
>>             ZipEntry entry = entries[i] = new ZipEntry(name);
>>             // Use STORED for faster processing
>>             entry.setMethod(ZipEntry.STORED);
>>             entry.setSize(0);
>>             entry.setCrc(0);
>>             entry.setTimeLocal(time);
>>             zip.putNextEntry(entry);
>>         }
>>         zip.closeEntry();
>>         for (ZipEntry entry : entries) {
>>             entry.setExtra(extra);
>>         }
>>     }
>> }
>> 
>> /**
>>  * Make an extra field with an 'unknown' tag and the maximum possible data size
>>  * @return
>>  */
>> private static byte[] makeLargeExtra() {
>>     byte[] extra = new byte[Short.MAX_VALUE];
>>     ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
>>     buffer.putShort((short) 0x9902); // 'unknown' tag
>>     buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size
>>     return e...

This PR was reviewed by @LanceAndersen and @Martin-Buchholz back in May, but unfortunately did not get integrated before it was closed for inactivity.

I'm reopening it now, with a few changes after the last reviews:

- To save disk space required by the test, a SparseOutputStream writes 'holes' until it detects that the last CEN is written, after which it starts writing actual bytes for the END header. This reduces the required disk space from ~2GB to ~4K.
- Since the test no longer requires excessive memory or disk space, the 'manual' tag is removed and the test is removed from the `jdk_core_manual_no_input` group in `TEST.groups`

With a blessing of these last changes on top of what's already reviewed, I'm hoping to integrate this PR soonish. 

And sorry for the long-lived PR, I can only blame it on the unusually great summer in the north of Norway this year :-)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12991#issuecomment-1785034602


More information about the core-libs-dev mailing list