RFR: 8304020: Speed up test/jdk/java/util/zip/ZipFile/TestTooManyEntries.java and clarify its purpose [v3]
Lance Andersen
lancea at openjdk.org
Sun Mar 12 14:15:20 UTC 2023
On Sun, 12 Mar 2023 10:00:50 GMT, Eirik Bjorsnos <duke at openjdk.org> wrote:
>> Please review this PR which speeds up TestTooManyEntries and clarifies its purpose:
>>
>> - The name 'TestTooManyEntries' does not clearly convey the purpose of the test. What is tested is the validation that the total CEN size fits in a Java byte array. Suggested rename: CenSizeTooLarge
>> - The test creates DEFLATED entries which incurs zlib costs and File Data / Data Descriptors for no additional benefit. We can use STORED instead.
>> - By creating a single LocalDateTime and setting it with `ZipEntry.setTimeLocal`, we can avoid repeated time zone calculations.
>> - The name of entries is generated by calling UUID.randomUUID, we could use simple counter instead.
>> - The produced file is unnecessarily large. We know how large a CEN entry is, let's take advantage of that to create a file with the minimal size.
>> - The summary and comments of the test can be improved to help explain the purpose of the test and how we reach the limit being tested.
>>
>> These speedups reduced the runtime from 4 min 17 sec to 1 min 54 sec on my Macbook Pro. The produced ZIP size was reduced from 5.7 GB to 3.5 GB.
>
> Eirik Bjorsnos has updated the pull request incrementally with one additional commit since the last revision:
>
> Use Integer.toString instead of Long.toString
> Here's a wild idea:
>
> We could inject large extra fields on the CEN entries to inflate the size of each CEN entry. This means we get away with much fewer CEN entries which again means much less memory. Also, writing mostly empty extra data seems to be really fast, at least on my Macbook Pro.
>
> For me, the test now run without the `@requires' and completes in less than 5 seconds.
>
> The cost is slightly more complicated code. But perhaps still worth the added complexity?
>
> ```java
> @BeforeTest
> public void setup() throws IOException {
> hugeZipFile = new File("cen-too-large.zip");
> hugeZipFile.deleteOnExit();
>
> // Length of generated entry names
> int nameLength = 10;
>
> // We use a large extra field to get away with fewer CEN headers
> byte[] extra = makeLargeExtra();
>
> // The size of a single CEN header, including name and extra field
> int cenHeaderSize = ZipFile.CENHDR + nameLength + extra.length;
>
> // Determine the number of entries needed to exceed the MAX_CEN_SIZE
> int numEntries = (MAX_CEN_SIZE / cenHeaderSize) + 1;
>
> ZipEntry[] entries = new ZipEntry[numEntries];
> try (ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(hugeZipFile)))) {
> // Creating the LocalDataTime once allows faster processing
> LocalDateTime time = LocalDateTime.now();
> // Add entries until MAX_CEN_SIZE is reached
> for (int i = 0; i < numEntries; i++) {
> String name = Integer.toString(i);
> name = "0".repeat(nameLength -name.length()) + name;
> ZipEntry entry = entries[i] = new ZipEntry(name);
> // Use STORED for faster processing
> entry.setMethod(ZipEntry.STORED);
> entry.setSize(0);
> entry.setCrc(0);
> entry.setTimeLocal(time);
> zip.putNextEntry(entry);
> }
> zip.closeEntry();
> for (ZipEntry entry : entries) {
> entry.setExtra(extra);
> }
> }
> }
>
> /**
> * Make an extra field with an 'unknown' tag and the maximum possible data size
> * @return
> */
> private static byte[] makeLargeExtra() {
> byte[] extra = new byte[Short.MAX_VALUE];
> ByteBuffer buffer = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
> buffer.putShort((short) 0x9902); // 'unknown' tag
> buffer.putShort((short) (extra.length - 2 * Short.BYTES)); // Data size
> return extra;
> }
> ```
I think this is probably OK. I would probably create a constant with the Data Size to make it clearer and add more of a detailed comment of this method (laying out the structure of the Extra field) along with pointing to 4.6.1 of the APP.note
-------------
PR: https://git.openjdk.org/jdk/pull/12991
More information about the core-libs-dev
mailing list