RFR 8080640: Reduce copying when reading JAR/ZIP entries
Staffan Friberg
staffan.friberg at oracle.com
Thu May 21 18:00:41 UTC 2015
On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>
> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>> Hi,
>>>
>>> Wanted to get reviews and feedback on this performance improvement
>>> for reading from JAR/ZIP files during classloading by reducing
>>> unnecessary copying and reading the entry in one go instead of in
>>> small portions. This shows a significant improvement when reading a
>>> single entry, and for a large application with 10k classes and 500+
>>> JAR files it improved the startup time by 4%.
>>>
>>> For more details on the background and performance results please
>>> see the RFE entry.
>>>
>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>> WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>
>>> Cheers,
>>> Staffan
>>
>> Hi Staffan,
>>
>> If I did not miss something here, from your use scenario it appears
>> to me the only thing you really
>> need here to help boost your performance is
>>
>> byte[] ZipFile.getAllBytes(ZipEntry ze);
>>
>> You are allocating a byte[] at the use side and wrapping it with a
>> ByteBuffer if the size is small enough;
>> otherwise, you are letting the ZipFile allocate a big enough one for
>> you. It does not look like you
>> can re-use that byte[] (it has to be wrapped by the ByteArrayInputStream
>> and returned), so why do you
>> need two different methods here? The logic would be much easier if you
>> simply let the ZipFile allocate
>> the needed buffer with the appropriate size, fill in the bytes and return,
>> with an "OOME" if the entry size
>> is bigger than 2g.
>>
>> The only thing we use from the input ze is its name; we get the
>> size/csize from the jzentry. I don't think
>> jzentry.csize/size can be "unknown", since they come from the "cen" table.
>>
>> If the real/final use of the bytes is to wrap them with a
>> ByteArrayInputStream, why bother using ByteBuffer
>> here? Wouldn't a direct byte[] with exactly the size of the entry
>> serve better?
>>
>> -Sherman
>>
> Hi Sherman,
>
> Thanks for the comments. I agree, I was starting out with ByteBuffer
> because I was hoping to be able to cache things where the buffer was
> being used, but since the buffer is passed along further I couldn't
> figure out a clean way to do it.
> Will rewrite it to simply return a buffer, and only wrap it in
> the Resource class getByteBuffer.
>
> What would be your thoughts on updating ZipFile.getInputStream to
> return a ByteArrayInputStream for small entries? Currently I do that
> work outside in two places, and moving it would potentially speed up
> others reading small entries as well.
>
> Thanks,
> Staffan
Just realized that my use of ByteArrayInputStream would miss Jar
verification if enabled, so the way to go here would be to add it, if
possible, to ZipFile.getInputStream.
//Staffan
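
For illustration only, here is a rough sketch of the "exactly sized byte[]"
idea using only the existing public API. It is not the webrev code: the
class EntryBytes and the method readAllBytes are hypothetical names, and the
proposed ZipFile.getAllBytes does not exist in the JDK. Because it still
goes through getInputStream, a JarFile opened with verification enabled
should keep verifying the entry as it is read, as far as I understand it.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class EntryBytes {

    // Hypothetical helper: reads one entry into a byte[] of exactly the
    // entry's uncompressed size, in as few read() calls as possible.
    static byte[] readAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size < 0) {
            // Assumption: entries obtained from ZipFile.getEntry() carry the
            // size from the central directory, so this should not happen.
            throw new IOException("Unknown size for entry " + ze.getName());
        }
        if (size > Integer.MAX_VALUE) {
            // Mirrors the "OOME if the entry is bigger than 2g" suggestion.
            throw new OutOfMemoryError("Entry too large: " + ze.getName());
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("Truncated entry " + ze.getName());
                }
                off += n;
            }
        }
        return buf;
    }

    // Usage: java EntryBytes <zip-or-jar-file> <entry-name>
    public static void main(String[] args) throws IOException {
        try (ZipFile zf = new ZipFile(args[0])) {
            ZipEntry ze = zf.getEntry(args[1]);
            if (ze == null) {
                throw new IOException("No such entry: " + args[1]);
            }
            byte[] bytes = readAllBytes(zf, ze);
            System.out.println(ze.getName() + ": " + bytes.length + " bytes");
        }
    }
}
```

A caller that needs a stream can wrap the result in a ByteArrayInputStream,
which is the wrapping step discussed above; whether that belongs at the call
site or inside ZipFile.getInputStream is the open question in this thread.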