RFR 8080640: Reduce copying when reading JAR/ZIP entries
Staffan Friberg
staffan.friberg at oracle.com
Thu May 21 16:48:34 UTC 2015
On 05/20/2015 10:57 AM, Xueming Shen wrote:
> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>> Hi,
>>
>> Wanted to get reviews and feedback on this performance improvement
>> for reading from JAR/ZIP files during classloading by reducing
>> unnecessary copying and reading the entry in one go instead of in
>> small portions. This shows a significant improvement when reading a
>> single entry, and for a large application with 10k classes and 500+
>> JAR files it improved the startup time by 4%.
>>
>> For more details on the background and performance results please see
>> the RFE entry.
>>
>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>> WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>
>> Cheers,
>> Staffan
>
> Hi Staffan,
>
> If I did not miss something here, from your use scenario it appears to
> me the only thing you really
> need here to help boost your performance is
>
> byte[] ZipFile.getAllBytes(ZipEntry ze);
>
> You are allocating a byte[] on the caller side and wrapping it in a
> ByteBuffer if the size is small enough; otherwise, you let ZipFile
> allocate a big enough one for you. It does not look like you can
> reuse that byte[] (it has to be wrapped by the ByteArrayInputStream
> and returned), so why do you need two different methods here? The
> logic would be much simpler if ZipFile allocated a buffer of the
> appropriate size, filled in the bytes, and returned it, throwing an
> "OOME" if the entry size is bigger than 2g.
>
> The only thing we use from the input ze is its name; the size/csize
> come from the jzentry. I don't think jzentry.csize/size can be
> "unknown", since they are from the "cen" table.
>
> If the real/final use of the bytes is to wrap them in a
> ByteArrayInputStream, why bother using ByteBuffer here? Wouldn't a
> direct byte[] with exactly the size of the entry serve better?
>
> -Sherman
>
Hi Sherman,
Thanks for the comments. I agree. I started out with ByteBuffer
because I was hoping to be able to cache things where the buffer was
being used, but since the buffer is passed along further I couldn't
figure out a clean way to do it.
I will rewrite it to simply return a buffer, and only wrap it in the
Resource class's getByteBuffer method.
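
For illustration only, a rough sketch of that shape using just the public
java.util.zip API. The class EntryBytes and its two methods are hypothetical
names, not the code in the webrev and not an existing ZipFile method; the
"OOME" for oversized entries follows Sherman's suggestion above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class EntryBytes {
    // Hypothetical helper (not part of java.util.zip): reads the whole
    // entry into a byte[] sized exactly to the uncompressed entry.
    static byte[] getAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size < 0) {
            throw new IOException("unknown size for entry: " + ze.getName());
        }
        if (size > Integer.MAX_VALUE - 8) {
            // A single Java array cannot hold an entry this large.
            throw new OutOfMemoryError("entry too large: " + ze.getName());
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            int off = 0;
            // read() may return fewer bytes than requested, so loop.
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("truncated entry: " + ze.getName());
                }
                off += n;
            }
        }
        return buf;
    }

    // The ByteBuffer wrapping happens only at the caller that needs it.
    static ByteBuffer getByteBuffer(ZipFile zf, ZipEntry ze) throws IOException {
        return ByteBuffer.wrap(getAllBytes(zf, ze));
    }
}
```

This sketch still goes through getInputStream, so it does not by itself avoid
the intermediate copying the RFE is about; it only shows the caller-facing
contract (one exactly sized byte[], wrapped on demand).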
What are your thoughts on updating ZipFile.getInputStream to return a
ByteArrayInputStream for small entries? Currently I do that work
outside in two places, and moving it would potentially speed up others
reading small entries as well.
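
As a sketch of what I mean, written as an external wrapper rather than a
change to ZipFile itself. The class name SmallEntryStreams, the method open,
and the 64 KB cutoff are all assumptions for illustration; no threshold has
been decided.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class SmallEntryStreams {
    // Assumed cutoff for a "small" entry; purely illustrative.
    static final int SMALL_ENTRY_LIMIT = 64 * 1024;

    // Hypothetical wrapper: small entries are read fully up front and
    // served from memory, larger ones keep the regular streaming path.
    static InputStream open(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size < 0 || size > SMALL_ENTRY_LIMIT) {
            return zf.getInputStream(ze);
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            int off = 0;
            // read() may return fewer bytes than requested, so loop.
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("truncated entry: " + ze.getName());
                }
                off += n;
            }
        }
        return new ByteArrayInputStream(buf);
    }
}
```

Callers would not need to change, since both paths return an InputStream.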
Thanks,
Staffan