RFR 8080640: Reduce copying when reading JAR/ZIP entries

Thu May 21 16:48:34 UTC 2015

On 05/20/2015 10:57 AM, Xueming Shen wrote:
> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>> Hi,
>>
>> Wanted to get reviews and feedback on this performance improvement 
>> for reading from JAR/ZIP files during classloading by reducing 
>> unnecessary copying and reading the entry in one go instead of in 
>> small portions. This shows a significant improvement when reading a 
>> single entry, and for a large application with 10k classes and 500+ 
>> JAR files it improved the startup time by 4%.
>>
>> For more details on the background and performance results please see 
>> the RFE entry.
>>
>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>> WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>
>> Cheers,
>> Staffan
>
> Hi Staffan,
>
> Unless I am missing something, from your usage scenario it appears to
> me that the only thing you really need here to boost your performance is
>
>     byte[] ZipFile.getAllBytes(ZipEntry ze);
>
> You are allocating a byte[] on the caller side and wrapping it with a
> ByteBuffer if the size is small enough; otherwise, you are letting
> ZipFile allocate a big enough one for you. It does not look like you
> can reuse that byte[] (it has to be wrapped by the ByteArrayInputStream
> and returned), so why do you need two different methods here? The logic
> would be much simpler if ZipFile just allocated a buffer of the
> appropriate size, filled in the bytes, and returned it, throwing an
> "OOME" if the entry size is bigger than 2 GB.
>
> The only thing we use from the input ze is its name; the size/csize
> come from the jzentry. I don't think jzentry.csize/size can be
> "unknown", since they come from the "cen" table.
>
> If the real/final use of the bytes is to wrap them in a
> ByteArrayInputStream, why bother using ByteBuffer here? Shouldn't a
> direct byte[] with exactly the size of the entry serve better?
>
> -Sherman
>
Hi Sherman,

Thanks for the comments. I agree. I started out with ByteBuffer because
I was hoping to be able to cache things where the buffer was being used,
but since the buffer is passed along further I couldn't figure out a
clean way to do it.
I will rewrite it to simply return a buffer, and only wrap it in the
Resource class's getByteBuffer.
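
To make the intended shape concrete, here is a minimal sketch written
against the public ZipFile API only. It is not the webrev code: the
class EntryBytes and both method names are hypothetical, and the
proposed ZipFile.getAllBytes does not exist in the JDK today. The idea
is to allocate a byte[] of exactly the entry size, fill it in one pass,
and wrap it in a ByteBuffer only where the caller asks for one.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper; illustrates the idea, not the actual patch.
final class EntryBytes {

    // Reads the whole entry into a byte[] of exactly the entry's size.
    static byte[] readAllBytes(ZipFile zf, ZipEntry ze) throws IOException {
        long size = ze.getSize();
        if (size < 0) {
            throw new IOException("Unknown size for entry " + ze.getName());
        }
        if (size > Integer.MAX_VALUE) {
            // Mirrors the "OOME if bigger than 2 GB" suggestion.
            throw new OutOfMemoryError("Entry too large: " + ze.getName());
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            if (in == null) {
                throw new IOException("No such entry: " + ze.getName());
            }
            int off = 0;
            // read() may return fewer bytes than requested, so loop.
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("Truncated entry " + ze.getName());
                }
                off += n;
            }
        }
        return buf;
    }

    // Wraps the bytes only at the point where a ByteBuffer is required.
    static ByteBuffer readAsByteBuffer(ZipFile zf, ZipEntry ze)
            throws IOException {
        return ByteBuffer.wrap(readAllBytes(zf, ze));
    }
}
```

The entry should come from zf.getEntry(name), so that its size is the
one recorded in the central directory.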

What are your thoughts on updating ZipFile.getInputStream to return a
ByteArrayInputStream for small entries? Currently I do that work
outside, in two places, and moving it into ZipFile could also speed up
other callers that read small entries.
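
As a rough illustration of that idea, the sketch below does the
small-entry handling in a wrapper around the public API rather than
inside ZipFile itself. The class SmallEntryStreams, the method
openEntry, and the limit parameter are all assumptions made for the
example; no threshold has been decided on.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical wrapper; the real change would live inside ZipFile.
final class SmallEntryStreams {

    // Returns a ByteArrayInputStream for entries of at most `limit` bytes,
    // and the regular ZipFile stream for everything else.
    static InputStream openEntry(ZipFile zf, ZipEntry ze, int limit)
            throws IOException {
        long size = ze.getSize();
        if (size < 0 || size > limit) {
            // Unknown or large size: stream as usual.
            return zf.getInputStream(ze);
        }
        byte[] buf = new byte[(int) size];
        try (InputStream in = zf.getInputStream(ze)) {
            if (in == null) {
                return null; // same contract as ZipFile.getInputStream
            }
            int off = 0;
            // Fill the buffer completely; read() may return short counts.
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) {
                    throw new EOFException("Truncated entry " + ze.getName());
                }
                off += n;
            }
        }
        return new ByteArrayInputStream(buf);
    }
}
```

One difference callers might notice: the in-memory stream stays
readable after the ZipFile is closed, while the regular stream does
not.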

Thanks,
Staffan