RFR 8080640: Reduce copying when reading JAR/ZIP entries

Thu May 21 18:00:41 UTC 2015

On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>
> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>> Hi,
>>>
>>> Wanted to get reviews and feedback on this performance improvement 
>>> for reading from JAR/ZIP files during classloading by reducing 
>>> unnecessary copying and reading the entry in one go instead of in 
>>> small portions. This shows a significant improvement when reading a 
>>> single entry and for a large application with 10k classes and 500+ 
>>> JAR files it improved the startup time by 4%.
>>>
>>> For more details on the background and performance results please 
>>> see the RFE entry.
>>>
>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>> WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>
>>> Cheers,
>>> Staffan
>>
>> Hi Staffan,
>>
>> If I did not miss something here, from your use scenario it appears 
>> to me the only thing you really
>> need here to help boost your performance is
>>
>>     byte[] ZipFile.getAllBytes(ZipEntry ze);
>>
>> You are allocating a byte[] at use side and wrapping it with a 
>> ByteBuffer if the size is small enough,
>> otherwise, you letting the ZipFile to allocate a big enough one for 
>> you. It does not look like you
>> can re-use that byte[] (has to be wrapped by the ByteArrayInputStream 
>> and return), why do you
>> need two different methods here? The logic would be much easier to 
>> simply let the ZipFile to allocate
>> the needed buffer with appropriate size, fill the bytes and return, 
>> with a "OOME" if the entry size
>> is bigger than 2g.
>>
>> The only thing we use from the input ze is its name, get the 
>> size/csize from the jzentry, I don't think
>> jzentry.csize/size can be "unknown", they are from the "cen" table.
>>
>> If the real/final use of the bytes is to wrap it with a 
>> ByteArrayInputStream,why bother using ByteBuffer
>> here? Shouldn't a direct byte[] with exactly the size of the entry 
>> server better.
>>
>> -Sherman
>>
> Hi Sherman,
>
> Thanks for the comments. I agree, was starting out with bytebuffer 
> because I was hoping to be able to cache things where the buffer was 
> being used, but since the buffer is past along further I couldn't 
> figure out a clean way to do it.
> Will rewrite it to simply just return a buffer, and only wrap it in 
> the Resource class getByteBuffer.
>
> What would be your thought on updating the ZipFile.getInputStream to 
> return ByteArrayInputStream for small entries? Currently I do that 
> work outside in two places and moving it would potentially speed up 
> others reading small entries as well.
>
> Thanks,
> Staffan
Just realized that my use of ByteArrayInputStream would miss Jar 
verification if enabled so the way to go hear would be to add it if 
possible to the ZipFile.getInputStream.

//Staffan