RFR 8080640: Reduce copying when reading JAR/ZIP entries

Fri May 22 18:41:38 UTC 2015

On 05/21/2015 11:00 AM, Staffan Friberg wrote:
>
> On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>>
>> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>>> Hi,
>>>>
>>>> Wanted to get reviews and feedback on this performance improvement 
>>>> for reading from JAR/ZIP files during classloading by reducing 
>>>> unnecessary copying and reading the entry in one go instead of in 
>>>> small portions. This shows a significant improvement when reading a 
>>>> single entry and for a large application with 10k classes and 500+ 
>>>> JAR files it improved the startup time by 4%.
>>>>
>>>> For more details on the background and performance results please 
>>>> see the RFE entry.
>>>>
>>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>>> WEBREV - http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>>
>>>> Cheers,
>>>> Staffan
>>>
>>> Hi Staffan,
>>>
>>> If I did not miss something here, from your use scenario it appears 
>>> to me the only thing you really
>>> need here to help boost your performance is
>>>
>>>     byte[] ZipFile.getAllBytes(ZipEntry ze);
>>>
>>> You are allocating a byte[] at use side and wrapping it with a 
>>> ByteBuffer if the size is small enough,
>>> otherwise, you letting the ZipFile to allocate a big enough one for 
>>> you. It does not look like you
>>> can re-use that byte[] (has to be wrapped by the 
>>> ByteArrayInputStream and return), why do you
>>> need two different methods here? The logic would be much easier to 
>>> simply let the ZipFile to allocate
>>> the needed buffer with appropriate size, fill the bytes and return, 
>>> with a "OOME" if the entry size
>>> is bigger than 2g.
>>>
>>> The only thing we use from the input ze is its name, get the 
>>> size/csize from the jzentry, I don't think
>>> jzentry.csize/size can be "unknown", they are from the "cen" table.
>>>
>>> If the real/final use of the bytes is to wrap it with a 
>>> ByteArrayInputStream,why bother using ByteBuffer
>>> here? Shouldn't a direct byte[] with exactly the size of the entry 
>>> server better.
>>>
>>> -Sherman
>>>
>> Hi Sherman,
>>
>> Thanks for the comments. I agree, was starting out with bytebuffer 
>> because I was hoping to be able to cache things where the buffer was 
>> being used, but since the buffer is past along further I couldn't 
>> figure out a clean way to do it.
>> Will rewrite it to simply just return a buffer, and only wrap it in 
>> the Resource class getByteBuffer.
>>
>> What would be your thought on updating the ZipFile.getInputStream to 
>> return ByteArrayInputStream for small entries? Currently I do that 
>> work outside in two places and moving it would potentially speed up 
>> others reading small entries as well.
>>
>> Thanks,
>> Staffan
> Just realized that my use of ByteArrayInputStream would miss Jar 
> verification if enabled so the way to go hear would be to add it if 
> possible to the ZipFile.getInputStream.
>
> //Staffan
Hi,

Here is an updated webrev which uses a byte[] directly and also uses 
ByteArrayInputStream in ZipFile for small entries below 128k.

http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1

//Staffan