RFR 8080640: Reduce copying when reading JAR/ZIP entries

Tue Jun 23 17:54:45 UTC 2015

Hi Sherman,

Thanks for the review. I removed the unused import and the changes to 
reduce the lock region.

New webrev, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.3

Thanks,
Staffan

On 06/08/2015 02:37 PM, Xueming Shen wrote:
> Staffan,
>
> (1) ByteArrayInputSteram is no longer needed in ZipFile
> (2) You've changed the lock region in ZipFile.getInputSteram. Given we 
> are not
>      doing ByteArrayInpusteram for this method, can we just not touch 
> this method
>      and the class ZipFileInputSteram()?
>      The concern is that we did some changes in that area back to 2010 
> and triggered
>      a complicated race condition regression [1], it was finally fixed 
> after lot of rounds
>      of discussion. I still have all those emails in my inbox. It 
> would be better to keep
>      whatever works for now, instead of re-fresh all the memory (read 
> all those emails)
>      to figure out if the latest change might have a negative impact.
>
> The getBytes() implementation looks good to me.
>
> Thanks!
> -Sherman
>
> [1] 
> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-March/006355.html
>
> On 06/05/2015 11:09 AM, Staffan Friberg wrote:
>> Hi Sherman,
>>
>> I have a new webrev which reverts that part, 
>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/
>>
>> Summary of changes
>>     Reduce lock region in ZipFile.getInputstream
>>     Add private ZipFile.getBytes that can be used in select places in 
>> the JDK where all bytes will be read
>>
>> Could you sponsor this change once it has been reviewed?
>>
>> Thanks,
>> Staffan
>>
>> On 06/03/2015 10:45 AM, Xueming Shen wrote:
>>> Staffan,
>>>
>>> I'm not convinced that the benefit here is significant enough to 
>>> change the
>>> getInputStream() to return a ByteArrayInputStream, given this can be 
>>> easily
>>> achieved by wrapping the returned byte[] from getBytes(ZipEntry) at 
>>> user's
>>> site. I would suggest to file a separate rfe on this disagreement 
>>> and move on
>>> with the agreed getBytes() for now.
>>>
>>> Thanks,
>>> -Sherman
>>>
>>> On 06/02/2015 10:27 AM, Staffan Friberg wrote:
>>>>
>>>> On 05/22/2015 01:15 PM, Staffan Friberg wrote:
>>>>> On 05/22/2015 11:51 AM, Xueming Shen wrote:
>>>>>> On 05/22/2015 11:41 AM, Staffan Friberg wrote:
>>>>>>>
>>>>>>> On 05/21/2015 11:00 AM, Staffan Friberg wrote:
>>>>>>>>
>>>>>>>> On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>>>>>>>>>
>>>>>>>>> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>>>>>>>>>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Wanted to get reviews and feedback on this performance 
>>>>>>>>>>> improvement for reading from JAR/ZIP files during 
>>>>>>>>>>> classloading by reducing unnecessary copying and reading the 
>>>>>>>>>>> entry in one go instead of in small portions. This shows a 
>>>>>>>>>>> significant improvement when reading a single entry and for 
>>>>>>>>>>> a large application with 10k classes and 500+ JAR files it 
>>>>>>>>>>> improved the startup time by 4%.
>>>>>>>>>>>
>>>>>>>>>>> For more details on the background and performance results 
>>>>>>>>>>> please see the RFE entry.
>>>>>>>>>>>
>>>>>>>>>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>>>>>>>>>> WEBREV - 
>>>>>>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Staffan
>>>>>>>>>>
>>>>>>>>>> Hi Staffan,
>>>>>>>>>>
>>>>>>>>>> If I did not miss something here, from your use scenario it 
>>>>>>>>>> appears to me the only thing you really
>>>>>>>>>> need here to help boost your performance is
>>>>>>>>>>
>>>>>>>>>>     byte[] ZipFile.getAllBytes(ZipEntry ze);
>>>>>>>>>>
>>>>>>>>>> You are allocating a byte[] at use side and wrapping it with 
>>>>>>>>>> a ByteBuffer if the size is small enough,
>>>>>>>>>> otherwise, you letting the ZipFile to allocate a big enough 
>>>>>>>>>> one for you. It does not look like you
>>>>>>>>>> can re-use that byte[] (has to be wrapped by the 
>>>>>>>>>> ByteArrayInputStream and return), why do you
>>>>>>>>>> need two different methods here? The logic would be much 
>>>>>>>>>> easier to simply let the ZipFile to allocate
>>>>>>>>>> the needed buffer with appropriate size, fill the bytes and 
>>>>>>>>>> return, with a "OOME" if the entry size
>>>>>>>>>> is bigger than 2g.
>>>>>>>>>>
>>>>>>>>>> The only thing we use from the input ze is its name, get the 
>>>>>>>>>> size/csize from the jzentry, I don't think
>>>>>>>>>> jzentry.csize/size can be "unknown", they are from the "cen" 
>>>>>>>>>> table.
>>>>>>>>>>
>>>>>>>>>> If the real/final use of the bytes is to wrap it with a 
>>>>>>>>>> ByteArrayInputStream,why bother using ByteBuffer
>>>>>>>>>> here? Shouldn't a direct byte[] with exactly the size of the 
>>>>>>>>>> entry server better.
>>>>>>>>>>
>>>>>>>>>> -Sherman
>>>>>>>>>>
>>>>>>>>> Hi Sherman,
>>>>>>>>>
>>>>>>>>> Thanks for the comments. I agree, was starting out with 
>>>>>>>>> bytebuffer because I was hoping to be able to cache things 
>>>>>>>>> where the buffer was being used, but since the buffer is past 
>>>>>>>>> along further I couldn't figure out a clean way to do it.
>>>>>>>>> Will rewrite it to simply just return a buffer, and only wrap 
>>>>>>>>> it in the Resource class getByteBuffer.
>>>>>>>>>
>>>>>>>>> What would be your thought on updating the 
>>>>>>>>> ZipFile.getInputStream to return ByteArrayInputStream for 
>>>>>>>>> small entries? Currently I do that work outside in two places 
>>>>>>>>> and moving it would potentially speed up others reading small 
>>>>>>>>> entries as well.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Staffan
>>>>>>>> Just realized that my use of ByteArrayInputStream would miss 
>>>>>>>> Jar verification if enabled so the way to go hear would be to 
>>>>>>>> add it if possible to the ZipFile.getInputStream.
>>>>>>>>
>>>>>>>> //Staffan
>>>>>>> Hi,
>>>>>>>
>>>>>>> Here is an updated webrev which uses a byte[] directly and also 
>>>>>>> uses ByteArrayInputStream in ZipFile for small entries below 128k.
>>>>>>>
>>>>>>
>>>>>> I'm not sure about the benefit of doing the ByteArrayInputStream 
>>>>>> in ZipFile.getInputStream. It has
>>>>>> the consequence of changing the "expected" behavior of 
>>>>>> getInputStream() (instead of return an
>>>>>> input stream waiting for reading, it now reads all bytes in 
>>>>>> advance), something we might not want
>>>>>> to do in a performance tuning. Though it might be reasonable to 
>>>>>> guess everyone get an input stream
>>>>>> is to read all bytes from it later.
>>>>>>
>>>>>> -Sherman
>>>>>>
>>>>>>
>>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1
>>>>>>>
>>>>>>> //Staffan
>>>>>>
>>>>> Agree that it will change the behavior slightly, but as you said 
>>>>> it is probably expected that some one will read the stream 
>>>>> eventually.
>>>>> We could reduce the size further if that makes a difference, if 
>>>>> the size is below 65k we would not use more memory than the buffer 
>>>>> allocated for the InflaterStream today.
>>>>> The total allocation would be slightly larger for deflated entries 
>>>>> as we would allocate a byte[] for the compressed bytes, but it 
>>>>> would be GC:able and not kept alive. So from a memory perspective 
>>>>> the difference is very limited.
>>>>>
>>>>> //Staffan
>>>> Hi,
>>>>
>>>> Bumping this thread to get some more comments on the concern about 
>>>> changing the ZipFile.getInputStream behavior. The benefit of doing 
>>>> this change is that any read of small entries from ZIP and JAR 
>>>> files will be much faster and less resources will be held, 
>>>> including native resources normally held by the ZipInputStream.
>>>>
>>>> The behavior change that will occur is that the full entry will be 
>>>> read as part of creating the stream and not lazily as might be 
>>>> expected. However when getting a today InputStream zip file will be 
>>>> accessed to read information about the size of the entry, so the 
>>>> zip file is already touched when getting an InputStream, but not 
>>>> the compressed bytes.
>>>>
>>>> I'm fine with removing this part of the change and just push the 
>>>> private getBytes function and the updates to the JDK libraries to 
>>>> use it.
>>>>
>>>> Thanks,
>>>> Staffan
>>>
>>
>