RFR 8080640: Reduce copying when reading JAR/ZIP entries

Tue Jun 23 19:47:59 UTC 2015

Hi Sherman,

Removed the check, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.4

Cheers,
Staffan

On 06/23/2015 11:22 AM, Xueming Shen wrote:
> Hi Staffan,
>
> #527 check is probably unnecessary. The size and csize are 32-bit 
> unsigned integer, they
> should never be < 0.
>
> The rest looks good.
>
> Thanks,
> -Sherman
>
> On 06/23/2015 10:54 AM, Staffan Friberg wrote:
>> Hi Sherman,
>>
>> Thanks for the review. I removed the unused import and the changes to 
>> reduce the lock region.
>>
>> New webrev, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.3
>>
>> Thanks,
>> Staffan
>>
>>
>> On 06/08/2015 02:37 PM, Xueming Shen wrote:
>>> Staffan,
>>>
>>> (1) ByteArrayInputSteram is no longer needed in ZipFile
>>> (2) You've changed the lock region in ZipFile.getInputSteram. Given 
>>> we are not
>>>      doing ByteArrayInpusteram for this method, can we just not 
>>> touch this method
>>>      and the class ZipFileInputSteram()?
>>>      The concern is that we did some changes in that area back to 
>>> 2010 and triggered
>>>      a complicated race condition regression [1], it was finally 
>>> fixed after lot of rounds
>>>      of discussion. I still have all those emails in my inbox. It 
>>> would be better to keep
>>>      whatever works for now, instead of re-fresh all the memory 
>>> (read all those emails)
>>>      to figure out if the latest change might have a negative impact.
>>>
>>> The getBytes() implementation looks good to me.
>>>
>>> Thanks!
>>> -Sherman
>>>
>>> [1] 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-March/006355.html
>>>
>>> On 06/05/2015 11:09 AM, Staffan Friberg wrote:
>>>> Hi Sherman,
>>>>
>>>> I have a new webrev which reverts that part, 
>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/
>>>>
>>>> Summary of changes
>>>>     Reduce lock region in ZipFile.getInputstream
>>>>     Add private ZipFile.getBytes that can be used in select places 
>>>> in the JDK where all bytes will be read
>>>>
>>>> Could you sponsor this change once it has been reviewed?
>>>>
>>>> Thanks,
>>>> Staffan
>>>>
>>>> On 06/03/2015 10:45 AM, Xueming Shen wrote:
>>>>> Staffan,
>>>>>
>>>>> I'm not convinced that the benefit here is significant enough to 
>>>>> change the
>>>>> getInputStream() to return a ByteArrayInputStream, given this can 
>>>>> be easily
>>>>> achieved by wrapping the returned byte[] from getBytes(ZipEntry) 
>>>>> at user's
>>>>> site. I would suggest to file a separate rfe on this disagreement 
>>>>> and move on
>>>>> with the agreed getBytes() for now.
>>>>>
>>>>> Thanks,
>>>>> -Sherman
>>>>>
>>>>> On 06/02/2015 10:27 AM, Staffan Friberg wrote:
>>>>>>
>>>>>> On 05/22/2015 01:15 PM, Staffan Friberg wrote:
>>>>>>> On 05/22/2015 11:51 AM, Xueming Shen wrote:
>>>>>>>> On 05/22/2015 11:41 AM, Staffan Friberg wrote:
>>>>>>>>>
>>>>>>>>> On 05/21/2015 11:00 AM, Staffan Friberg wrote:
>>>>>>>>>>
>>>>>>>>>> On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>>>>>>>>>>>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wanted to get reviews and feedback on this performance 
>>>>>>>>>>>>> improvement for reading from JAR/ZIP files during 
>>>>>>>>>>>>> classloading by reducing unnecessary copying and reading 
>>>>>>>>>>>>> the entry in one go instead of in small portions. This 
>>>>>>>>>>>>> shows a significant improvement when reading a single 
>>>>>>>>>>>>> entry and for a large application with 10k classes and 
>>>>>>>>>>>>> 500+ JAR files it improved the startup time by 4%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For more details on the background and performance results 
>>>>>>>>>>>>> please see the RFE entry.
>>>>>>>>>>>>>
>>>>>>>>>>>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>>>>>>>>>>>> WEBREV - 
>>>>>>>>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Staffan
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Staffan,
>>>>>>>>>>>>
>>>>>>>>>>>> If I did not miss something here, from your use scenario it 
>>>>>>>>>>>> appears to me the only thing you really
>>>>>>>>>>>> need here to help boost your performance is
>>>>>>>>>>>>
>>>>>>>>>>>>     byte[] ZipFile.getAllBytes(ZipEntry ze);
>>>>>>>>>>>>
>>>>>>>>>>>> You are allocating a byte[] at use side and wrapping it 
>>>>>>>>>>>> with a ByteBuffer if the size is small enough,
>>>>>>>>>>>> otherwise, you letting the ZipFile to allocate a big enough 
>>>>>>>>>>>> one for you. It does not look like you
>>>>>>>>>>>> can re-use that byte[] (has to be wrapped by the 
>>>>>>>>>>>> ByteArrayInputStream and return), why do you
>>>>>>>>>>>> need two different methods here? The logic would be much 
>>>>>>>>>>>> easier to simply let the ZipFile to allocate
>>>>>>>>>>>> the needed buffer with appropriate size, fill the bytes and 
>>>>>>>>>>>> return, with a "OOME" if the entry size
>>>>>>>>>>>> is bigger than 2g.
>>>>>>>>>>>>
>>>>>>>>>>>> The only thing we use from the input ze is its name, get 
>>>>>>>>>>>> the size/csize from the jzentry, I don't think
>>>>>>>>>>>> jzentry.csize/size can be "unknown", they are from the 
>>>>>>>>>>>> "cen" table.
>>>>>>>>>>>>
>>>>>>>>>>>> If the real/final use of the bytes is to wrap it with a 
>>>>>>>>>>>> ByteArrayInputStream,why bother using ByteBuffer
>>>>>>>>>>>> here? Shouldn't a direct byte[] with exactly the size of 
>>>>>>>>>>>> the entry server better.
>>>>>>>>>>>>
>>>>>>>>>>>> -Sherman
>>>>>>>>>>>>
>>>>>>>>>>> Hi Sherman,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the comments. I agree, was starting out with 
>>>>>>>>>>> bytebuffer because I was hoping to be able to cache things 
>>>>>>>>>>> where the buffer was being used, but since the buffer is 
>>>>>>>>>>> past along further I couldn't figure out a clean way to do it.
>>>>>>>>>>> Will rewrite it to simply just return a buffer, and only 
>>>>>>>>>>> wrap it in the Resource class getByteBuffer.
>>>>>>>>>>>
>>>>>>>>>>> What would be your thought on updating the 
>>>>>>>>>>> ZipFile.getInputStream to return ByteArrayInputStream for 
>>>>>>>>>>> small entries? Currently I do that work outside in two 
>>>>>>>>>>> places and moving it would potentially speed up others 
>>>>>>>>>>> reading small entries as well.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Staffan
>>>>>>>>>> Just realized that my use of ByteArrayInputStream would miss 
>>>>>>>>>> Jar verification if enabled so the way to go hear would be to 
>>>>>>>>>> add it if possible to the ZipFile.getInputStream.
>>>>>>>>>>
>>>>>>>>>> //Staffan
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Here is an updated webrev which uses a byte[] directly and 
>>>>>>>>> also uses ByteArrayInputStream in ZipFile for small entries 
>>>>>>>>> below 128k.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not sure about the benefit of doing the 
>>>>>>>> ByteArrayInputStream in ZipFile.getInputStream. It has
>>>>>>>> the consequence of changing the "expected" behavior of 
>>>>>>>> getInputStream() (instead of return an
>>>>>>>> input stream waiting for reading, it now reads all bytes in 
>>>>>>>> advance), something we might not want
>>>>>>>> to do in a performance tuning. Though it might be reasonable to 
>>>>>>>> guess everyone get an input stream
>>>>>>>> is to read all bytes from it later.
>>>>>>>>
>>>>>>>> -Sherman
>>>>>>>>
>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1
>>>>>>>>>
>>>>>>>>> //Staffan
>>>>>>>>
>>>>>>> Agree that it will change the behavior slightly, but as you said 
>>>>>>> it is probably expected that some one will read the stream 
>>>>>>> eventually.
>>>>>>> We could reduce the size further if that makes a difference, if 
>>>>>>> the size is below 65k we would not use more memory than the 
>>>>>>> buffer allocated for the InflaterStream today.
>>>>>>> The total allocation would be slightly larger for deflated 
>>>>>>> entries as we would allocate a byte[] for the compressed bytes, 
>>>>>>> but it would be GC:able and not kept alive. So from a memory 
>>>>>>> perspective the difference is very limited.
>>>>>>>
>>>>>>> //Staffan
>>>>>> Hi,
>>>>>>
>>>>>> Bumping this thread to get some more comments on the concern 
>>>>>> about changing the ZipFile.getInputStream behavior. The benefit 
>>>>>> of doing this change is that any read of small entries from ZIP 
>>>>>> and JAR files will be much faster and less resources will be 
>>>>>> held, including native resources normally held by the 
>>>>>> ZipInputStream.
>>>>>>
>>>>>> The behavior change that will occur is that the full entry will 
>>>>>> be read as part of creating the stream and not lazily as might be 
>>>>>> expected. However when getting a today InputStream zip file will 
>>>>>> be accessed to read information about the size of the entry, so 
>>>>>> the zip file is already touched when getting an InputStream, but 
>>>>>> not the compressed bytes.
>>>>>>
>>>>>> I'm fine with removing this part of the change and just push the 
>>>>>> private getBytes function and the updates to the JDK libraries to 
>>>>>> use it.
>>>>>>
>>>>>> Thanks,
>>>>>> Staffan
>>>>>
>>>>
>>>
>>
>