RFR 8080640: Reduce copying when reading JAR/ZIP entries
Staffan Friberg
staffan.friberg at oracle.com
Tue Jun 23 19:47:59 UTC 2015
Hi Sherman,
Removed the check, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.4
Cheers,
Staffan
On 06/23/2015 11:22 AM, Xueming Shen wrote:
> Hi Staffan,
>
> #527 check is probably unnecessary. The size and csize are 32-bit
> unsigned integer, they
> should never be < 0.
>
> The rest looks good.
>
> Thanks,
> -Sherman
>
> On 06/23/2015 10:54 AM, Staffan Friberg wrote:
>> Hi Sherman,
>>
>> Thanks for the review. I removed the unused import and the changes to
>> reduce the lock region.
>>
>> New webrev, http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.3
>>
>> Thanks,
>> Staffan
>>
>>
>> On 06/08/2015 02:37 PM, Xueming Shen wrote:
>>> Staffan,
>>>
>>> (1) ByteArrayInputSteram is no longer needed in ZipFile
>>> (2) You've changed the lock region in ZipFile.getInputSteram. Given
>>> we are not
>>> doing ByteArrayInpusteram for this method, can we just not
>>> touch this method
>>> and the class ZipFileInputSteram()?
>>> The concern is that we did some changes in that area back to
>>> 2010 and triggered
>>> a complicated race condition regression [1], it was finally
>>> fixed after lot of rounds
>>> of discussion. I still have all those emails in my inbox. It
>>> would be better to keep
>>> whatever works for now, instead of re-fresh all the memory
>>> (read all those emails)
>>> to figure out if the latest change might have a negative impact.
>>>
>>> The getBytes() implementation looks good to me.
>>>
>>> Thanks!
>>> -Sherman
>>>
>>> [1]
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-March/006355.html
>>>
>>> On 06/05/2015 11:09 AM, Staffan Friberg wrote:
>>>> Hi Sherman,
>>>>
>>>> I have a new webrev which reverts that part,
>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.2/
>>>>
>>>> Summary of changes
>>>> Reduce lock region in ZipFile.getInputstream
>>>> Add private ZipFile.getBytes that can be used in select places
>>>> in the JDK where all bytes will be read
>>>>
>>>> Could you sponsor this change once it has been reviewed?
>>>>
>>>> Thanks,
>>>> Staffan
>>>>
>>>> On 06/03/2015 10:45 AM, Xueming Shen wrote:
>>>>> Staffan,
>>>>>
>>>>> I'm not convinced that the benefit here is significant enough to
>>>>> change the
>>>>> getInputStream() to return a ByteArrayInputStream, given this can
>>>>> be easily
>>>>> achieved by wrapping the returned byte[] from getBytes(ZipEntry)
>>>>> at user's
>>>>> site. I would suggest to file a separate rfe on this disagreement
>>>>> and move on
>>>>> with the agreed getBytes() for now.
>>>>>
>>>>> Thanks,
>>>>> -Sherman
>>>>>
>>>>> On 06/02/2015 10:27 AM, Staffan Friberg wrote:
>>>>>>
>>>>>> On 05/22/2015 01:15 PM, Staffan Friberg wrote:
>>>>>>> On 05/22/2015 11:51 AM, Xueming Shen wrote:
>>>>>>>> On 05/22/2015 11:41 AM, Staffan Friberg wrote:
>>>>>>>>>
>>>>>>>>> On 05/21/2015 11:00 AM, Staffan Friberg wrote:
>>>>>>>>>>
>>>>>>>>>> On 05/21/2015 09:48 AM, Staffan Friberg wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 05/20/2015 10:57 AM, Xueming Shen wrote:
>>>>>>>>>>>> On 05/18/2015 06:44 PM, Staffan Friberg wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wanted to get reviews and feedback on this performance
>>>>>>>>>>>>> improvement for reading from JAR/ZIP files during
>>>>>>>>>>>>> classloading by reducing unnecessary copying and reading
>>>>>>>>>>>>> the entry in one go instead of in small portions. This
>>>>>>>>>>>>> shows a significant improvement when reading a single
>>>>>>>>>>>>> entry and for a large application with 10k classes and
>>>>>>>>>>>>> 500+ JAR files it improved the startup time by 4%.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For more details on the background and performance results
>>>>>>>>>>>>> please see the RFE entry.
>>>>>>>>>>>>>
>>>>>>>>>>>>> RFE - https://bugs.openjdk.java.net/browse/JDK-8080640
>>>>>>>>>>>>> WEBREV -
>>>>>>>>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Staffan
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Staffan,
>>>>>>>>>>>>
>>>>>>>>>>>> If I did not miss something here, from your use scenario it
>>>>>>>>>>>> appears to me the only thing you really
>>>>>>>>>>>> need here to help boost your performance is
>>>>>>>>>>>>
>>>>>>>>>>>> byte[] ZipFile.getAllBytes(ZipEntry ze);
>>>>>>>>>>>>
>>>>>>>>>>>> You are allocating a byte[] at use side and wrapping it
>>>>>>>>>>>> with a ByteBuffer if the size is small enough,
>>>>>>>>>>>> otherwise, you letting the ZipFile to allocate a big enough
>>>>>>>>>>>> one for you. It does not look like you
>>>>>>>>>>>> can re-use that byte[] (has to be wrapped by the
>>>>>>>>>>>> ByteArrayInputStream and return), why do you
>>>>>>>>>>>> need two different methods here? The logic would be much
>>>>>>>>>>>> easier to simply let the ZipFile to allocate
>>>>>>>>>>>> the needed buffer with appropriate size, fill the bytes and
>>>>>>>>>>>> return, with a "OOME" if the entry size
>>>>>>>>>>>> is bigger than 2g.
>>>>>>>>>>>>
>>>>>>>>>>>> The only thing we use from the input ze is its name, get
>>>>>>>>>>>> the size/csize from the jzentry, I don't think
>>>>>>>>>>>> jzentry.csize/size can be "unknown", they are from the
>>>>>>>>>>>> "cen" table.
>>>>>>>>>>>>
>>>>>>>>>>>> If the real/final use of the bytes is to wrap it with a
>>>>>>>>>>>> ByteArrayInputStream,why bother using ByteBuffer
>>>>>>>>>>>> here? Shouldn't a direct byte[] with exactly the size of
>>>>>>>>>>>> the entry server better.
>>>>>>>>>>>>
>>>>>>>>>>>> -Sherman
>>>>>>>>>>>>
>>>>>>>>>>> Hi Sherman,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the comments. I agree, was starting out with
>>>>>>>>>>> bytebuffer because I was hoping to be able to cache things
>>>>>>>>>>> where the buffer was being used, but since the buffer is
>>>>>>>>>>> past along further I couldn't figure out a clean way to do it.
>>>>>>>>>>> Will rewrite it to simply just return a buffer, and only
>>>>>>>>>>> wrap it in the Resource class getByteBuffer.
>>>>>>>>>>>
>>>>>>>>>>> What would be your thought on updating the
>>>>>>>>>>> ZipFile.getInputStream to return ByteArrayInputStream for
>>>>>>>>>>> small entries? Currently I do that work outside in two
>>>>>>>>>>> places and moving it would potentially speed up others
>>>>>>>>>>> reading small entries as well.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Staffan
>>>>>>>>>> Just realized that my use of ByteArrayInputStream would miss
>>>>>>>>>> Jar verification if enabled so the way to go hear would be to
>>>>>>>>>> add it if possible to the ZipFile.getInputStream.
>>>>>>>>>>
>>>>>>>>>> //Staffan
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Here is an updated webrev which uses a byte[] directly and
>>>>>>>>> also uses ByteArrayInputStream in ZipFile for small entries
>>>>>>>>> below 128k.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not sure about the benefit of doing the
>>>>>>>> ByteArrayInputStream in ZipFile.getInputStream. It has
>>>>>>>> the consequence of changing the "expected" behavior of
>>>>>>>> getInputStream() (instead of return an
>>>>>>>> input stream waiting for reading, it now reads all bytes in
>>>>>>>> advance), something we might not want
>>>>>>>> to do in a performance tuning. Though it might be reasonable to
>>>>>>>> guess everyone get an input stream
>>>>>>>> is to read all bytes from it later.
>>>>>>>>
>>>>>>>> -Sherman
>>>>>>>>
>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~sfriberg/JDK-8080640/webrev.1
>>>>>>>>>
>>>>>>>>> //Staffan
>>>>>>>>
>>>>>>> Agree that it will change the behavior slightly, but as you said
>>>>>>> it is probably expected that some one will read the stream
>>>>>>> eventually.
>>>>>>> We could reduce the size further if that makes a difference, if
>>>>>>> the size is below 65k we would not use more memory than the
>>>>>>> buffer allocated for the InflaterStream today.
>>>>>>> The total allocation would be slightly larger for deflated
>>>>>>> entries as we would allocate a byte[] for the compressed bytes,
>>>>>>> but it would be GC:able and not kept alive. So from a memory
>>>>>>> perspective the difference is very limited.
>>>>>>>
>>>>>>> //Staffan
>>>>>> Hi,
>>>>>>
>>>>>> Bumping this thread to get some more comments on the concern
>>>>>> about changing the ZipFile.getInputStream behavior. The benefit
>>>>>> of doing this change is that any read of small entries from ZIP
>>>>>> and JAR files will be much faster and less resources will be
>>>>>> held, including native resources normally held by the
>>>>>> ZipInputStream.
>>>>>>
>>>>>> The behavior change that will occur is that the full entry will
>>>>>> be read as part of creating the stream and not lazily as might be
>>>>>> expected. However when getting a today InputStream zip file will
>>>>>> be accessed to read information about the size of the entry, so
>>>>>> the zip file is already touched when getting an InputStream, but
>>>>>> not the compressed bytes.
>>>>>>
>>>>>> I'm fine with removing this part of the change and just push the
>>>>>> private getBytes function and the updates to the JDK libraries to
>>>>>> use it.
>>>>>>
>>>>>> Thanks,
>>>>>> Staffan
>>>>>
>>>>
>>>
>>
>
More information about the core-libs-dev
mailing list