Initial webrev with changes for JDK 9 - jimage

Jim Laskey (Oracle) james.laskey at oracle.com
Mon Mar 14 16:28:09 UTC 2016


These are the changes I plan on submitting for M3.

http://cr.openjdk.java.net/~jlaskey/jimagereview/webrev/index.html

I’ve filed the following bugs as follow up.

https://bugs.openjdk.java.net/browse/JDK-8151806 JImage decompress code needs to be revised to be more effective
https://bugs.openjdk.java.net/browse/JDK-8151807 ImageBufferCache should use sun.nio.ch.Util

— Jim



> On Mar 10, 2016, at 7:22 PM, Jim Laskey (Oracle) <james.laskey at oracle.com> wrote:
> 
> Thank you Roger.  Comments inline.  In general, I will hold off changes until after merge so as not to destabilize.  Note that ImageBufferCache and Decompression are not currently used.
> 
>> On Mar 8, 2016, at 2:07 PM, Roger Riggs <Roger.Riggs at Oracle.com> wrote:
>> 
>> Hi,
>> 
>> Review comments for the jimage code in java.base, mostly cleanup but perhaps a bug or two.
>> General:  inconsistent style and not following guidelines, copyright dates may need an update.
>> Use of assert for error checking is not going to catch or help isolate errors in production builds.
> 
> Noted.
> 
>> 
>> dir: jdk/src/java.base/share/classes/jdk/internal/jimage/
>> 
>> 
>> BasicImageReader:
>> - readHeader() exception message should say not a jimage or wrong version
>> Other IOExceptions can occur if the file is corrupted or is too small, array index out of bounds, etc.
> 
> Replaced with specific exceptions.
> 
>> 
>> - findLocation() should use a more explicit computation for the redirection
>>  than rehashing; this would allow the hash seed to be encapsulated.
> 
> As part of the review change set, I've documented the Minimal Perfect Hash Method (see PerfectHashBuilder.)  The algorithm requires a one time one rehash when there is a collision.  A new seed forces a different distribution of the colliding strings.  If you started with the existing hashCode, you would not get a new distribution, merely a shift of all the colliding elements. Random distribution is the essence of algorithm.
> 
>> 
>> - slice() synchronizes on the ByteBuffer; does every other access?
> 
> Byte buffer use here is immutable, however slicing of the ByteBuffer does require modifying position and limit.  Wrapping or cloning the buffers would be of little benefit and would take a performance hit.  The ByteBuffer API might have been well served with an atomic non-modifying slice operation.
> 
>> 
>> - getAllLocations() is unused and does not respect its sorted argument;
>>  Best to remove the method for now.
> 
> Removed.
> 
>> 
>> - getResourceBuffer(): unused - remove
> 
> Removed.
> 
>> 
>> - getResourceStream(): unused - remove
> 
> Removed.
> 
>> 
>> ImageStringsReader:
>> 
>> - Is Modified UTF-8 really needed here;  using normal UTF-8 would allow
>>  re-use and better performance.  Alternatively, Modified UTF-8 should be
>>  added as a supported encoding.
> 
> Modified UTF-8 is used to be more compact and to simplify use in native code.  The jimage string table has overlapping use with constant pool compaction so is really an older revision of the Modified UTF-8 standard.  Not sure why Modifed UTF-8 is not offered as an encoding.  It would have made constant pool management easier. And note that relying on a particular encoding is problematic.  Encoding standards do change, affecting the stability of the hash algorithm and may prevent the reading of older jimage files.
> 
>> 
>> - The ImageStringsReader.hashCode() should be defined so it can take advantage of 32 or 64 bit accesses; since it is used frequently.  And it needs to be stable across revisions that may be accessing jimages from different JDK versions.
>> 
> 
> Reasonable and will consider.  Not sure what you mean by stable across revisions. This is not string hashCode.  This is a hashCode that is specific to MPHM and is deterministic. 
> 
>> - stringFromMUTF8(byte[]) - there may be some optimization to avoid expanding ascii to chars and then re-encoding in String.
> 
> Will touch base with the Aleksey.
> 
>> 
>> - charsFromByteBufferLength/charsFromByteBuffer should throw an exception if no terminating null, not just when assertions are enabled
> 
> Will throw InternalError instead.
> 
>> 
>> - inconsistent assert errors vs exceptions for malformed mutf8.
>>  Use of assert will not detect errors in production builds
> 
> Will throw InternalError instead.
> 
>> 
>> - mutf8FromChars allocates a new working buffer (byte[8]) potentially for every UNICODE char.
> 
> Will hoist the buffer allocation and reuse.
> 
>> 
>> - mutf8FromString: should take advantage of new String(bytes, offset, length, charsetName) to avoid extra allocations.
>>  (would need a Modified utf-8 charset).
> 
> Modified UTF-8 not available.
> 
>> 
>> - hashCode(byte[0, seed) should be private; the hashcode algorithm should not be variable.
>> 
> 
> This is not String hashCode. see above.
> 
>> ImageLocation:
>> - Merge ImageLocation functions into ImageLocationBase and rename.
>>   ImageLocation does not have enough unique functionality to be a separate class.
> 
> Merged.
> 
>> 
>> - It is a poor design to create an instance that may not be valid.
>>  It would be better to have an ImageLocation factory method that encapsulated the creation and checking
>> 
> 
> Noted.
> 
>> 
>> ImageLocationBase:
>> - Failing asserts will not cause a runtime error in production.   
>>   But will degenerate into unpredictable other exceptions
> 
> Will throw InternalError instead.
> 
>> 
>> ImageStream:
>> - compress(): Too heavyweight an object to be used and discarded just for compress / decompress of small values.
> 
> compress()?   I don’t see ImageStream used anywhere except for large buffers.  As a note, dynamic ByteBuffers really should be part of the JDK.
> 
>> 
>> 
>> ImageBufferCache:
>> - Comments on each method would make the intent clear
> 
> Will add comments.
> 
>> 
>> - getBuffer: mixing for loop and explicit indexes is fragile
>> - releaseBuffer: check buffer.capacity before threadLocal.get() for early return
>> - Logic of i,j is flawed, i is never changed; j is/becomes 0 
>>   The first buffer is removed if there are more than max
>>   Its not clear what algorithm was intended.
>>   A buffer that is in use can be removed.
> 
> Rewrote using lambdas to clarify the algorithm.
> 
>> 
>> - ImageBufferCache constructor: the isUsed flag should be set in getBuffer()
>>   when it is added to the list (not in the constructor)
>>   It keeps it consistent with when it is set for a re-used buffer.
>> 
> 
> Modified.
> 
>> ImageHeader:
>> - BADMAGIC is not used
> 
> Removed.
> 
>> 
>> - readFrom could throw BufferUnderflowException - undocumented
> 
> get(i) can IndexOutOfBoundsException [and get() BufferUnderflowException]  Will add explicit check.
> 
> 
>> 
>> ImageModuleData:
>> - ImageModuleData seems to be unused;  allModuleNames() is unreferenced
>> 
> 
> Removed.
> 
>> 
>> ImageReader:
>> - ImageReader(path): unused - remove
> 
> Removed.
> 
>> 
>> - Use of assert is non-reassuring, ineffective in production builds
> 
> Added explicit check.
> 
>> 
>> - In a long running application the per thread buffers will get to the point wehere they are unused and should be released.
>>  There is no support for soft or weak references to handle that case.
>>  Alternatively, if the buffers were not per thread then they would have a lesser impact.
> 
> The core libs team was concerned about the performance of class loading.  The pattern used here is the one followed by NIO.  We will probably come up with a shared NIO cache at some point.
> 
>>  Also, why should there be more than one buffer per thread?
> 
> The image cache is only used by 32 bit systems and decompression.  It is possible for a channel read buffer and a decompression buffer to be allocated at the same time.
> 
>> 
>> ImageReader.Resource:
>> - unused create method
> 
> Removed.
> 
>> 
>> - unused parent argument (0) in Resource constructor (dir, loc, attr)
> 
> Removed.
> 
>> 
>> ImageReader.LinkNode:
>> - constructor has unused parent arg
>> 
>> 
> 
> Removed
> 
>> decompress/CompressIndexes:
>> - readInt: wasteful allocation of a single byte array and then another small byte array;
>>  and then another heap buffer allocation in the ByteBuffer.
> 
> Removed all allocation.
> 
>> 
>> - decompress is hugely wasteful of allocations of ByteBuffers of small sizes.
>> 
> 
> Removed all allocation.
> 
> 
>> decompress/CompressedResourceHeader:
>> - MAGIC does not need to be public
>> 
>> 
>> decompress/Decompressor:
>> - IOException "Plugin name not found" should include the missing name
>> 
>> - StringsProvider is an unnecessary interface; could use Supplier<String>
>> 
>> - Redundant public modifiers in interface declarations
>> 
>> decompress/SignatureParser:
>> - Comments describing the input and output formats needed
>> 
>> - style around operators does not consistently include spaces; and "if("...
>> 
>> decompress/StringSharingDecompressor:
>> - Wasteful use of ByteBuffers and small arrays without re-use (safeAdd...)
>> 
>> - StringSharingDecompressor constructor with unused argument - remove
>> 
>> 
>> decompress/ZipDecompressor:
>> - Should be more specific about the zip algorithms supported
>>  since it needs to be stable across releases
>> 
>> - decompress(): use try-with-resources for safer inflation
>> 
> 
> I think decompression needs significant reworking..
> 
>> src/java.base/share/native/libjimage/imageDecompressor.cpp
>> - "// If ptr is not aligned, sparc will fail." implies input argument must be aligned
>> 
>> - General doubt about using assert; for example to check on malloc allocation failure
>> 
>> - In production environment will exhibit as segv
>> 
> 
> Noted.



More information about the jigsaw-dev mailing list