RFR: 8279619: [JVMCI] improve EncodedSpeculationReason

Mon Feb 20 18:12:26 UTC 2023

On Fri, 17 Feb 2023 08:22:52 GMT, Doug Simon <dnsimon at openjdk.org> wrote:

>> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.meta/src/jdk/vm/ci/meta/EncodedSpeculationReason.java line 64:
>> 
>>> 62:             encoding = encodingSupplier.get();
>>> 63:             encoding.addInt(groupId);
>>> 64:             encoding.addInt(groupName.hashCode());
>> 
>> Why aren't we adding the whole String?  I think we should either add the whole string or update the javadoc to document the current implementation, where the groupName is simply a nice name provided for printing and it's up to the user to provide a stable definitions of group ids.
>
> Adding the whole string won't change much as it is hashed anyway as part of the digest being generated by `encoding`. Adding the whole string also increases the average size of the array returned by `jdk.vm.ci.hotspot.HotSpotSpeculationEncoding.getByteArray()`.

Then I would simply document that only the id is emitted and the creator of the encoding is responsible for any extra disambiguation if the id isn't sufficient.  Or we could skip this PR completely since our strategy for handing out ids protects us from this problem.

I've never been a fan of the hash strategy used by these encodings.  I have no sense of how likely collisions are and how well SHA-1 performs when hashing the very short chunks of data that we're actually feeding them.  I think most of the time the actual data is shorter than the size of a SHA-1 digest.  The most common speculations are id,method,bci which should only be 16 bytes.  If we ever do a redesign of speculations I think it would be useful if they could be printed by HotSpot so that LogCompilation could report them.  Being able to decode failed speculations from the Graal side would be useful as well.

-------------

PR: https://git.openjdk.org/jdk/pull/12532