RFR(XS) 8249719: MethodHandle performance suffers from bad ResolvedMethodTable hash function

Mon Jul 27 21:08:01 UTC 2020

+1.

On 7/27/20 4:37 PM, Thomas Stüfe wrote:
> I’m fine with that version.
>
> Thanks, Thomas
>
> On Mon 27. Jul 2020 at 22:30, Andrei Pangin <andrei.pangin at gmail.com 
> <mailto:andrei.pangin at gmail.com>> wrote:
>
>     Thank you for the ideas. Checking the real bucket length looks
>     attractive, but as far as I see, it does not always work. First,
>     because the table expansion is triggered only after GC. Second,
>     because it's still possible to intentionally generate hash
>     collisions, but I believe the JVM should not crash even in the
>     worst case scenario.
>

yes, I agree.  It might not be safe.

Making this manual is fine.
Thanks for fixing this.
Coleen
>
>
>     Printing the hash value and parsing the logs looks a bit too
>     artificial to me, while my original intention was to test the
>     performance (or algorithmic complexity) of the code rather than
>     the particular hash code algorithm.
>
>     Anyway, if you all don't mind making the test manual, let me just
>     stick with this option - not to make the test much more
>     complicated than the fix itself :) especially if the table will be
>     removed some time soon.
>
>     Here's the updated webrev, just in case. It differs only in
>     changing "timeout=300" to "manual".
>     http://cr.openjdk.java.net/~apangin/8249719/webrev.2/
>
>     Thanks again,
>     Andrei
>
>
>     пн, 27 июл. 2020 г. в 22:12, <coleen.phillimore at oracle.com
>     <mailto:coleen.phillimore at oracle.com>>:
>
>
>
>         On 7/27/20 2:59 PM, Thomas Stüfe wrote:
>>         A manual test may be enough too, I mean how probable is it
>>         that this particular error would ever resurface. Better would
>>         be, maybe, to just add an assert into the insert path which
>>         fires if a bucket length is above a certain threshold. That
>>         would catch all kinds of bad hashes, not just this one case.
>>
>>         BTW maybe we should add a jcmd option to print this table,
>>         like we have for the symbol- and stringtable.
>
>         Erik Osterlund removes this table with the New Invoke Bindings
>         JEP so it's not worth adding more to this right now.
>         thanks,
>         Coleen
>>
>>         Cheers Thomas
>>
>>         On Mon, Jul 27, 2020 at 8:44 PM <coleen.phillimore at oracle.com
>>         <mailto:coleen.phillimore at oracle.com>> wrote:
>>
>>
>>
>>             On 7/27/20 2:33 PM, Volker Simonis wrote:
>>>             I like this solution although it introduces a hard
>>>             dependency on the exact log format. But that's probably
>>>             not a big issue.
>>>
>>>             Another possibility would probably be to add a gtest.
>>>
>>>             But I'd also be happy to make the current test a manual
>>>             test.
>>>
>>>             Let's see what Coleen's opinion is. We can use her as a
>>>             tiebreaker :)
>>>
>>>             Otherwise I'll leave it up to Andrei to choose his
>>>             preferred option :)
>>
>>             So what I suggested is risky (I'm curious to see if it
>>             works).  My second choice is to make it a manual test. 
>>             Hopefully we won't be changing up the hashcode all the
>>             time and risk breaking this again.
>>             thanks,
>>             Coleen
>>>
>>>             On Mon, Jul 27, 2020 at 8:04 PM Thomas Stüfe
>>>             <thomas.stuefe at gmail.com
>>>             <mailto:thomas.stuefe at gmail.com>> wrote:
>>>
>>>                 Hi Andrei,
>>>
>>>                 I was afraid this might happen, but only occurred to
>>>                 me after the review.
>>>
>>>                 My proposal would be:
>>>
>>>                 - in resolvedMethodTable.cpp, extend the log message
>>>                 in log_insert() and print out the hashcode too (e.g.
>>>                 "ResolvedMethod entry added for ... (4711)").
>>>
>>>                 - In the test, start your test as a sub process,
>>>                 give it -Xlog:membername+table=debug, and scan the
>>>                 output line for "ResolvedMethod entry added for
>>>                 <your class name> (<hash>)". Extract the hash. Read
>>>                 two or three lines and compare the hash.
>>>
>>>                 The advantage would be that you do not need to load
>>>                 200k classes to see that your regression test works,
>>>                 and it is timing independent. For examples of sub
>>>                 process spawning and output scanning, there are many
>>>                 jtreg tests already (e.g. runtime/ErrorHandling has
>>>                 a few).
>>>
>>>                 Just my 5 cent. Maybe the others have different
>>>                 proposals.
>>>
>>>                 Cheers, Thomas
>>>
>>>
>>>
>>>
>>>                 On Mon, Jul 27, 2020 at 7:52 PM Andrei Pangin
>>>                 <andrei.pangin at gmail.com
>>>                 <mailto:andrei.pangin at gmail.com>> wrote:
>>>
>>>                     Hi,
>>>
>>>                     Coleen, Thomas, thank you for your reviews. It
>>>                     turns out though that my new test times out on
>>>                     debug builds (thanks Volker for double-checking).
>>>
>>>                     Test 	Tier 	Platform 	Description
>>>                     runtime/MemberName/ResolvedMethodTableHash.java
>>>                     tier1 	linux-aarch64-debug 	Error: Program
>>>                     `...java' timed out (timeout set to ...ms,
>>>                     elapsed time including timeout handling was ...ms).
>>>                     runtime/MemberName/ResolvedMethodTableHash.java
>>>                     tier1 	windows-x64-debug 	Error: Program
>>>                     `c\:\\ade\\mesos\\work_dir\\jib-master\\install\\...-07-27-0826248.andrei.pangin.source\\windows-x64-debug.jdk\\jdk-16\\fastdebug\\bin\\java'
>>>                     timed out (timeout set to ...ms, elapsed time
>>>                     including timeout handling was ...ms).
>>>                     runtime/MemberName/ResolvedMethodTableHash.java
>>>                     tier1 	macosx-x64-debug 	Error: Program
>>>                     `...java' timed out (timeout set to ...ms,
>>>                     elapsed time including timeout handling was ...ms).
>>>
>>>
>>>                     Apparently, we can't rely on timing, since this
>>>                     test runs two orders of magnitude slower on
>>>                     debug JVM than on product one. I could simply
>>>                     mark the test as manual, or is there a better
>>>                     idea? Maybe, adjust the parameters and run the
>>>                     test automatically only on product or only on
>>>                     debug JVM? What do you think?
>>>
>>>                     Regards,
>>>                     Andrei
>>>
>>>                     сб, 25 июл. 2020 г. в 17:34,
>>>                     <coleen.phillimore at oracle.com
>>>                     <mailto:coleen.phillimore at oracle.com>>:
>>>
>>>
>>>                         Hi Andrei,
>>>                         This looks good.  Thank you for finding this
>>>                         bug. And thanks to Volker for sponsoring it
>>>                         as well.
>>>                         Nice to see you on the list, Andrei!
>>>                         Coleen
>>>
>>>                         On 7/25/20 3:18 AM, Thomas Stüfe wrote:
>>>>                         Hi, Andrei,
>>>>
>>>>                         Good find. I played around with a test of
>>>>                         generating lots of lambdas and yes, all the
>>>>                         hashes are equal. With your patch
>>>>                         invocation time went down by half (that was
>>>>                         for 10000 lambdas).
>>>>
>>>>                         The test looks fine though the normal way
>>>>                         to do this seems to be jcod. I personally
>>>>                         don't care since the test is nice and self
>>>>                         contained that way, but someone from the
>>>>                         Oracle runtime group should confirm this is
>>>>                         fine (ccing Coleen).
>>>>
>>>>                         JDK11 seems to be affected too.
>>>>
>>>>                         This probably also affects jruby.
>>>>
>>>>                         +1 from me.
>>>>
>>>>                         ..Thomas
>>>>
>>>>                         On Fri, Jul 24, 2020 at 4:53 PM Andrei
>>>>                         Pangin <andrei.pangin at gmail.com
>>>>                         <mailto:andrei.pangin at gmail.com>> wrote:
>>>>
>>>>                             Hi,
>>>>
>>>>                             Please review a small fix to a
>>>>                             not-so-small performance issue that we've
>>>>                             seen when migrating a production
>>>>                             application from JDK 8 to JDK 14.
>>>>
>>>>                             On certain workloads, where Nashorn
>>>>                             produces thousands MethodHandles,
>>>>                             ResolvedMethodTable operations become
>>>>                             extremely slow due to degenerate
>>>>                             hashcode. This patch basically fixes
>>>>                             hashcode by including the method
>>>>                             holder's name in the computation. More
>>>>                             details in the bug report.
>>>>
>>>>                             CR:
>>>>                             https://bugs.openjdk.java.net/browse/JDK-8249719
>>>>                             Webrev:
>>>>                             https://cr.openjdk.java.net/~apangin/8249719/webrev/
>>>>
>>>>                             Tested: tier1-2, hotspot*runtime
>>>>
>>>>                             I'll be glad if someone could sponsor
>>>>                             the patch.
>>>>
>>>>                             Thank you,
>>>>                             Andrei Pangin
>>>>
>>>
>>
>