RFR(XS) 8249719: MethodHandle performance suffers from bad ResolvedMethodTable hash function

Mon Jul 27 20:37:38 UTC 2020

I’m fine with that version.

Thanks, Thomas

On Mon 27. Jul 2020 at 22:30, Andrei Pangin <andrei.pangin at gmail.com> wrote:

> Thank you for the ideas. Checking the real bucket length looks attractive,
> but as far as I see, it does not always work. First, because the table
> expansion is triggered only after GC. Second, because it's still possible
> to intentionally generate hash collisions, but I believe the JVM should not
> crash even in the worst case scenario.
>
> Printing the hash value and parsing the logs looks a bit too artificial to
> me, while my original intention was to test the performance (or algorithmic
> complexity) of the code rather than the particular hash code algorithm.
>
> Anyway, if you all don't mind making the test manual, let me just
> stick with this option - not to make the test much more complicated than
> the fix itself :) especially if the table will be removed some time soon.
>
> Here's the updated webrev, just in case. It differs only in changing
> "timeout=300" to "manual".
> http://cr.openjdk.java.net/~apangin/8249719/webrev.2/
>
> Thanks again,
> Andrei
>
>
> пн, 27 июл. 2020 г. в 22:12, <coleen.phillimore at oracle.com>:
>
>>
>>
>> On 7/27/20 2:59 PM, Thomas Stüfe wrote:
>>
>> A manual test may be enough too, I mean how probable is it that this
>> particular error would ever resurface. Better would be, maybe, to just add
>> an assert into the insert path which fires if a bucket length is above a
>> certain threshold. That would catch all kinds of bad hashes, not just this
>> one case.
>>
>> BTW maybe we should add a jcmd option to print this table, like we have
>> for the symbol- and stringtable.
>>
>>
>> Erik Osterlund removes this table with the New Invoke Bindings JEP so
>> it's not worth adding more to this right now.
>> thanks,
>> Coleen
>>
>>
>> Cheers Thomas
>>
>> On Mon, Jul 27, 2020 at 8:44 PM <coleen.phillimore at oracle.com> wrote:
>>
>>>
>>>
>>> On 7/27/20 2:33 PM, Volker Simonis wrote:
>>>
>>> I like this solution although it introduces a hard dependency on the
>>> exact log format. But that's probably not a big issue.
>>>
>>> Another possibility would probably be to add a gtest.
>>>
>>> But I'd also be happy to make the current test a manual test.
>>>
>>> Let's see what Coleen's opinion is. We can use her as a tiebreaker :)
>>>
>>> Otherwise I'll leave it up to Andrei to choose his preferred option :)
>>>
>>>
>>> So what I suggested is risky (I'm curious to see if it works).  My
>>> second choice is to make it a manual test.  Hopefully we won't be changing
>>> up the hashcode all the time and risk breaking this again.
>>> thanks,
>>> Coleen
>>>
>>>
>>>
>>> On Mon, Jul 27, 2020 at 8:04 PM Thomas Stüfe <thomas.stuefe at gmail.com>
>>> wrote:
>>>
>>>> Hi Andrei,
>>>>
>>>> I was afraid this might happen, but only occurred to me after the
>>>> review.
>>>>
>>>> My proposal would be:
>>>>
>>>> - in resolvedMethodTable.cpp, extend the log message in log_insert()
>>>> and print out the hashcode too (e.g. "ResolvedMethod entry added for ...
>>>> (4711)").
>>>>
>>>> - In the test, start your test as a sub process, give it
>>>> -Xlog:membername+table=debug, and scan the output line for "ResolvedMethod
>>>> entry added for <your class name> (<hash>)". Extract the hash. Read two or
>>>> three lines and compare the hash.
>>>>
>>>> The advantage would be that you do not need to load 200k classes to see
>>>> that your regression test works, and it is timing independent. For examples
>>>> of sub process spawning and output scanning, there are many jtreg tests
>>>> already (e.g. runtime/ErrorHandling has a few).
>>>>
>>>> Just my 5 cent. Maybe the others have different proposals.
>>>>
>>>> Cheers, Thomas
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jul 27, 2020 at 7:52 PM Andrei Pangin <andrei.pangin at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Coleen, Thomas, thank you for your reviews. It turns out though that
>>>>> my new test times out on debug builds (thanks Volker for double-checking).
>>>>>
>>>>> Test Tier Platform Description
>>>>> runtime/MemberName/ResolvedMethodTableHash.java tier1
>>>>> linux-aarch64-debug Error: Program `...java' timed out (timeout set
>>>>> to ...ms, elapsed time including timeout handling was ...ms).
>>>>> runtime/MemberName/ResolvedMethodTableHash.java tier1
>>>>> windows-x64-debug Error: Program
>>>>> `c\:\\ade\\mesos\\work_dir\\jib-master\\install\\...-07-27-0826248.andrei.pangin.source\\windows-x64-debug.jdk\\jdk-16\\fastdebug\\bin\\java'
>>>>> timed out (timeout set to ...ms, elapsed time including timeout handling
>>>>> was ...ms).
>>>>> runtime/MemberName/ResolvedMethodTableHash.java tier1 macosx-x64-debug Error:
>>>>> Program `...java' timed out (timeout set to ...ms, elapsed time including
>>>>> timeout handling was ...ms).
>>>>>
>>>>> Apparently, we can't rely on timing, since this test runs two orders
>>>>> of magnitude slower on debug JVM than on product one. I could simply mark
>>>>> the test as manual, or is there a better idea? Maybe, adjust the parameters
>>>>> and run the test automatically only on product or only on debug JVM? What
>>>>> do you think?
>>>>>
>>>>> Regards,
>>>>> Andrei
>>>>>
>>>>> сб, 25 июл. 2020 г. в 17:34, <coleen.phillimore at oracle.com>:
>>>>>
>>>>>>
>>>>>> Hi Andrei,
>>>>>> This looks good.  Thank you for finding this bug.  And thanks to
>>>>>> Volker for sponsoring it as well.
>>>>>> Nice to see you on the list, Andrei!
>>>>>> Coleen
>>>>>>
>>>>>> On 7/25/20 3:18 AM, Thomas Stüfe wrote:
>>>>>>
>>>>>> Hi, Andrei,
>>>>>>
>>>>>> Good find. I played around with a test of generating lots of lambdas
>>>>>> and yes, all the hashes are equal. With your patch invocation time went
>>>>>> down by half (that was for 10000 lambdas).
>>>>>>
>>>>>> The test looks fine though the normal way to do this seems to be
>>>>>> jcod. I personally don't care since the test is nice and self
>>>>>> contained that way, but someone from the Oracle runtime group should
>>>>>> confirm this is fine (ccing Coleen).
>>>>>>
>>>>>> JDK11 seems to be affected too.
>>>>>>
>>>>>> This probably also affects jruby.
>>>>>>
>>>>>> +1 from me.
>>>>>>
>>>>>> ..Thomas
>>>>>>
>>>>>> On Fri, Jul 24, 2020 at 4:53 PM Andrei Pangin <
>>>>>> andrei.pangin at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please review a small fix to a not-so-small performance issue that
>>>>>>> we've
>>>>>>> seen when migrating a production application from JDK 8 to JDK 14.
>>>>>>>
>>>>>>> On certain workloads, where Nashorn produces thousands MethodHandles,
>>>>>>> ResolvedMethodTable operations become extremely slow due to
>>>>>>> degenerate
>>>>>>> hashcode. This patch basically fixes hashcode by including the method
>>>>>>> holder's name in the computation. More details in the bug report.
>>>>>>>
>>>>>>> CR: https://bugs.openjdk.java.net/browse/JDK-8249719
>>>>>>> Webrev: https://cr.openjdk.java.net/~apangin/8249719/webrev/
>>>>>>>
>>>>>>> Tested: tier1-2, hotspot*runtime
>>>>>>>
>>>>>>> I'll be glad if someone could sponsor the patch.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Andrei Pangin
>>>>>>>
>>>>>>
>>>>>>
>>>
>>