From blackdrag at gmx.org Mon Sep 4 11:56:25 2023
From: blackdrag at gmx.org (Jochen Theodorou)
Date: Mon, 4 Sep 2023 13:56:25 +0200
Subject: initialization times for invokedynamic
Message-ID:

Hi,

I wrote myself a small microbenchmark to get an idea of the time it takes to initialize a callsite. I made a simple test in which a class with a run method calls a method foo(I)I using invokedynamic, and all the bootstrap method does is

> handle = caller.findStatic(IndyCallsiteTests.class, name, type);
> return new ConstantCallSite(handle);

I compared this with a simple reflective solution:

> Runnable r = new Runnable() {
>     @Override
>     public void run() {
>         Method m = IndyCallsiteTests.class.getMethod("foo", int.class);
>         m.invoke(null, 1);
>     }
> };

and one with very simple reflective caching:

> Runnable r = new Runnable() {
>     Method m = null;
>     @Override
>     public void run() {
>         if (m == null) {
>             m = IndyCallsiteTests.class.getMethod("foo", int.class);
>         }
>         m.invoke(null, 1);
>     }
> };

My findings are that indy of course performs best in the long term, but based on reports I was wondering more about the initial costs. For the first couple of calls I get

reflectiveCallCached
------------------------------------
52627
8861
4335
3472
6032
6209
7484
7406
7267
6945
7546
7321

In sum ~122_000

reflectiveCall
------------------------------------
61720
20147
4916
3719
4723
3421
4839
4661
5428
4379
4615
4453

In sum ~127_000

indyCall
------------------------------------
835411
1461
1229
1335
1257
1500
1154
1546
1423
1351
1318
1303

In sum ~850_000

While peak performance is much better with indy, it takes a lot of calls (100k-200k) in this scenario for the indyCall to catch up with the other two variants.

I would like to know if others on this list have similar experiences. Or did I make a fundamental mistake?
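For readers who want to try something similar, the setup described above can be approximated in plain Java. This is only a sketch under assumptions, not the benchmark from this message: the class name IndyInitSketch and the helper names are made up for illustration, and because javac cannot emit an invokedynamic instruction for a custom bootstrap method, the bootstrap is called by hand and the resulting call site is kept in a static field. The absolute numbers will therefore differ from a real indy call site.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical stand-in for the test class; all names here are illustrative.
class IndyInitSketch {
    // Target with the foo(I)I shape from the description.
    public static int foo(int i) { return 0; }

    // Same two steps as the quoted bootstrap method, invoked manually.
    static CallSite bootstrap(MethodHandles.Lookup caller, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle handle = caller.findStatic(IndyInitSketch.class, name, type);
        return new ConstantCallSite(handle);
    }

    private static CallSite site;   // linked on first use, like a call site on first execution
    private static Method cached;   // mirrors the "reflective caching" variant

    static int indyLikeCall() {
        try {
            if (site == null) {
                site = bootstrap(MethodHandles.lookup(), "foo",
                        MethodType.methodType(int.class, int.class));
            }
            return (int) site.dynamicInvoker().invokeExact(1);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static int reflectiveCall() {
        try {
            Method m = IndyInitSketch.class.getMethod("foo", int.class);
            return (Integer) m.invoke(null, 1);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int reflectiveCallCached() {
        try {
            if (cached == null) {
                cached = IndyInitSketch.class.getMethod("foo", int.class);
            }
            return (Integer) cached.invoke(null, 1);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Run one variant per fresh JVM so the variants do not warm each other up.
        String variant = args.length > 0 ? args[0] : "indy";
        for (int i = 0; i < 12; i++) {
            long start = System.nanoTime();
            switch (variant) {
                case "reflective": reflectiveCall(); break;
                case "cached": reflectiveCallCached(); break;
                default: indyLikeCall();
            }
            // println(long) avoids string concatenation, which javac compiles to
            // invokedynamic on JDK 9+ and which would distort the first sample.
            System.out.println(System.nanoTime() - start);
        }
    }
}
```

Running it as `java IndyInitSketch.java indy`, `... reflective` and `... cached` in separate JVMs prints twelve nanosecond samples per variant, which is roughly the shape of the tables above.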

bye Jochen

From claes.redestad at oracle.com Mon Sep 4 12:14:09 2023
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 4 Sep 2023 12:14:09 +0000
Subject: initialization times for invokedynamic
In-Reply-To:
References:
Message-ID: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com>

Hi,

Could you post the full benchmark code somewhere?

Which JDK version did you test this on? Later JDKs will use a MH-backed solution under the covers but much of the overhead of early initialization is mitigated by various techniques (pre-generating some LambdaForms, caching some runtime generated code in CDS etc) so it's important for context to get more details.

Best regards
/Claes

_______________________________________________
mlvm-dev mailing list
mlvm-dev at openjdk.org
https://mail.openjdk.org/mailman/listinfo/mlvm-dev

From blackdrag at gmx.org Tue Sep 5 06:58:23 2023
From: blackdrag at gmx.org (Jochen Theodorou)
Date: Tue, 5 Sep 2023 08:58:23 +0200
Subject: initialization times for invokedynamic
In-Reply-To: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com>
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com>
Message-ID:

On 04.09.23 14:14, Claes Redestad wrote:
> Hi,
>
> Could you post the full benchmark code somewhere?
>
> Which JDK version did you test this on? Later JDKs will use a MH-backed solution under the covers but much of the overhead of early initialization is mitigated by various techniques (pre-generating some LambdaForms, caching some runtime generated code in CDS etc) so it's important for context to get more details.

I changed the code a little bit in the meantime. I no longer measure only the first indy callsite, but also a second callsite to the same method using the same bootstrap method, and I see an improvement: times go down from 863k to 136k. There seems to be a lot of code initialization.
That is then 142k for the first 12 calls (vs. 88k in the case of reflection, which means it still takes a lot of calls before indy gets in front).

I used JDK 17.0.7. As for the test (I would not really call it a benchmark), I basically used this: https://gist.github.com/blackdrag/28df334a8f49f06048d19848a50828c8

bye Jochen

From blackdrag at gmx.org Sun Sep 10 08:33:29 2023
From: blackdrag at gmx.org (Jochen Theodorou)
Date: Sun, 10 Sep 2023 10:33:29 +0200
Subject: initialization times for invokedynamic
In-Reply-To:
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com>
Message-ID: <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org>

On 05.09.23 08:58, Jochen Theodorou wrote:
[...]
> I changed the code a little bit in the meantime. I now no longer measure
> the first indy callsite, but also a second callsite to the same method
> using the same bootstrap method and I see an improvement as times go
> down from 863k to 136k.

Interestingly, this seems to be per target method. Does the JVM have a relatively high one-time cost per direct method handle creation? For every direct method handle?

bye Jochen

From blackdrag at gmx.org Mon Sep 11 08:41:38 2023
From: blackdrag at gmx.org (Jochen Theodorou)
Date: Mon, 11 Sep 2023 10:41:38 +0200
Subject: initialization times for invokedynamic
In-Reply-To: <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org>
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org>
Message-ID: <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org>

I changed my testing a bit to have more infrastructure types and to test with a fresh VM each time.

The scenario is still the same: call a method foo with argument 1. foo does nothing but return 0. Implement the call.

indyDirect:
bootstrap method selects the method and produces a constant call-site

indyDoubleDispatch:
bootstrap selects a selector method and produces a mutable call-site. The selector then selects the target method and sets it in the call-site

reflective:
an inner class is used to select the method using reflection and directly invoke it.

reflectiveCached:
same as reflective, but caching the selected method

staticCallSite:
I have the call abstracted and replace what is called after method selection. Here with a direct call to the method using normal Java

runtimeCallSite:
I have the call abstracted like staticCallSite, but instead of replacing it with a direct call I create a class at runtime, which does the direct call for me.

My interest is in the performance of the first few calls. My experiments show that after at most 5 calls there is no significant performance change anymore for a long time. But long-term performance is secondary right now.

Out of these implementations it is no surprise that staticCallSite has the least cost, but it is almost on par with the reflective variant. That really surprised me. It seems reflection has come a long way since the old times. There is probably still a lot of cost in the long term, but well, I focus on the short term here right now.

The cached variant really does not differ much, yet if reflection gets a score of 41, the cached variant is at 105. That is surprisingly much for an additional if condition. But if you think of how many instructions that involves, maybe it is not that surprising. indyDirect has almost the same initial cost as reflectiveCached. indyDoubleDispatch follows with a score of 149... which looks very much like reflective+indyDirect-"a small something". At 361 we find runtimeCallSite, the slowest by far. The numbers used to be quite different for this, but back then MagicAccessor was an option to reduce cost.

My conclusion so far: callsite generation is a questionable option. Not only because of performance, but also because of the module system. Though we have cases where we can use the static variant.

The next best is actually reflective.
But how would you combine reflective with something that has better long-term performance? Even a direct call with indy costs much more.

I think I have to change my tests. I think I should test a scenario in which I have a quite big number - like 1 million - of one-time call-sites to get really conclusive numbers... Of course that means 1 million direct method handles for indy.

Well, I will write again if I have more numbers.

bye Jochen

From liangchenblue at gmail.com Mon Sep 11 09:02:16 2023
From: liangchenblue at gmail.com (-)
Date: Mon, 11 Sep 2023 17:02:16 +0800
Subject: initialization times for invokedynamic
In-Reply-To: <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org>
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org>
Message-ID:

Hello Jochen and Claes,

I have done a little debugging and found the cause: looking up a pre-generated LambdaForm (mentioned by Claes) causes the VM to initialize a NoSuchMethodError [1] that is later silently dropped [2], but by then the NoSuchMethodError constructor has already been executed and the stack trace filled, causing a significant overhead, as shown in this [3] JMC rendering of a JFR recording.

You can capture this NoSuchMethodError construction with an IDE debugger, even when running an empty main, on newer JDK versions. I tested with a breakpoint in the NoSuchMethodError(String) constructor and it hits twice (because the instrumentation agent uses reflection, which now depends on Method Handles after JEP 416 in Java 18).

I think a resolution would be to modify linkResolver so that it can also resolve speculatively instead of always throwing exceptions, but this might be too invasive and I want to hear from other developers such as Claes, who authored the old resolveOrNull silent-dropping patch.
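The cost described above, an error object that is built with a full stack trace and then thrown away, can be illustrated at the Java level. This is a rough analogy only, not the HotSpot code path: StacklessNoSuchMethodError is a hypothetical subclass written for this sketch, and the loop measures plain Java construction rather than the VM's own exception creation.

```java
// Illustrative only: compares constructing an error with and without a filled stack trace.
class StacklessErrorSketch {
    // Hypothetical subclass that skips the stack walk by not filling in the trace.
    static final class StacklessNoSuchMethodError extends NoSuchMethodError {
        StacklessNoSuchMethodError(String msg) { super(msg); }

        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // no stack walk, so getStackTrace() stays empty
        }
    }

    // Number of frames recorded in the throwable.
    static int depth(Throwable t) {
        return t.getStackTrace().length;
    }

    // Time spent constructing 'count' errors that are immediately discarded.
    static long nanosToCreate(boolean stackless, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            Throwable t = stackless
                    ? new StacklessNoSuchMethodError("foo")
                    : new NoSuchMethodError("foo");
            if (t.getMessage() == null) {
                throw new AssertionError("message lost");
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println(nanosToCreate(false, 10_000)); // with stack traces
        System.out.println(nanosToCreate(true, 10_000));  // without stack traces
    }
}
```

The two printed values give a feel for how much of the construction cost comes from the stack walk; the ratio depends on call depth and JDK version, so no particular figure is claimed here.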

Looking forward to a solution,
Chen Liang

[1]: https://github.com/openjdk/jdk/blob/a04c6c1ac663a1eab7d45913940cb6ac0af2c11c/src/hotspot/share/interpreter/linkResolver.cpp#L773
[2]: https://github.com/openjdk/jdk/blob/a04c6c1ac663a1eab7d45913940cb6ac0af2c11c/src/hotspot/share/prims/methodHandles.cpp#L794-L796
[3]: https://cr.openjdk.org/~liach/mess/invokerbytecodegen-cache-miss.png

From claes.redestad at oracle.com Mon Sep 11 09:27:52 2023
From: claes.redestad at oracle.com (Claes Redestad)
Date: Mon, 11 Sep 2023 09:27:52 +0000
Subject: initialization times for invokedynamic
In-Reply-To:
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org>
Message-ID: <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com>

Hi,

It's been something I've wanted to get rid of, sure. An alternative that wouldn't require changes to the runtime code would be to store a table of LFs that have actually been generated and skip the speculative VM call. This can be done in a few different ways; it would add a little overhead on hits but remove the exception overhead (which clutters JFR recordings) on misses.

/Claes
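
The table idea can be pictured with a small model. Everything in this sketch is an assumption made for illustration: the class PregeneratedFormTable, the vmCalls counter and the name "invokeStatic_I_I" are hypothetical, and resolveInVm merely stands in for the speculative VM resolution, whose real implementation is not shown in this thread.

```java
import java.util.Set;

// Toy model: consult a table of known pre-generated forms before asking the "VM".
class PregeneratedFormTable {
    private final Set<String> generated; // names recorded when the forms were generated
    int vmCalls;                         // counts simulated speculative VM lookups

    PregeneratedFormTable(Set<String> generated) {
        this.generated = Set.copyOf(generated);
    }

    // Stand-in for the speculative resolution. In the scenario discussed here,
    // a miss at this point is where a discarded NoSuchMethodError would be paid for.
    private String resolveInVm(String name) {
        vmCalls++;
        return generated.contains(name) ? "holder::" + name : null;
    }

    // Hit: one extra set lookup, then the normal resolution.
    // Miss: return early and never reach the costly path.
    String lookup(String name) {
        if (!generated.contains(name)) {
            return null;
        }
        return resolveInVm(name);
    }

    public static void main(String[] args) {
        PregeneratedFormTable table = new PregeneratedFormTable(Set.of("invokeStatic_I_I"));
        System.out.println(table.lookup("invokeStatic_I_I"));
        System.out.println(table.lookup("notGenerated"));
        System.out.println(table.vmCalls);
    }
}
```

The trade-off matches the description: hits pay for one additional lookup, while misses skip the simulated VM call entirely.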

From liangchenblue at gmail.com Mon Sep 11 12:50:36 2023
From: liangchenblue at gmail.com (-)
Date: Mon, 11 Sep 2023 20:50:36 +0800
Subject: initialization times for invokedynamic
In-Reply-To: <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com>
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org> <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com>
Message-ID:

Hi Claes,

After looking at the usages of resolveOrFail in VarHandle, I believe changing the runtime's THROW_MSG_NULL template occurrences in linkResolver would be a better approach, for resolveOrFail is currently the most efficient way of finding particular members; reflection, on the other hand, has to perform a search over the list of all methods to find one that's accessible.

On an unrelated note, VarForm can probably substitute failed resolution MemberName with dummy ones like that for Object.toString so we don't need to query resolveOrNull repeatedly.

/Chen

From claes.redestad at oracle.com Tue Sep 12 12:19:42 2023
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 12 Sep 2023 12:19:42 +0000
Subject: initialization times for invokedynamic
In-Reply-To:
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org> <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com>
Message-ID: <4D3DA589-C3EF-4492-A26A-8053A7D3B2B0@oracle.com>

Hi,

I wasn't suggesting using reflection, but generating and archiving some kind of lookup table when generating the pre-generated LambdaForm Holder classes, i.e., somewhere in GenerateJLIClassesPlugin. Depending on implementation choices this would add a little bit of footprint and an extra lookup step, so a little overhead when the speculation wins in exchange for removing a larger cost on speculation failures - which might be a net win.

But yes, fixing this in the runtime code would be even better. The main issue is that the linkResolver code you point out is shared between resolveOrNull/resolveOrFail and bytecode linking - and we probably shouldn't change the semantics of the latter. If you can come up with a patch idea that solves this issue for resolveOrNull, such as throwing a pre-created NSME in case the caller is resolveOrFail, then I'd be happy to file an RFE and help get it through review.

/Claes
bye Jochen
_______________________________________________
mlvm-dev mailing list
mlvm-dev at openjdk.org
https://mail.openjdk.org/mailman/listinfo/mlvm-dev

From claes.redestad at oracle.com Tue Sep 12 13:08:45 2023
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 12 Sep 2023 13:08:45 +0000
Subject: initialization times for invokedynamic
In-Reply-To:
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org> <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com>
Message-ID: <799C66B7-FF4F-4180-AA57-A1619F583F22@oracle.com>

Hi,

I wasn't suggesting full-blown reflection, but to archive some kind of lookup table when generating the pre-generated LambdaForms. This would add a little footprint and an extra lookup with negligible cost on every lookup. So it'd add a little overhead when the speculation wins and remove a larger cost on speculation failures.

But yes, fixing this in the runtime code would be great. Main issue is that the linkResolver code you point out is shared between resolveOrNull/resolveOrFail and regular bytecode linking - and we probably shouldn't change the semantics of the latter. If you can come up with a patch idea that solves this issue for resolveOrNull I'd be happy to file an RFE and help get it through review over at hotspot-runtime-dev at openjdk.org

11 sep. 2023 kl. 14:50 skrev liangchenblue at gmail.com:

Hi Claes,
After looking at the usages of resolveOrFail in VarHandle, I believe changing the runtime's THROW_MSG_NULL template occurrences in linkResolver would be a better approach, for resolveOrFail is currently the most efficient way for finding particular members; reflection, on the other hand, has to perform a search over the list of all methods to find one that's accessible.

On an unrelated note, VarForm can probably substitute failed resolution MemberName with dummy ones like that for Object.toString so we don't need to query resolveOrNull repeatedly.

/Chen

On Mon, Sep 11, 2023 at 5:28 PM Claes Redestad wrote:

Hi,

It's been something I've wanted to get rid of, sure. An alternative that wouldn't require changes to the runtime code would be to store a table of LFs that have actually been generated and skip the speculative VM call. This can be done in a few different ways, would add a little overhead on hits but remove the exception overhead (which clutters JFR recordings) on misses

/Claes

11 sep. 2023 kl. 11:02 skrev liangchenblue at gmail.com:

Hello Jochen and Claes,
I have done a little debugging and have found the cause, that looking up pre-generated LambdaForm (mentioned by Claes) causes VM to initialize an NoSuchMethodError [1] that's later silently dropped [2], but the NoSuchMethodError constructor is already executed and the stacktrace filled, causing a significant overhead, as shown in this [3] JMC's rendering of a JFR recording.

You can capture this NoSuchMethodError construction with IDE debug, even when running an empty main, on newer JDK versions. I tested with a breakpoint in NoSuchMethodError(String) constructor and it hits twice (for instrumentation agent uses reflection, which now depends on Method Handles after JEP 416 in Java 18).

I think a resolution would be to modify linkResolver so that it can also resolve speculatively instead of always throwing exceptions, but this might be too invasive and I want to hear from other developers such as Claes, who authored the old resolveOrNull silent-dropping patch.

Looking forward to a solution,
Chen Liang

[1]: https://github.com/openjdk/jdk/blob/a04c6c1ac663a1eab7d45913940cb6ac0af2c11c/src/hotspot/share/interpreter/linkResolver.cpp#L773
[2]: https://github.com/openjdk/jdk/blob/a04c6c1ac663a1eab7d45913940cb6ac0af2c11c/src/hotspot/share/prims/methodHandles.cpp#L794-L796
[3]: https://cr.openjdk.org/~liach/mess/invokerbytecodegen-cache-miss.png

From blackdrag at gmx.org Mon Sep 25 13:59:57 2023
From: blackdrag at gmx.org (Jochen Theodorou)
Date: Mon, 25 Sep 2023 15:59:57 +0200
Subject: initialization times for invokedynamic
In-Reply-To: <799C66B7-FF4F-4180-AA57-A1619F583F22@oracle.com>
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org> <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com> <799C66B7-FF4F-4180-AA57-A1619F583F22@oracle.com>
Message-ID: <6581f4ff-58c7-7421-86b0-57b200679fc9@gmx.org>

so my takes from this:

1) future Reflection will use MethodHandles, meaning my variants depending on Reflection will have similar times to MethodHandles very soon.

2) MethodHandle resolve code may improve in the future and my MethodHandle variants would then probably become faster

3) there is actually not so much I can do right now, except maybe having more pre-generated MethodHandles

bye Jochen

On 12.09.23 15:08, Claes Redestad wrote:
> Hi,
>
> I wasn't suggesting full-blown reflection, but to archive some kind of
> lookup table when generating the pre-generated LambdaForms. This would
> add a little footprint and an extra lookup with negligible cost on every
> lookup.
> So it'd add a little overhead when the speculation wins and
> remove a larger cost on speculation failures.
>
> But yes, fixing this in the runtime code would be great. Main issue is
> that the linkResolver code you point out is shared between
> resolveOrNull/resolveOrFail and regular bytecode linking - and we
> probably shouldn't change the semantics of the latter. If you can come
> up with a patch idea that solves this issue for resolveOrNull I'd be
> happy to file an RFE and help get it through review over at
> hotspot-runtime-dev at openjdk.org

From benjamin.john.evans at gmail.com Mon Sep 25 16:21:47 2023
From: benjamin.john.evans at gmail.com (Ben Evans)
Date: Mon, 25 Sep 2023 16:21:47 +0000
Subject: initialization times for invokedynamic
In-Reply-To: <6581f4ff-58c7-7421-86b0-57b200679fc9@gmx.org>
References: <98E8348C-D350-4A5F-A961-B5DCB9607441@oracle.com> <662d08ec-15d1-1d22-4ca4-3a6179970abe@gmx.org> <656b4618-3c83-008f-1c72-3d69e8d59a05@gmx.org> <7C23F43C-7FE8-4D0C-9485-721E52AB6F00@oracle.com> <799C66B7-FF4F-4180-AA57-A1619F583F22@oracle.com> <6581f4ff-58c7-7421-86b0-57b200679fc9@gmx.org>
Message-ID:

On Mon, Sep 25, 2023 at 2:00 PM Jochen Theodorou wrote:
>
> so my takes from this:
>
> 1) future Reflection will use MethodHandles, meaning my variants
> depending on Reflection will have similar times to MethodHandles very soon.

Java 18 and onwards already have the MH-based implementation:

https://blogs.oracle.com/javamagazine/post/java-reflection-method-handles

So, it might be interesting to retry your bench on Java 21 and see what that looks like.

Cheers,

Ben
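
A retry along the lines Ben suggests could start from something like the following. It is only a sketch under assumptions, not Jochen's benchmark: the class name `FirstCallTiming`, the `timeCalls` helper and the `cached` command-line switch are hypothetical, and single `System.nanoTime()` readings are noisy, so each variant should be run in a fresh VM several times. Anonymous classes are used instead of lambdas, as in the original post, so that lambda bootstrapping does not add to the first measured call.

```java
import java.lang.reflect.Method;

// Hypothetical harness: times the first few calls of the two reflective variants.
final class FirstCallTiming {
    public static int foo(int x) { return 0; }

    // reflective: look the method up on every call.
    static Runnable reflective() {
        return new Runnable() {
            @Override
            public void run() {
                try {
                    Method m = FirstCallTiming.class.getMethod("foo", int.class);
                    m.invoke(null, 1);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        };
    }

    // reflectiveCached: look the method up once and keep it in a field.
    static Runnable reflectiveCached() {
        return new Runnable() {
            Method m = null;

            @Override
            public void run() {
                try {
                    if (m == null) {
                        m = FirstCallTiming.class.getMethod("foo", int.class);
                    }
                    m.invoke(null, 1);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        };
    }

    // Returns the elapsed nanoseconds of each of the first n calls, in call order.
    static long[] timeCalls(Runnable r, int n) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            r.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        boolean cached = args.length > 0 && args[0].equals("cached");
        long[] nanos = timeCalls(cached ? reflectiveCached() : reflective(), 12);
        // Print only after measuring: string concatenation is itself compiled to
        // invokedynamic on recent JDKs and would warm up the same infrastructure.
        System.out.println("JDK feature version: " + Runtime.version().feature());
        for (long t : nanos) {
            System.out.println(t);
        }
    }
}
```

Running it once as `java FirstCallTiming` and once as `java FirstCallTiming cached`, on a JDK before 18 and on 21, would give numbers that can be compared with the ones posted at the start of the thread.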