RFR: 8342902: Deduplication of acquire calls in BindingSpecializer causes escape-analyisis failure [v2]
Maurizio Cimadamore
mcimadamore at openjdk.org
Fri Oct 25 16:47:42 UTC 2024
> This PR fixes an issue where passing many by-reference parameters to downcall results in escape analysis failures.
> The problem is that, as the parameters grow, the generated code in the trampoline stub we generate also grows.
> When it reaches a certain threshold, it becomes too big, and it is no longer inlined in the caller.
> When this happens, allocations for by-reference parameters (e.g. a segment constructed from `MemorySegment::ofAddress`) can no longer be eliminated.
>
> The solution is two-fold. First, we annotate the generated trampoline with `@ForceInline`. After all, it is rather critical, to guarantee optimal performance, that this code can be always inlined.
> Second, to keep the size of the generated code under control, we also put a limit on the max number of comparisons that are generated in order to "deduplicate" scope acquire/release calls.
> The deduplication logic is a bit finicky -- it was put in place because, when confined and shared are passed by-ref, we need to prevent them from being closed in the middle of a native call.
> So, we save all the seen scopes in a bunch of locals, and then we compare each new scope with _all_ the previous cached locals, and skip acquire if we can.
>
> While this strategy work it does not scale when there are many by-ref parameters - as a by-ref parameter N will need N - 1 comparisons - which means a quadratic number of comparisons is generated.
> This is fixed in this PR by putting a lid on the maximum number of comparisons that are generated. We also make the comparisons a bit smarter, by always skipping the very first by-ref argument -- the downcall address.
> It is in fact very common for the downcall address to be in a different scope than that of the other by-ref arguments anyway.
>
> A nice property of the new logic is that by configuring the max number of comparisons we can effectively select between different strategies:
> * max = 0, means no dedup
> * max = 1, means one-element cache
> * max = N, means full dedup (like before)
>
> Thanks to Ioannis (@spasi) for the report and the benchmark. I've beefed the benchmark up by adding a case for 10 arguments, and also adding support for critical downcalls, so we could also test passing by-ref heap segments. Benchmark result will be provided in a separate comment.
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
Fix issue with native benchmark lib
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/21706/files
- new: https://git.openjdk.org/jdk/pull/21706/files/e0371b27..72b22396
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=21706&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=21706&range=00-01
Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/21706.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21706/head:pull/21706
PR: https://git.openjdk.org/jdk/pull/21706
More information about the core-libs-dev
mailing list