RFR: 8280481: Duplicated stubs to interpreter for static calls [v2]

Vladimir Kozlov kvn at openjdk.org
Thu Jun 30 03:06:46 UTC 2022


On Wed, 29 Jun 2022 14:50:59 GMT, Evgeny Astigeevich <duke at openjdk.org> wrote:

>> ## Problem
>> Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static and final Java methods are statically bound: the callee is known at compile time. Such calls can share stubs to the interpreter.
>> 
>> Each stub to the interpreter has a relocation record (accessed via `relocInfo`) which provides the address of the stub and the address of its owner. `relocInfo` has an offset which is an offset from the previously known relocatable address. The address of a stub is calculated as the address provided by the previous `relocInfo` plus the offset.
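To make the address calculation above concrete, here is a minimal sketch of decoding a chain of offsets. `RelocRecord` and `decode_addresses` are hypothetical names for illustration only; the real `relocInfo` also packs a type and other data, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified record: it keeps only the unsigned distance
// from the previously decoded address (not the real relocInfo layout).
struct RelocRecord {
  uint32_t offset;
};

// Recover absolute addresses by accumulating offsets from the section start.
std::vector<uint64_t> decode_addresses(uint64_t section_start,
                                       const std::vector<RelocRecord>& records) {
  std::vector<uint64_t> addrs;
  addrs.reserve(records.size());
  uint64_t current = section_start;
  for (const RelocRecord& r : records) {
    current += r.offset;       // address = previous address + offset
    addrs.push_back(current);
  }
  return addrs;
}
```

Because each address is a running sum of unsigned offsets, the decoded addresses in this sketch can never decrease.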
>> 
>> Each Java call has:
>> - A relocation for a call site.
>> - A relocation for a stub to the interpreter.
>> - A stub to the interpreter.
>> - If far jumps are used (arm64 case):
>>   - A trampoline relocation.
>>   - A trampoline.
>> 
>> We cannot avoid creating relocations. They are needed to support patching call sites.
>> With shared stubs there will be multiple relocations having the same stub address but different owners' addresses.
>> If we try to generate relocations as we go there will be a case which requires negative offsets:
>> 
>> reloc1  ---> 0x0: stub1
>> reloc2  ---> 0x4: stub2 (reloc2.addr = reloc1.addr + reloc2.offset = 0x0 + 4)
>> reloc3  ---> 0x0: stub1 (reloc3.addr = reloc2.addr + reloc3.offset = 0x4 - 4)
>> 
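The three-relocation case above can be checked with a small sketch that computes the delta each relocation would need if relocations were emitted in creation order. The function names are hypothetical and the model is deliberately simplified: it tracks only the target addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Delta each relocation would need, relative to the previous target,
// when relocations are emitted in the order they are created.
std::vector<int64_t> deltas_in_creation_order(const std::vector<uint64_t>& targets) {
  std::vector<int64_t> deltas;
  uint64_t prev = 0;  // assumed section start
  for (uint64_t t : targets) {
    deltas.push_back(static_cast<int64_t>(t) - static_cast<int64_t>(prev));
    prev = t;
  }
  return deltas;
}

// True if at least one relocation would need a negative offset.
bool needs_negative_offset(const std::vector<uint64_t>& targets) {
  for (int64_t d : deltas_in_creation_order(targets)) {
    if (d < 0) return true;
  }
  return false;
}
```

For the targets `{0x0, 0x4, 0x0}` from the example, the third delta is -4. If the two relocations for `stub1` are emitted back to back, as in `{0x0, 0x0, 0x4}`, every delta is non-negative.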
>> 
>> `CodeSection` does not support negative offsets. It [assumes](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.hpp#L195) that the addresses relocations point at grow upward.
>> Supporting negative offsets would halve the offset range. This could increase the number of filler records, the empty `relocInfo` records inserted to reduce offset values. Also, negative offsets would be needed only for `static_stub_type`; the other 13 types don’t need them.
>> 
>> ## Solution
>> In this PR, stubs are created in two stages. First, we collect requests for creating shared stubs: a callee `ciMethod*` and the offset of a call in `CodeBuffer` (see [src/hotspot/share/asm/codeBuffer.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-deb8ab083311ba60c0016dc34d6518579bbee4683c81e8d348982bac897fe8ae)). Then, in the finalisation phase (see [src/hotspot/share/ci/ciEnv.cpp](https://github.com/openjdk/jdk/pull/8816/files#diff-7c032de54e85754d39e080fd24d49b7469543b163f54229eb0631c6b1bf26450)), `CodeBuffer::finalize_stubs()` creates shared stubs in `CodeBuffer`: a stub and multiple relocations sharing it. The first relocation will have a positive offset; the rest will have zero offsets. This approach does not need negative offsets. As the creation of relocations and stubs is platform dependent, `CodeBuffer::finalize_stubs()` calls `CodeBuffer::pd_finalize_stubs()`, where platforms should put their code.
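The two-stage idea can be sketched roughly as follows. This is not the code from the PR: `CalleeId`, `StubRequests`, `StubReloc`, and the `stubs_start` and `stub_size` parameters are all hypothetical stand-ins, and an integer id takes the place of the callee `ciMethod*`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using CalleeId = int;  // hypothetical stand-in for a callee ciMethod*

struct SharedStubRequest { CalleeId callee; uint32_t call_offset; };

// One relocation per call site: the stub it points at and the offset
// from the previously emitted relocation's address.
struct StubReloc { uint32_t call_offset; uint32_t stub_addr; uint32_t offset; };

class StubRequests {
 public:
  // Stage 1: only record the request; no stub is emitted yet.
  void add(CalleeId callee, uint32_t call_offset) {
    requests_.push_back(SharedStubRequest{callee, call_offset});
  }

  // Stage 2: emit one stub per callee and all relocations sharing it.
  std::vector<StubReloc> finalize(uint32_t stubs_start, uint32_t stub_size) const {
    std::map<CalleeId, std::vector<uint32_t>> by_callee;
    for (const SharedStubRequest& r : requests_) {
      by_callee[r.callee].push_back(r.call_offset);
    }
    std::vector<StubReloc> relocs;
    uint32_t prev = 0;                 // assumed section start
    uint32_t stub_addr = stubs_start;
    for (const auto& entry : by_callee) {
      assert(stub_addr >= prev);       // addresses only grow upward
      for (uint32_t call_offset : entry.second) {
        relocs.push_back(StubReloc{call_offset, stub_addr, stub_addr - prev});
        prev = stub_addr;              // later sharers get offset zero
      }
      stub_addr += stub_size;
    }
    return relocs;
  }

 private:
  std::vector<SharedStubRequest> requests_;
};
```

In this sketch, grouping the requests by callee before emitting anything is what keeps every offset non-negative: all relocations for one stub are emitted together, so only the first one in each group moves forward.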
>> 
>> This PR provides implementations for x86, x86_64 and aarch64. [src/hotspot/share/asm/codeBuffer.inline.hpp](https://github.com/openjdk/jdk/pull/8816/files#diff-c268e3719578f2980edaa27c0eacbe9f620124310108eb65d0f765212c7042eb) provides the `emit_shared_stubs_to_interp` template which x86, x86_64 and aarch64 platforms use. Other platforms can use it too. Platforms supporting shared stubs to the interpreter must have `CodeBuffer::supports_shared_stubs()` returning `true`.
>> 
>> ## Results
>> **Results from [Renaissance 0.14.0](https://github.com/renaissance-benchmarks/renaissance/releases/tag/v0.14.0)**
>> Note: 'Nmethods with shared stubs' is the total number of nmethods counted during the benchmark's run. 'Final # of nmethods' is the number of nmethods in the CodeCache when the JVM exited.
>> - AArch64
>> 
>> +------------------+-------------+----------------------------+---------------------+
>> |    Benchmark     | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
>> +------------------+-------------+----------------------------+---------------------+
>> | dotty            |      820544 |                       4592 |               18872 |
>> | dec-tree         |      405280 |                       2580 |               22335 |
>> | naive-bayes      |      392384 |                       2586 |               21184 |
>> | log-regression   |      362208 |                       2450 |               20325 |
>> | als              |      306048 |                       2226 |               18161 |
>> | finagle-chirper  |      262304 |                       2087 |               12675 |
>> | movie-lens       |      250112 |                       1937 |               13617 |
>> | gauss-mix        |      173792 |                       1262 |               10304 |
>> | finagle-http     |      164320 |                       1392 |               11269 |
>> | page-rank        |      155424 |                       1175 |               10330 |
>> | chi-square       |      140384 |                       1028 |                9480 |
>> | akka-uct         |      115136 |                        541 |                3941 |
>> | reactors         |       43264 |                        335 |                2503 |
>> | scala-stm-bench7 |       42656 |                        326 |                3310 |
>> | philosophers     |       36576 |                        256 |                2902 |
>> | scala-doku       |       35008 |                        231 |                2695 |
>> | rx-scrabble      |       32416 |                        273 |                2789 |
>> | future-genetic   |       29408 |                        260 |                2339 |
>> | scrabble         |       27968 |                        225 |                2477 |
>> | par-mnemonics    |       19584 |                        168 |                1689 |
>> | fj-kmeans        |       19296 |                        156 |                1647 |
>> | scala-kmeans     |       18080 |                        140 |                1629 |
>> | mnemonics        |       17408 |                        143 |                1512 |
>> +------------------+-------------+----------------------------+---------------------+
>> 
>> - X86_64
>> 
>> +------------------+-------------+----------------------------+---------------------+
>> |    Benchmark     | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
>> +------------------+-------------+----------------------------+---------------------+
>> | dotty            |      337065 |                       4403 |               19135 |
>> | dec-tree         |      183045 |                       2559 |               22071 |
>> | naive-bayes      |      176460 |                       2450 |               19782 |
>> | log-regression   |      162555 |                       2410 |               20648 |
>> | als              |      121275 |                       1980 |               17179 |
>> | movie-lens       |      111915 |                       1842 |               13020 |
>> | finagle-chirper  |      106350 |                       1947 |               12726 |
>> | gauss-mix        |       81975 |                       1251 |               10474 |
>> | finagle-http     |       80895 |                       1523 |               12294 |
>> | page-rank        |       68940 |                       1146 |               10124 |
>> | chi-square       |       62130 |                        974 |                9315 |
>> | akka-uct         |       50220 |                        555 |                4263 |
>> | reactors         |       23385 |                        371 |                2544 |
>> | philosophers     |       17625 |                        259 |                2865 |
>> | scala-stm-bench7 |       17235 |                        295 |                3230 |
>> | scala-doku       |       15600 |                        214 |                2698 |
>> | rx-scrabble      |       14190 |                        262 |                2770 |
>> | future-genetic   |       13155 |                        253 |                2318 |
>> | scrabble         |       12300 |                        217 |                2352 |
>> | fj-kmeans        |        8985 |                        157 |                1616 |
>> | par-mnemonics    |        8535 |                        155 |                1684 |
>> | scala-kmeans     |        8250 |                        138 |                1624 |
>> | mnemonics        |        7485 |                        134 |                1522 |
>> +------------------+-------------+----------------------------+---------------------+
>> 
>> 
>> **Testing: fastdebug and release builds for x86, x86_64 and aarch64**
>> - `tier1`...`tier4`: Passed
>> - `hotspot/jtreg/compiler/sharedstubs`: Passed
>
> Evgeny Astigeevich has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
> 
>  - Merge branch 'master' into JDK-8280481C
>  - Use call offset instead of caller pc
>  - Simplify test
>  - Fix x86 build failures
>  - Remove UseSharedStubs and clarify shared stub use cases
>  - Make SharedStubToInterpRequest ResourceObj and set initial size of SharedStubToInterpRequests to 8
>  - Update copyright year and add Unimplemented guards
>  - Set UseSharedStubs to true for X86
>  - Set UseSharedStubs to true for AArch64
>  - Fix x86 build failure
>  - ... and 10 more: https://git.openjdk.org/jdk/compare/e09644d9...da3bfb5b

Testing passed.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.org/jdk/pull/8816


More information about the hotspot-compiler-dev mailing list