RFR: 8290965: PPC64: Implement post-call NOPs
Richard Reingruber
rrich at openjdk.org
Wed Dec 20 20:36:54 UTC 2023
#### Implementation of post call nops (PCNs) on ppc64.
Depends on https://github.com/openjdk/jdk/pull/17150
About post call nops:
- instruction(s) at return addresses of compiled java calls
- emitted iff vm continuations are enabled to support virtual threads
- encode data that can be used to find the corresponding CodeBlob and oop map faster
- mt-safe patchable to trigger deoptimization
Background:
- Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
- With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN.
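The lookup optimization described above can be illustrated with a toy model. This is only a sketch, not HotSpot code: `ToyBlob`, `find_blob_start_slow`, and `find_blob_start` are hypothetical names, and using a `cb_offset` of 0 to mean "PCN not patched or not decodable" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CodeBlob: only its address range.
struct ToyBlob {
  uintptr_t start;
  uintptr_t end;  // exclusive
};

// Slow path: scan all blobs for the one that contains pc.
// Returns the blob start, or 0 if pc is not inside any blob.
uintptr_t find_blob_start_slow(const std::vector<ToyBlob>& cache, uintptr_t pc) {
  for (const ToyBlob& b : cache) {
    if (pc >= b.start && pc < b.end) {
      return b.start;
    }
  }
  return 0;
}

// Fast path with fallback. cb_offset is the byte offset decoded from the
// PCN at pc; 0 means the PCN carried no data (assumption of this sketch).
uintptr_t find_blob_start(const std::vector<ToyBlob>& cache, uintptr_t pc,
                          uint32_t cb_offset) {
  if (cb_offset != 0) {
    assert(cb_offset <= pc);
    return pc - cb_offset;  // no search needed
  }
  return find_blob_start_slow(cache, pc);
}
```

With a patched PCN the blob start is a single subtraction; the search only runs when the PCN carries no data.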
Post call nops on ppc64
- 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
x86_64: 1 instruction, 8 bytes
aarch64: 3 instructions, 12 bytes
[1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
https://openpowerfoundation.org/specifications/isa/
- 26 bits data payload
x86_64: 32 bits; aarch64: 32 bits
- 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
x86_64: 8 bits; aarch64: 8 bits
- 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
x86_64: 24 bits; aarch64: 24 bits
- Also used when reconstructing the back chain after thawing continuation frames (see `Thaw<ConfigT>::patch_caller_links`)
- Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs.
The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development.
- Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`.
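The ppc64 payload split listed above (26 bits of data, 9 for the oop map slot, 17 for the cb offset stored in units of 4-byte instructions) can be sketched as follows. The field order inside the payload, the names `pack_pcn`/`unpack_pcn`, and the use of an all-zero payload to mean "not patched" are assumptions for illustration; the actual encoding into the CMPI/CMPLI instruction is in the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Widths from the description; field order is an assumption of this sketch.
constexpr uint32_t kDataBits       = 26;
constexpr uint32_t kOopMapSlotBits = 9;
constexpr uint32_t kCbOffsetBits   = kDataBits - kOopMapSlotBits;  // 17
constexpr uint32_t kOopMapSlotMask = (1u << kOopMapSlotBits) - 1;
constexpr uint32_t kCbOffsetMask   = (1u << kCbOffsetBits) - 1;
constexpr uint32_t kInsnShift      = 2;  // instructions are 4-byte aligned

struct PcnData {
  uint32_t oopmap_slot;
  uint32_t cb_offset;  // byte distance from the CodeBlob to the PCN
};

// Packs both values into a 26-bit payload. Returns false if a value does
// not fit, mirroring the "patch failure" rows in the statistics.
bool pack_pcn(uint32_t oopmap_slot, uint32_t cb_offset, uint32_t& data) {
  if ((cb_offset & ((1u << kInsnShift) - 1)) != 0) return false;  // unaligned
  const uint32_t scaled = cb_offset >> kInsnShift;  // 17 stored bits cover 19
  if (oopmap_slot > kOopMapSlotMask || scaled > kCbOffsetMask) return false;
  const uint32_t packed = (oopmap_slot << kCbOffsetBits) | scaled;
  if (packed == 0) return false;  // assumption: zero means "not patched"
  data = packed;
  return true;
}

// Reverses pack_pcn. Returns false for an unpatched or oversized payload.
bool unpack_pcn(uint32_t data, PcnData& out) {
  if (data == 0 || (data >> kDataBits) != 0) return false;
  out.oopmap_slot = data >> kCbOffsetBits;
  out.cb_offset   = (data & kCbOffsetMask) << kInsnShift;
  return true;
}
```

Storing the offset divided by 4 is what makes 17 stored bits cover a 19-bit byte offset. In this sketch a slot index of 512 or a byte offset of 2^19 or more is rejected, and a caller would then have to fall back to the slower lookup.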
#### Statistics
| SPECjvm2008 compiler.compiler with fixed iterations | ppc64le | x86_64 |
|---------------------------------------------------|---------|---------|
| PCN lookup success | 3715494 | 3410337 |
| PCN lookup failure | 220987 | 235436 |
| PCN decode success | 3660675 | 3320496 |
| PCN decode failure (C1) | 53539 | 46816 |
| PCN patch success | 63848 | 42310 |
| PCN patch cb offset failure | 0 | 0 |
| PCN patch oopmap slot failure | 0 | 298 |
| test/jdk/java/lang/Thread/virtual/stress/Skynet.java | ppc64le | x86_64 |
|------------------------------------------------------|-----------|-----------|
| PCN lookup success | 306955525 | 247185016 |
| PCN lookup failure | 500975 | 421098 |
| PCN decode success (C2) | 306951893 | 247181691 |
| PCN decode failure | 3168 | 59 |
| PCN patch success | 2080 | 2662 |
| PCN patch cb offset failure | 0 | 0 |
| PCN patch oopmap slot failure | 0 | 0 |
Comments
C1: We get decode failures even if patching always succeeded because not all PCNs are patched. Only PCNs in nmethods are actually patched. E.g. C2 runtime stubs like `_new_array_nozero_Java` have PCNs that are not patched.
C2: With Skynet.java there are 100x more PCN lookups. This is because it stresses virtual threads.
C2: With Skynet.java there are more PCN lookups on ppc64le. They originate from `Thaw<ConfigT>::patch_caller_links`.
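To put the lookup counts in proportion, the hit rate and the ratio between the two workloads can be computed directly from the tables. The helper names below are hypothetical; the constants are the ppc64le lookup counts copied from the tables above.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of PCN lookups that succeeded.
double lookup_hit_rate(uint64_t success, uint64_t failure) {
  assert(success + failure > 0);
  return static_cast<double>(success) /
         static_cast<double>(success + failure);
}

// ppc64le lookup counts from the two tables.
constexpr uint64_t kCompilerSuccess = 3715494;
constexpr uint64_t kCompilerFailure = 220987;
constexpr uint64_t kSkynetSuccess   = 306955525;
constexpr uint64_t kSkynetFailure   = 500975;

double compiler_hit_rate() {
  return lookup_hit_rate(kCompilerSuccess, kCompilerFailure);
}

double skynet_hit_rate() {
  return lookup_hit_rate(kSkynetSuccess, kSkynetFailure);
}

// How many times more lookups Skynet.java performs than compiler.compiler.
double skynet_to_compiler_lookup_ratio() {
  return static_cast<double>(kSkynetSuccess + kSkynetFailure) /
         static_cast<double>(kCompilerSuccess + kCompilerFailure);
}
```

The same functions can be applied to the x86_64 column by swapping in its counts.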
### Testing
The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests.
All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX.
-------------
Depends on: https://git.openjdk.org/jdk/pull/17150
Commit messages:
- 8290965: PPC64: Implement post-call NOPs
Changes: https://git.openjdk.org/jdk/pull/17171/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8290965
Stats: 133 lines in 13 files changed: 96 ins; 0 del; 37 mod
Patch: https://git.openjdk.org/jdk/pull/17171.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171
PR: https://git.openjdk.org/jdk/pull/17171