RFR: 8319700: [AArch64] C2 compilation fails with "Field too big for insn"
Axel Boldt-Christmas
aboldtch at openjdk.org
Thu Nov 23 12:44:18 UTC 2023
On Wed, 22 Nov 2023 10:44:12 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:
> Not all ZGC C2 BarrierStubs used on aarch64 participate in the laying out of trampoline stubs. (The trampolines are used to enable as many `tbX` instructions as possible.) This leads to incorrect calculations which may cause the target offset for the `tbX` branch to become too large.
>
> This fix changes all the BarrierStubs to stubs which participate in the trampoline logic.
>
> Until more platforms require specialised barrier stub layouts, it is not worth adding better support for this pattern. Without a redesign it is harder to ensure that the pattern is used correctly; for now, the shared code asserts when building for aarch64 that the general shared stubs are not used directly. Care would still have to be taken if any new barrier stubs are introduced.
>
> The behaviour was more easily reproducible with large inlining heuristics. This flag combination was used to get somewhat reliable reproducibility: `-esa -ea -XX:MaxInlineLevel=300 -XX:MaxInlineSize=1100 -XX:MaxTrivialSize=1000 -XX:LiveNodeCountInliningCutoff=1000000 -XX:MaxNodeLimit=3000000 -XX:NodeLimitFudgeFactor=600000 -XX:+UnlockExperimentalVMOptions -XX:+UseVectorStubs`
>
> There was also an observation in the JBS comments that there were no `tbX` instructions branching to the emitted trampolines. However, I was unable to reproduce this; I ran all tests with the following guarantee in place and could not observe it either.
>
>
> diff --git a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp
> index ebaf1829972..b6c40163a6b 100644
> --- a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp
> @@ -36,6 +36,7 @@
> #include "runtime/icache.hpp"
> #include "runtime/jniHandles.hpp"
> #include "runtime/sharedRuntime.hpp"
> +#include "utilities/debug.hpp"
> #include "utilities/macros.hpp"
> #ifdef COMPILER1
> #include "c1/c1_LIRAssembler.hpp"
> @@ -1358,6 +1359,7 @@ void ZLoadBarrierStubC2Aarch64::emit_code(MacroAssembler& masm) {
> // Current assumption is that the barrier stubs are the first stubs emitted after the actual code
> assert(stubs_start_offset() <= output->buffer_sizing_data()->_code, "stubs are assumed to be emitted directly after code and code_size is a hard limit on where it can start");
>
> + guarantee(!_test_and_branch_reachable_entry.is_unused(), "Should be used");
> __ bind(_test_and_branch_reachable_entry);
>
> // Next branch's offset is unknown, but is > branch_offset
>
>
> - T...
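For context on the range constraint behind the trampolines described above, here is a minimal sketch (my illustration, not the JDK code): `tbz`/`tbnz` encode a signed 14-bit word offset, so the target must be a 4-byte-aligned address within roughly +/-32 KiB of the branch, while an unconditional `b` encodes a signed 26-bit word offset (+/-128 MiB). A nearby trampoline containing just a `b` to the far stub therefore makes the stub reachable from a `tbX`.

```c++
#include <cstdint>

// 32 KiB: tbz/tbnz take an imm14 word offset, i.e. +/-8192 words * 4 bytes.
constexpr int64_t kTbzRange = 1 << 15;

// True if a tbz/tbnz at branch_pc can branch directly to target_pc.
bool tbz_can_reach(int64_t branch_pc, int64_t target_pc) {
  const int64_t offset = target_pc - branch_pc;
  return (offset & 3) == 0 && -kTbzRange <= offset && offset < kTbzRange;
}

// When the stub entry itself is out of range, the tbX instead targets a
// trampoline emitted close to the code, and the trampoline's unconditional
// b branches on to the stub.
```

If a stub does not participate in the trampoline layout, the offsets computed for the stubs that do participate are off by the missing stub's size, which is how the `tbX` offset field could end up too big for the instruction.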
It would be nice if the branch shortening for these stubs could be done as a fixpoint iteration, or could take part when laying out the blocks, as maybe even more branches could be shortened if the BarrierStubs were not all placed at the very end.
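A rough sketch of the fixpoint idea (hypothetical, not the current C2 branch shortening): start with every branch in its short form, then repeatedly grow any branch whose target has drifted out of range and recompute all offsets until a pass makes no changes. Because a branch only ever grows, the iteration terminates.

```c++
#include <cstdint>
#include <vector>

struct Insn {
  int     size_short;     // bytes when the short form suffices
  int     size_long;      // bytes for the long form (e.g. via a trampoline)
  int     target;         // index of the target insn, or -1 if not a branch
  bool    is_long = false;
  int64_t pc = 0;         // computed byte offset within the buffer
};

// Iterate to a fixpoint: recompute offsets, grow out-of-range branches,
// repeat until stable. short_range would be 32 KiB for tbz/tbnz.
void layout(std::vector<Insn>& code, int64_t short_range) {
  bool changed = true;
  while (changed) {
    int64_t pc = 0;
    for (Insn& in : code) {       // recompute pcs from current decisions
      in.pc = pc;
      pc += in.is_long ? in.size_long : in.size_short;
    }
    changed = false;
    for (Insn& in : code) {       // grow branches that no longer reach
      if (in.target < 0 || in.is_long) continue;
      const int64_t dist = code[in.target].pc - in.pc;
      if (dist >= short_range || dist < -short_range) {
        in.is_long = true;
        changed = true;
      }
    }
  }
}
```

Because widening one branch can push another branch's target out of range, a single pass is not enough; the loop re-checks until no branch changes size.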
Thanks for the reviews.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16780#issuecomment-1824365630