RFR: 8255397: x86: coalesce reference and int entry points into vtos bytecodes
Aleksey Shipilev
shade at openjdk.java.net
Wed Oct 28 08:20:27 UTC 2020
On Tue, 27 Oct 2020 19:46:29 GMT, Claes Redestad <redestad at openjdk.org> wrote:
>> It rubs me the wrong way that we are effectively changing `push_ptr` to `push_i` for `aep`. While it is implemented in the same manner in `interp_masm_x86.cpp` -- delegating to `push`, it still means if `push_i` implementation changes, `aep` would do the `push_i` _as if_ it is integer, not pointer. Ditto a change in `push_ptr` (adding verification, maybe?) would miss this code.
>>
>> So, how much of the improvement we are talking about to sacrifice this?
>
>> It rubs me the wrong way that we are effectively changing `push_ptr` to `push_i` for `aep`. While it is implemented in the same manner in `interp_masm_x86.cpp` -- delegating to `push`, it still means if `push_i` implementation changes, `aep` would do the `push_i` _as if_ it is integer, not pointer. Ditto a change in `push_ptr` (adding verification, maybe?) would miss this code.
>
> Verification is done explicitly with `__ verify_oop(..)` and friends, so it seems unlikely we'll overload `push_ptr` any time soon (and `push_ptr` and `push_i` have been semantically identical for many years, even before the merging of 32- and 64-bit `interp_masm_x86...`). I acknowledge this adds some fragility, but perhaps there are assertions we can add to check that `push_ptr` and `push_i` stay semantically the same?
>
>>
>> So, how much of the improvement we are talking about to sacrifice this?
>
> A few hundred thousand instructions and branches on Hello World (seems unconditional jumps are logged as branches by `perf`?):
>
> Baseline:
> 103,795,433 instructions # 0.59 insn per cycle ( +- 0.07% )
> 20,263,519 branches # 200.867 M/sec ( +- 0.08% )
> 731,187 branch-misses # 3.61% of all branches ( +- 0.15% )
>
> 0.067306367 seconds time elapsed ( +- 0.24% )
>
> Patch:
> 103,466,523 instructions # 0.59 insn per cycle ( +- 0.07% )
> 20,068,162 branches # 201.935 M/sec ( +- 0.08% )
> 727,575 branch-misses # 3.63% of all branches ( +- 0.13% )
>
> 0.066568115 seconds time elapsed ( +- 0.27% )
>
> For Hello World maybe half of that comes from reduced overhead of generating, the rest from quickening quite a few bytecode transitions. There's a scaling component (seen a few million instruction gains on slightly larger apps), but it's nothing huge.
Okay, so that is 0.3% fewer instructions and ~1% fewer branches on Hello World. That's interesting.
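For reference, the percentages follow directly from the quoted `perf` counters. A minimal sketch of the arithmetic; the function and constant names are made up for illustration, and only the four counter values come from the output above:

```cpp
#include <cstdint>

// Counter values copied from the quoted perf output.
constexpr std::int64_t kBaselineInstructions = 103795433;
constexpr std::int64_t kPatchedInstructions  = 103466523;
constexpr std::int64_t kBaselineBranches     = 20263519;
constexpr std::int64_t kPatchedBranches      = 20068162;

// Relative reduction from baseline to patched, as a percentage.
// (Hypothetical helper, not part of any tool.)
double reduction_percent(std::int64_t baseline, std::int64_t patched) {
  return 100.0 * static_cast<double>(baseline - patched) /
         static_cast<double>(baseline);
}

// Roughly 0.3% for instructions.
double instruction_reduction() {
  return reduction_percent(kBaselineInstructions, kPatchedInstructions);
}

// Roughly 1% for branches.
double branch_reduction() {
  return reduction_percent(kBaselineBranches, kPatchedBranches);
}
```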
Would rebalancing the order of the entry points give a similar improvement without messing up the code? For example, what happens if we move `aep` to be the last entry point, and set up `[bcsi]ep` for a short jump?
There is a middle ground, I think: introduce `push_i_or_ptr` and delegate it to `push`. That would make it clear which usages expect the `push_i` and `push_ptr` shapes to match, and if that later proves to be a problem, we could easily revert all new usages to the old form.
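To make the `push_i_or_ptr` idea concrete, here is a rough, self-contained sketch. It is not the HotSpot code: `ToyInterpMasm`, its one-register `Register` enum, the string-based "code buffer", and the two checking functions are all made up for illustration. The only parts taken from the discussion are that `push_i` and `push_ptr` both delegate to `push`, and that the new helper would do the same.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the interpreter macro assembler. It records
// emitted "instructions" as strings instead of generating machine code.
class ToyInterpMasm {
 public:
  enum Register { rax };  // toy register set with a single register

  // Common implementation that the typed pushes delegate to.
  void push(Register r) { _code.push_back("push " + name(r)); }

  void push_i(Register r = rax)   { push(r); }
  void push_ptr(Register r = rax) { push(r); }

  // Proposed helper: used only where the caller relies on the int and
  // pointer pushes having the same shape. If the two ever diverge, these
  // call sites are easy to find and revert.
  void push_i_or_ptr(Register r = rax) { push(r); }

  const std::vector<std::string>& code() const { return _code; }

 private:
  static std::string name(Register) { return "rax"; }
  std::vector<std::string> _code;
};

// Returns true if all three pushes emit the same instruction sequence.
bool pushes_have_same_shape() {
  ToyInterpMasm a, b, c;
  a.push_i();
  b.push_ptr();
  c.push_i_or_ptr();
  return a.code() == b.code() && b.code() == c.code();
}

// Debug-style check along the lines of the assertion suggested in the
// quoted reply (hypothetical name).
void check_push_shapes() { assert(pushes_have_same_shape()); }
```

The point of the extra name is documentation: a reader who sees `push_i_or_ptr` at a coalesced entry point knows the code depends on both pushes staying identical.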
-------------
PR: https://git.openjdk.java.net/jdk/pull/865