RFR: 8254693: Add Panama feature to pass heap segments to native code [v5]

Thu Oct 19 21:54:57 UTC 2023

On Thu, 19 Oct 2023 12:58:39 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:

>>> I wonder if the native_invoker_size_per_arg thing still works good enough. We may exceed the computed size, now, right?
>> 
>> Good point. I'll have a look at enhancing the test we have for this. 
>> 
>> Intuitively, I think it will be okay. It's true that we generate more code to add the oops and offsets together, but at the same time, we don't have any code to shuffle the offsets.
>
>> > I wonder if the native_invoker_size_per_arg thing still works good enough. We may exceed the computed size, now, right?
>> 
>> Good point. I'll have a look at enhancing the test we have for this.
>> 
>> Intuitively, I think it will be okay. It's true that we generate more code to add the oops and offsets together, but at the same time, we don't have any code to shuffle the offsets.
> 
> Looks like we use 2 input regs per obj, so we reserve 2x native_invoker_size_per_arg. However, the case in which `reg_oop` and `reg_offset` are on stack together with `arg_shuffle` can produce more than this size (depending on platform). If we pass lots of objects on stack, we may exceed the computed size. Correct?

> @TheRealMDoerr Note that `reg_offset` is filtered out, and not handled by `arg_shuffle`. So, it becomes the question whether shuffling a register, or adding an offset to an oop takes more bytes. I think most of the `native_invoker_size_per_arg` have some lenience built in though? (I did do that for x64 and aarch64 at least). So, I think it will be okay.

Right.
The filtering happens after the size computation `code_size = native_invoker_code_base_size + (num_args * native_invoker_size_per_arg)`, so we reserve 2 * native_invoker_size_per_arg per heap segment which is 2 * 2 instructions on PPC64.
The maximum actual code size is the size of the 1 * add_offset operation + 1 arg shuffle, which is 4 + 2 instructions on PPC64.
So, we reserve space for 4 instructions, but emit 6 ones.
The reason why it still works is that `_needs_transition` must be false and that saves more space than we exceeded by the oversized argument handling code.
Not a very nice design, but I can live with it.

> I've added an additional test case to the existing test for this, which should stress the new code gen. ([dd9e974](https://github.com/openjdk/jdk/commit/dd9e9741de3ca07e6a4cc561002255f98e1e3330))

Thanks for adding it! Would you mind using `limit(84)` which is the maximum? We'd be on the safe side when testing this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1771757690