[External] : Re: Save/load StubRoutines

Mon Jul 15 21:35:15 UTC 2024

On 7/15/24 8:34 AM, Andrew Dinn wrote:
> On 11/07/2024 18:05, Vladimir Kozlov wrote:
>> One issue I found recently is that we need to always generate cached
>>  code file even if there are no cached nmethods. I use saved stubs
>> code for that. May be we should select one stub or blob which is not
>> under LoadStubs/StoreStubs. May be temporary until we start caching
>> adapter.
> 
> I think the original x86 code that saved and restored generated routines
> was saving multiplyToLen, squareToLen and mulAdd, each in their own
> blob. There are now equivalent save/restore points for aarch64. We could
> always save one of them irrespective of the setting of StoreStubs.

Yes, that is what I already did:

https://github.com/openjdk/leyden/commit/d47557adf843bea792ca45b45502ba8748e0c6ad

But they are used for C2 only. We need something general.
How about `generate_forward_exception()`? It is first general stub generated on both platforms.

> 
>> LoadStubs should also depends on presence of cached code archive.
>>
>>> Apart from implementing the above approach, this patch also has
>> changes to move JFR stubs and throw_exception stubs to
>>> SharedRuntime and stores/loads them as a RuntimeStub, as they seem
>>> to
>> be a better fit there.
>>
>> This should go into mainline (first RFE). I see that it takes
>> majority of changed lines in changeset.
> 
> I'll link A JIRA for this as a subtask when I raise an over-arching
> cleanup JIRA for mainline. Likewise for the next two comments

Thank you.

> 
>> There are changes to use InternalAddress instead of ExternalAddress.
>>  This should go into mainline too (second RFE).
> 
> As above.
> 
>> Moved SHA and BASE64 tables in stubGenerator_aarch64.cpp into
>> mainline (third RFE).
> 
> As above.
> 
>> Can you explain next change in arraycopy stubs on aarch64?: __
>> b(RuntimeAddress(byte_copy_entry)); __ b(byte_copy_entry);
> 
> The branch here is internal to the blob containing the arraycopy entries
> and sub-entries. So, we don't actually need to use a relocation here. A
> branch to a raw address within the blob can and will always be encoded using a PC-relative jump.
> 
> I wanted to do this for all internal branches. It allows us accumulate
> entries for all routines in a given blob and publish them to the address
> table at the end of generation. Likewise, we can accumulate entry
> addresses while loading routines from a blob and publish at the end. The
> basic advantage is that it simplifies the publication process since
> publications happens all at once rather than routine by routine.
> However, it also makes error handling easier. If we encounter a
> read/store error or configuration mismatch when saving or loading a
> specific runtime blob we can regenerate the routines without referencing
> the cache and publish the addresses for use by later blobs without
> having to back entries out of the address table.

This sounds good.

> 
> Unfortunately, we are still using a RuntimeAddress to branch within
> the same code buffer on both x86 and aarch64. This only happens for one published entry on AArch64 (zero_blocks). This 
> entry gets called from
> copy routines within its own blob and also from the compilers. In both
> cases the call must be preceded by some extra logic so both the
> generator and compiler rely on the macro assembler to plant the preamble
> and call. If we tweak the macro assembler to export a variant of the
> API method that allows the call to be planted with a direct address
> rather than a RuntimeAddress then this can be fixed.

Can we have some kind of wrapper for case when we need RuntimeAddress?
May be have it in separate blob to not affect your implementation.

> 
> On x86 I think Ashu faced the problem that there was that there was no way to plant a call to a raw address. There is 
> the option of using an InternalAddress but that only works for loads and stores of data addresses.
> 
> I think the above problem was also what led Ashu to use RuntimeAddress (rather than ExternalAddress) to identify the 
> no_overlap target for the array copy stubs. That works, but only because we publish entries for each routine to the 
> address table immediately after generating the routine. I agree that we probably need t fix this with something like a 
> StubAddress class as a way of identifying a call target within the current stub that does not require a relocation. It 
> should be easy to verify that the target is actually in the current code buffer and reachable using a local jump. 
> Perhaps we need to add a similar Address subtype to AArch64? As you say this can be introduced as a mainline cleanup.

Yes, please do.

> 
>> And why is this?: #if 0 // This block of code is moved to StubRoutines::x86::init_SCAddressTable() 
>> StubRoutines::x86::generate_CRC32C_table(supports_clmul); #endif
> 
> As Ashu said, this is to unify the process of initializing stub routine data so that it can be published at init time. I 
> think we need to add some explicit initialization process to mainline as part of the stub generation cleanup which we 
> can then extend in Leyden to also do address table publication.
> 
> I'll start raising JIRA issues for these problems against mainline  as soon as possible.

Thank you, Andrew

Vladimir K

> 
> regards,
> 
> 
> Andrew Dinn
> -----------
>