Save/load StubRoutines

Andrew Dinn adinn at redhat.com
Mon Jul 15 15:34:29 UTC 2024


On 11/07/2024 18:05, Vladimir Kozlov wrote:
> One issue I found recently is that we need to always generate cached
>  code file even if there are no cached nmethods. I use saved stubs
> code for that. May be we should select one stub or blob which is not
> under LoadStubs/StoreStubs. May be temporary until we start caching
> adapter.

I think the original x86 code that saved and restored generated routines
was saving multiplyToLen, squareToLen and mulAdd, each in their own
blob. There are now equivalent save/restore points for aarch64. We could
always save one of them irrespective of the setting of StoreStubs.

> LoadStubs should also depends on presence of cached code archive.
> 
>> Apart from implementing the above approach, this patch also has
>> changes to move JFR stubs and throw_exception stubs to
>> SharedRuntime and stores/loads them as a RuntimeStub, as they seem
>> to be a better fit there.
> 
> This should go into mainline (first RFE). I see that it takes
> majority of changed lines in changeset.

I'll link a JIRA for this as a subtask when I raise an over-arching
cleanup JIRA for mainline. Likewise for the next two comments.

> There are changes to use InternalAddress instead of ExternalAddress.
>  This should go into mainline too (second RFE).

As above.

> Moved SHA and BASE64 tables in stubGenerator_aarch64.cpp into
> mainline (third RFE).

As above.

> Can you explain next change in arraycopy stubs on aarch64?:
>   __ b(RuntimeAddress(byte_copy_entry));
>   __ b(byte_copy_entry);

The branch here is internal to the blob containing the arraycopy entries
and sub-entries. So, we don't actually need to use a relocation here. A
branch to a raw address within the blob can and will always be encoded 
using a PC-relative jump.
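
In generator terms the change amounts to the following (a paraphrase,
not the exact stubGenerator_aarch64.cpp code):

  // byte_copy_entry is captured while an earlier routine is generated
  // into the same blob, so it always lies in the current code buffer.
  address byte_copy_entry = __ pc();
  // ... generate the byte copy routine ...

  // later, in another routine emitted into the same buffer:
  // old form: routed through a RuntimeAddress, so a relocation is recorded
  //   __ b(RuntimeAddress(byte_copy_entry));
  // new form: plain branch to the raw address, encoded as a PC-relative jump
  __ b(byte_copy_entry);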

I wanted to do this for all internal branches. It allows us to
accumulate entries for all routines in a given blob and publish them to
the address table at the end of generation. Likewise, we can accumulate
entry addresses while loading routines from a blob and publish them at
the end. The basic advantage is that it simplifies the publication
process, since publication happens all at once rather than routine by
routine.
However, it also makes error handling easier. If we encounter a
read/store error or configuration mismatch when saving or loading a
specific runtime blob we can regenerate the routines without referencing
the cache and publish the addresses for use by later blobs without
having to back entries out of the address table.
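
A minimal sketch of that batching idea (all of the names below are
mine, not code from the patch):

  #include <utility>
  #include <vector>

  using address = unsigned char*;

  // Collects (stub id, entry address) pairs while a blob is generated or
  // loaded, then publishes them to the address table in one step.
  class PendingStubEntries {
    std::vector<std::pair<int, address>> _entries;
  public:
    void record(int stub_id, address entry) {
      _entries.push_back({stub_id, entry});
    }
    template <typename AddressTable>
    void publish(AddressTable& table) {     // success: publish all at once
      for (const auto& e : _entries) table.set_entry(e.first, e.second);
      _entries.clear();
    }
    void discard() { _entries.clear(); }    // failure: nothing to back out
  };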

Unfortunately, we are still using a RuntimeAddress to branch within
the same code buffer on both x86 and aarch64. This only happens for one 
published entry on AArch64 (zero_blocks). This entry gets called from
copy routines within its own blob and also from the compilers. In both
cases the call must be preceded by some extra logic, so both the
generator and the compilers rely on the macro assembler to plant the
preamble and the call. If we tweak the macro assembler to export a
variant of that API method that plants the call with a direct address
rather than a RuntimeAddress, then this can be fixed.
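
For illustration, such a variant might look roughly like this (the
helper name and signatures here are my guesses, not the existing
MacroAssembler API):

  // Existing shape: plant the preamble, then call zero_blocks through a
  // RuntimeAddress, which records a relocation even for a local target.
  void call_zero_blocks(MacroAssembler* masm, const RuntimeAddress& target) {
    // ... preamble: set up the registers zero_blocks expects ...
    masm->trampoline_call(target);
  }

  // Proposed sibling: same preamble, but the target is a raw address in
  // the buffer being generated, so a plain PC-relative call suffices.
  void call_zero_blocks(MacroAssembler* masm, address target) {
    // ... identical preamble ...
    masm->bl(target);
  }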

On x86 I think Ashu faced the problem that there was no way to plant a
call to a raw address. There is the option of using an InternalAddress,
but that only works for loads and stores of data addresses.

I think the above problem was also what led Ashu to use RuntimeAddress 
(rather than ExternalAddress) to identify the no_overlap target for the 
array copy stubs. That works, but only because we publish entries for 
each routine to the address table immediately after generating the 
routine. I agree that we probably need to fix this with something like
a StubAddress class as a way of identifying a call target within the
current stub that does not require a relocation. It should be easy to
verify that the target is actually in the current code buffer and
reachable using a local jump. Perhaps we need to add a similar Address
subtype to AArch64? As you say, this can be introduced as a mainline
cleanup.
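
Something along the following lines, purely as a sketch (StubAddress is
not an existing class and the shape here is my guess):

  using address = unsigned char*;

  // Marks a branch/call target that lives inside the stub blob currently
  // being generated, so no relocation needs to be recorded for it.
  class StubAddress {
    address _target;
  public:
    explicit StubAddress(address target) : _target(target) {}
    address target() const { return _target; }

    // Check the generator can apply: the target must fall inside the code
    // buffer being filled, and hence be reachable by a local PC-relative
    // jump.
    bool is_local_to(address buf_start, address buf_end) const {
      return buf_start <= _target && _target < buf_end;
    }
  };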

> And why is this?:
> #if 0 // This block of code is moved to StubRoutines::x86::init_SCAddressTable()
>   StubRoutines::x86::generate_CRC32C_table(supports_clmul);
> #endif

As Ashu said, this is to unify the process of initializing stub routine
data so that it can be published at init time. I think we need to add
some explicit initialization process to mainline as part of the stub
generation cleanup, which we can then extend in Leyden to also do
address table publication.
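
As a rough illustration of the kind of hook I mean (the wrapper name is
invented here; generate_CRC32C_table is the call the quoted #if 0 block
disables):

  // Sketch: an explicit init step that builds stub data tables eagerly,
  // whether the stub code itself is generated locally or loaded from the
  // cached code archive. In Leyden it could also publish the resulting
  // addresses to the SCAddressTable.
  void init_stub_data_tables(bool supports_clmul) {
    StubRoutines::x86::generate_CRC32C_table(supports_clmul);
    // ... other table setup (SHA, BASE64 constants, ...) would go here ...
  }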

I'll start raising JIRA issues for these problems against mainline as
soon as possible.

regards,


Andrew Dinn
-----------


