Save/load StubRoutines

Tue Jul 9 14:55:42 UTC 2024

Hi all,

We have been working for some time on storing and loading generated code
(stubs and blobs) in HotSpot as part of the "premain" project.
The code is now in a good enough shape to be shared with a wider audience
to get some feedback and comments.

Here are the links [0] [1] to the changes for storing and loading the stubs
that are declared in StubRoutines.
This patch covers both aarch64 and x86-64 architectures.
The code changes are done on top of AndrewD's patch for storing the blobs
[2].

---

A brief description of the approach for storing StubRoutines stubs is
warranted as it slightly differs from the technique used for storing other
runtime blobs.

In mainline, StubRoutines are divided into 4 categories depending on
when they are generated and their purpose: initial, continuation, compiler
and final.
In comparison to a runtime blob which is stored in its own buffer, a
StubRoutine stub belongs to a category and all the stubs in a category are
stored in the same buffer.
As a result, the code buffer that stores a StubRoutines category has
multiple entry points, and these entry points are established as the stubs
are generated at runtime.
Moreover, the generation of some stubs is dependent on the availability of
certain cpu features.

So when the buffer for a StubRoutine category is stored in the code cache,
some extra information needs to be stored to be able to identify all the
entry points and to associate them with the correct stubs when loading the
buffer.

To implement this, each stub is given a unique static id irrespective of
whether its code is generated or not (refer to macro STUB_ROUTINES_STUBS_DO
in runtime/stubRoutines.hpp).
When the stubs are generated the entry point(s) of the stubs are stored in
an array (see StubArchiveData::_address_array in
runtime/stubCodeGenerator.hpp).
As the stubs are generated, the entry points are appended to the array.
Most of the stubs have only one entry point.
In addition to the entry point(s), the end address of the stub is also
recorded in the array.
The end address is used to create StubCodeDesc when the stubs are loaded
from the code cache.

To identify the addresses that belong to a stub, we store a tuple of 2
elements for each stub: first element is the index of the first entry point
of the stub in the _address_array
and second element is the number of addresses stored by this stub in the
_address_array (see StubAddrIndexInfo in runtime/stubCodeGenerator.hpp).
For stubs that are not generated, -1 is used for both the elements. These
tuples are stored in an array StubArchiveData::_index_table indexed by the
unique stub id.

It is easier to visualize this using a simple example:
Assume there are 3 stubs. Stub1 has 2 entry points (S1-1 and S1-2) and an
end address (E1). Stub2 is not generated. Stub3 has one entry point (S3-1)
and end address (E3).
For this case the _address_array and _index_table in the StubArchiveData
would have following entries:

_address_array:
index:    |  0   |  1   | 2  |  3   | 4  |
contents: | S1-1 | S1-2 | E1 | S3-1 | E3 |

_index_table:
index:    |  0   |   1    |  2   |
contents: | 0, 3 | -1, -1 | 3, 2 |
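
The bookkeeping in this example can be sketched in a few lines of C++. This
is only an illustration of the scheme described above, not the code in the
patch: the member names, the record/entry/end helpers, the use of
std::vector and the fake_code buffer are all assumptions made for the
sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the patch's types; names and layout are
// assumptions for illustration, not the actual HotSpot declarations.
using address = const uint8_t*;

struct StubAddrIndexInfo {
  int start = -1;  // index of the stub's first entry point in address_array
  int count = -1;  // number of addresses recorded (entry points + end address)
};

struct StubArchiveData {
  std::vector<address> address_array;
  std::vector<StubAddrIndexInfo> index_table;  // indexed by stub id

  explicit StubArchiveData(int stub_count) : index_table(stub_count) {}

  // Record a generated stub: its entry points followed by its end address.
  void record(int stub_id, const std::vector<address>& entries, address end) {
    assert(!entries.empty());
    StubAddrIndexInfo& info = index_table[stub_id];
    info.start = static_cast<int>(address_array.size());
    address_array.insert(address_array.end(), entries.begin(), entries.end());
    address_array.push_back(end);
    info.count = static_cast<int>(entries.size()) + 1;
  }

  // Stubs that were never recorded keep the default (-1, -1) tuple.
  bool is_generated(int stub_id) const {
    return index_table[stub_id].start >= 0;
  }

  // The n-th entry point of a generated stub.
  address entry(int stub_id, int n) const {
    const StubAddrIndexInfo& info = index_table[stub_id];
    assert(is_generated(stub_id) && n >= 0 && n < info.count - 1);
    return address_array[info.start + n];
  }

  // The end address is the last address recorded for the stub.
  address end(int stub_id) const {
    const StubAddrIndexInfo& info = index_table[stub_id];
    assert(is_generated(stub_id));
    return address_array[info.start + info.count - 1];
  }
};

// Placeholder for a code buffer; only the addresses matter here.
static const uint8_t fake_code[5] = {};

// Builds the three-stub example: Stub1 (id 0) with two entry points,
// Stub2 (id 1) not generated, Stub3 (id 2) with one entry point.
StubArchiveData build_example() {
  StubArchiveData data(3);
  data.record(0, {fake_code + 0, fake_code + 1}, fake_code + 2);
  data.record(2, {fake_code + 3}, fake_code + 4);
  return data;
}
```

With this layout, looking up a stub on load is an index into the table
followed by a slice of the address array, and a stub that was not generated
is recognizable from its (-1, -1) tuple.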

When all the stubs of a category are generated, the _address_array and
_index_table are stored in the code cache (in
SCCache::store_stubroutines_blob).
During load time the code along with the _address_array and _index_table is
read back from the code cache (in SCCReader::compile_stubroutines_blob).
The stubs' entry points are then set up, using the _index_table, in the
respective routines that generate each stub (such as generate_call_stub).

As the stubs are generated, their entry points are also registered with the
SCAddressTable in SCCache.
To preserve the order of the SCAddressTable during load, the elements of
the _address_array are registered when the buffer is loaded (in
SCCReader::compile_stubroutines_blob).

In addition, StubArchiveData also stores the names of the stubs, which are
used to verify that the code located in the code cache is indeed for the
stub being loaded.
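
A minimal sketch of such a name check is shown here. It assumes the names
are kept in a table indexed by stub id; ArchivedStubNames and
verify_stub_name are hypothetical names for this sketch and the patch may
organize this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: names archived alongside the code, indexed by stub id.
struct ArchivedStubNames {
  std::vector<std::string> names;
};

// Returns true only if the archive has an entry for stub_id and its name
// matches the name of the stub the caller is about to load.
bool verify_stub_name(const ArchivedStubNames& archived, int stub_id,
                      const char* expected) {
  assert(expected != nullptr);
  if (stub_id < 0 ||
      static_cast<std::size_t>(stub_id) >= archived.names.size()) {
    return false;
  }
  return archived.names[stub_id] == expected;
}
```

In this sketch a failed check would be a signal to fall back to generating
the stub rather than using the archived code.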

---

Apart from implementing the above approach, this patch also moves the JFR
stubs and throw_exception stubs to SharedRuntime and stores/loads them as
RuntimeStubs, as they seem to be a better fit there.
There are also some minor improvements to logging to trace stub
generation and store/load times.

Lastly, storing and loading of the generated code (stubs and blobs) can be
controlled using these two options: StoreStubs and LoadStubs.

Please review the code and provide your feedback. Let us know if any
clarification is required.

There is still scope for improvement. Some of the things that can be done
are:
- In the current scheme, if a stub is not generated during the training run
but gets generated during the production run, the order of SCAddressTable
can vary, resulting in a crash or unexpected behavior.
- Enum used to enumerate stubs can be improved. AndrewD has the idea of a
global enum that identifies all the stubs/blobs and that can address this
aspect of the patch and the limitation in the previous point as well.
- Exploring other ways to handle external constant data such as tables used
by trigonometric stubs which need to be registered in SCAddressTable very
early (see StubRoutines::stubs_SCAddressTable_init).
- Using InternalAddress (or something similar) relocation type for targets
that are in the same blob, so that they don't need to be fixed on load.

[0] change sets: https://github.com/adinn/leyden/compare/d808ea2..7738ed6
[1] branch: https://github.com/adinn/leyden/commits/premain-stub-routines/
[2] https://github.com/adinn/leyden/commits/premain-save-generated/

Thanks,
- Ashutosh Mehra