<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">LoadStubs should also depends on presence of cached code archive.</blockquote><div><br></div><div>I am not sure I get this. I already have this check in CDSConfig::check_vm_args_consistency():</div><div><br></div><div> if (!LoadCachedCode && LoadStubs) {<br> warning("LoadStubs will be ignored without LoadCachedCode");<br> LoadStubs = false;<br> }<br></div><div><br></div><div>and SCCache::load_XXX() routines have this check:</div><div><br></div><div> SCCache* cache = open_for_read();<br> if (cache == nullptr) {<br> return false;<br> }<br></div><div><br></div><div>Wouldn't these checks be sufficient?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Can you explain next change in arraycopy stubs on aarch64?:<br> __ b(RuntimeAddress(byte_copy_entry));<br> __ b(byte_copy_entry);</blockquote><div><br></div><div>Andrew Dinn can explain this change. I am guessing because byte_copy_entry is in the same CodeBuffer, we don't need to emit any relocation?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">And why is this?:<br>#if 0<br> // This block of code is moved to StubRoutines::x86::init_SCAddressTable()<br> StubRoutines::x86::generate_CRC32C_table(supports_clmul);<br>#endif<br></blockquote><div> </div><div>I moved the call that generates StubRoutines::x86::_crc32c_table to init_SCAddressTable so as to club its registration together with all other constants.</div><div>I forgot to remove this block of code. I will clean it up.</div><div><br></div><div>Another thing worth pointing out (which may have got overlooked in the large patch) is this change in StubGenerator::array_overlap_test (in stubGenerator_x86_64_arraycopy.cpp)</div><div><br></div><div> if (NOLp == nullptr) {<br>- ExternalAddress no_overlap(no_overlap_target);<br>+ RuntimeAddress no_overlap(no_overlap_target);<br> __ jump_cc(Assembler::belowEqual, no_overlap);<br> __ cmpptr(to, end_from);<br> __ jump_cc(Assembler::aboveEqual, no_overlap);<br></div><div><br></div><div>It seems like a bug to have ExternalAddress relocation for jump targets. If this change is reverted then debug build fails with following assertion failure:<br></div><div><br></div><div># Internal Error (/home/asmehra/data/ashu-mehra/leyden/src/hotspot/cpu/x86/assembler_x86.cpp:1009), pid=1392250, tid=1392251<br># assert(which == call32_operand) failed: jcc has no disp32 or imm<br></div><div><br></div><div>Same is the case in StubGenerator::generate_cont_thaw():<br></div><div><br></div><div>- __ jump(ExternalAddress(StubRoutines::throw_StackOverflowError_entry()));<br>+ __ jump(RuntimeAddress(StubRoutines::throw_StackOverflowError_entry()));<br></div><div><br></div><div>In this case assertion failure is:</div><div><br></div><div># Internal Error (/home/asmehra/data/ashu-mehra/leyden/src/hotspot/cpu/x86/assembler_x86.cpp:1171), pid=1398423, tid=1398424<br># assert(which == call32_operand) failed: call has no disp32 or imm<br></div><div><br></div><div>Mainline is oblivious to this as the relocations are not emitted for StubRoutines.</div><div><br></div><div>Thanks,</div><div>- Ashutosh Mehra<br></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Jul 11, 2024 at 1:06 PM Vladimir Kozlov <<a href="mailto:vladimir.kozlov@oracle.com" target="_blank">vladimir.kozlov@oracle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Great work, Ashutosh<br>
<br>
One issue I found recently is that we need to always generate cached code file even if there are no cached nmethods.<br>
I use saved stubs code for that. May be we should select one stub or blob which is not under LoadStubs/StoreStubs.<br>
May be temporary until we start caching adapter.<br>
<br>
LoadStubs should also depends on presence of cached code archive.<br>
<br>
> Apart from implementing the above approach, this patch also has changes to move JFR stubs and throw_exception stubs to<br>
> SharedRuntime and stores/loads them as a RuntimeStub, as they seem to be a better fit there.<br>
<br>
This should go into mainline (first RFE). I see that it takes majority of changed lines in changeset.<br>
<br>
There are changes to use InternalAddress instead of ExternalAddress. This should go into mainline too (second RFE).<br>
<br>
Moved SHA and BASE64 tables in stubGenerator_aarch64.cpp into mainline (third RFE).<br>
<br>
Can you explain next change in arraycopy stubs on aarch64?:<br>
__ b(RuntimeAddress(byte_copy_entry));<br>
__ b(byte_copy_entry);<br>
<br>
And why is this?:<br>
#if 0<br>
// This block of code is moved to StubRoutines::x86::init_SCAddressTable()<br>
StubRoutines::x86::generate_CRC32C_table(supports_clmul);<br>
#endif<br>
<br>
<br>
Thanks,<br>
Vladimir K<br>
<br>
<br>
On 7/9/24 7:55 AM, Ashutosh Mehra wrote:<br>
> Hi all,<br>
> <br>
> We have been working for some time on storing and loading generated code (stubs and blobs) in the hotspot as part of the <br>
> "premain" project.<br>
> The code is now in a good enough shape to be shared with a wider audience to get some feedback and comments.<br>
> <br>
> So here is the link [0] [1] to the changes for storing and loading of stubs which are declared in StubRoutines.<br>
> This patch covers both aarch64 and x86-64 architectures.<br>
> The code changes are done on top of AndrewD's patch for storing the blobs [2].<br>
> <br>
> ---<br>
> <br>
> A brief description of the approach for storing StubRoutines stubs is warranted as it slightly differs from the <br>
> technique used for storing other runtime blobs.<br>
> <br>
> In the mainline StubRoutines are divided into 4 categories depending on when they are generated and their purpose - <br>
> initial, continuation, compiler and final<br>
> In comparison to a runtime blob which is stored in its own buffer, a StubRoutine stub belongs to a category and all the <br>
> stubs in a category are stored in the same buffer.<br>
> This makes the code buffer storing StubRoutine to have multiple entry points and these entry points are established as <br>
> the stubs are generated during the runtime.<br>
> Moreover, the generation of some stubs is dependent on the availability of certain cpu features.<br>
> <br>
> So when the buffer for a StubRoutine category is stored in the code cache, some extra information needs to be store to <br>
> be able to identify all the entry points<br>
> and be able to associate them with the correct stubs when loading the buffer.<br>
> <br>
> To implement this each stub is given a unique static id irrespective of whether its code is generated or not (refer to <br>
> macro STUB_ROUTINES_STUBS_DO in runtime/stubRoutines.hpp)<br>
> When the stubs are generated the entry point(s) of the stubs are stored in an array (see StubArchiveData::_address_array <br>
> in runtime/stubCodeGenerator.hpp).<br>
> As the stubs are generated, the entry points are appended to the array. Most of the stubs have only one entry point.<br>
> In addition to the entry point(s), the end address of the stub is also recorded in the array.<br>
> The end address is used to create StubCodeDesc when the stubs are loaded from the code cache.<br>
> <br>
> To identify the addresses that belong to a stub, we store a tuple of 2 elements for each stub: first element is the <br>
> index of the first entry point of the stub in the _address_array<br>
> and second element is the number of addresses stored by this stub in the _address_array (see StubAddrIndexInfo in <br>
> runtime/stubCodeGenerator.hpp).<br>
> For stubs that are not generated, -1 is used for both the elements. These tuples are stored in an array <br>
> StubArchiveData::_index_table indexed by the unique stub id.<br>
> <br>
> It is easier to visualize this using a simple example:<br>
> Assume there are 3 stubs. Stub1 has 2 entry points (S1-1 and S1-2) and an end address (E1). Stub2 is not generated. <br>
> Stub3 has one entry point (S3-1) and end address (E3).<br>
> For this case the _address_array and _index_table in the StubArchiveData would have following entries:<br>
> <br>
> _address_array:<br>
> index: 0 1 2 3 4<br>
> contents: | S1-1 | S1-2 | E1 | S3-1 | E3 |<br>
> <br>
> _index_table:<br>
> index: 0 1 0<br>
> contents: | 0, 3 | -1, -1 | 3, 2 |<br>
> <br>
> When all the stubs of a category are generated, the _address_array and _index_table are stored in the code cache (in <br>
> SCCache::store_stubroutines_blob).<br>
> During load time the code along with the _address_array and _index_table is read back from the code cache (in <br>
> SCCReader::compile_stubroutines_blob).<br>
> The stubs entry points are set up in their respective routines that generates the stub (such as generate_call_stub) <br>
> using the _index_table.<br>
> <br>
> As the stubs are generated, their entry points are also registered with the SCAddressTable in SCCache.<br>
> To preserve the order of the SCAddressTable during load, the elements of the _address_array are registered when the <br>
> buffer is loaded (in SCCReader::compile_stubroutines_blob).<br>
> <br>
> In addition, StubArchiveData also stores the name of the stubs which is used to verify the code located in the code <br>
> cache is indeed for the stub being loaded.<br>
> <br>
> ---<br>
> <br>
> Apart from implementing the above approach, this patch also has changes to move JFR stubs and throw_exception stubs to <br>
> SharedRuntime and stores/loads them as a RuntimeStub,<br>
> as they seem to be a better fit there.<br>
> There are also some minor improvements to logging to trace the stubs generation and store/load times.<br>
> <br>
> Lastly, the generated code (stubs and blobs) can be controlled using these two options - StoreStubs and LoadStubs.<br>
> <br>
> Please review the code and provide your feedback. Let us know if any clarification is required.<br>
> <br>
> There is still scope of improvement. Some of the things that can be done are:<br>
> - In current scheme if a stub is not generated during training run, but gets generated during the production run, the <br>
> order of SCAddressTable can vary resulting in crash/unexpected behavior<br>
> - Enum used to enumerate stubs can be improved. AndrewD has the idea of a global enum that identifies all the <br>
> stubs/blobs and that can address this aspect of the patch and the limitation in the previous point as well.<br>
> - Exploring other ways to handle external constant data such as tables used by trigonometric stubs which need to be <br>
> registered in SCAddressTable very early. See StubRoutines::stubs_SCAddressTable_init).<br>
> - Using InternalAddress (or something similar) relocation type for targets that are in the same blob, so that they don't <br>
> need to be fixed on load.<br>
<br>
> <br>
> <br>
> [0] change sets: <a href="https://github.com/adinn/leyden/compare/d808ea2..7738ed6" rel="noreferrer" target="_blank">https://github.com/adinn/leyden/compare/d808ea2..7738ed6</a> <br>
> <<a href="https://github.com/adinn/leyden/compare/d808ea2..7738ed6" rel="noreferrer" target="_blank">https://github.com/adinn/leyden/compare/d808ea2..7738ed6</a>><br>
> [1] branch: <a href="https://github.com/adinn/leyden/commits/premain-stub-routines/" rel="noreferrer" target="_blank">https://github.com/adinn/leyden/commits/premain-stub-routines/</a> <br>
> <<a href="https://github.com/adinn/leyden/commits/premain-stub-routines/" rel="noreferrer" target="_blank">https://github.com/adinn/leyden/commits/premain-stub-routines/</a>><br>
> [2] <a href="https://github.com/adinn/leyden/commits/premain-save-generated/" rel="noreferrer" target="_blank">https://github.com/adinn/leyden/commits/premain-save-generated/</a> <br>
> <<a href="https://github.com/adinn/leyden/commits/premain-save-generated/" rel="noreferrer" target="_blank">https://github.com/adinn/leyden/commits/premain-save-generated/</a>><br>
> <br>
> Thanks,<br>
> - Ashutosh Mehra<br>
<br>
</blockquote></div>