RFR: Store cpu features in AOTCodeCache header [v2]
Radim Vansa
rvansa at openjdk.org
Tue Jul 15 07:40:05 UTC 2025
On Mon, 14 Jul 2025 18:48:26 GMT, Ashutosh Mehra <asmehra at openjdk.org> wrote:
>> I can't find the logs from when I was investigating the issue, but AFAIR https://github.com/openjdk/crac/pull/103 was motivated by a bug that happened in compiler thread; it was going through some code that calculated buffer size for output code based on the availability of CPU features, and then it went to actually write down the instructions. When the checkpoint happened in the middle of this and the CPU got changed (we got a 'better' CPU) the decision in this codepath was changed, and resulted in a buffer overrun.
>> So it was rather a synchronization problem: some code was written assuming that the CPU features are runtime-constant, but these are not. There is certainly space for a better solution, but we would have to track through some complex code and make sure that it works on a 'snapshot' of features.
>
> @rvansa
>
>> it was going through some code that calculated buffer size for output code based on the availability of CPU features, and then it went to actually write down the instructions. When the checkpoint happened in the middle of this and the CPU got changed (we got a 'better' CPU) the decision in this codepath was changed, and resulted in a buffer overrun.
>
> I can imagine this happening in the context of checkpoint-snapshot but I don't think AOTCodeCache can hit this issue of buffer overrun. Code generation is not suspended-resumed in Leyden workflow. When the code generation starts, it is always completed before the JVM exits the assembly phase.
>
>> So it was rather a synchronization problem: some code was written assuming that the CPU features are runtime-constant, but these are not. There is certainly space for a better solution, but we would have to track through some complex code and make sure that it works on a 'snapshot' of features.
>
> Other than this buffer overrun problem, have you come across any other code that relies on the assumption that CPU features are runtime-constant?
@ashu-mehra I don't recall any, but I don't have much personal experience with switching the CPU between runs.
One thing that could cause trouble is hyperthreading support. I have one of those hybrid Intel CPUs with performance and efficiency cores. The HT flag is set on the performance cores, so if you record the features on a performance core and then try to restore on an efficiency (Atom) core, a straightforward bitwise comparison would fail, even though HT is not something the generated code usually relies on.
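To illustrate the HT concern, here is a rough sketch in Python (not HotSpot code). The feature bit values, the `CODEGEN_IRRELEVANT` mask and both function names are made up for illustration, and the "current CPU must provide every recorded feature that matters" rule is my assumption about what a more lenient check might look like, not what the PR does:

```python
# Hypothetical feature bits; real HotSpot feature masks are laid out differently.
SSE4_2 = 1 << 0
AVX2 = 1 << 1
AVX512F = 1 << 2
HT = 1 << 3

# Assumption: features that generated code does not depend on.
CODEGEN_IRRELEVANT = HT


def strict_match(recorded: int, current: int) -> bool:
    """Straightforward bitwise comparison of the two feature sets."""
    return recorded == current


def codegen_compatible(recorded: int, current: int) -> bool:
    """Mask out irrelevant bits, then require that the current CPU
    offers every remaining feature that was recorded."""
    required = recorded & ~CODEGEN_IRRELEVANT
    return (required & current) == required


# Features recorded on a performance core, checked on an efficiency core.
perf_core = SSE4_2 | AVX2 | HT
eff_core = SSE4_2 | AVX2

print(strict_match(perf_core, eff_core))        # rejected only because of HT
print(codegen_compatible(perf_core, eff_core))  # accepted once HT is masked out
```

With a mask like this the HT difference alone no longer rejects the cached code, while a feature that codegen does rely on (e.g. `AVX512F` in this sketch) being absent on the current CPU still would.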
-------------
PR Review Comment: https://git.openjdk.org/leyden/pull/84#discussion_r2206675561
More information about the leyden-dev mailing list