RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v6]
Coleen Phillimore
coleenp at openjdk.org
Mon Apr 29 21:16:07 UTC 2024
On Sun, 28 Apr 2024 07:26:48 GMT, Dean Long <dlong at openjdk.org> wrote:
>> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1787:
>>
>>> 1785: lea(cache, Address(cache, index));
>>> 1786: // Prevents stale data from being read after the bytecode is patched to the fast bytecode
>>> 1787: membar(MacroAssembler::LoadLoad);
>>
>> if we put LoadLoad in here, maybe it is redundant for TemplateTable::patch_bytecode(..)
>>
>> https://github.com/openjdk/jdk/blob/a08af97fda06e785d1c3a5a17e562e65150bbb07/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L192
>>
>> because there has a Load-acquire in L199.
>> can we change
>> `load_field_entry(Register cache, Register index, int bcp_offset = 1) `
>> to
>> `load_field_entry(Register cache, Register index, int bcp_offset = 1, bool needLoadLoad = 1) `?
>>
>> then change L192 like this
>> <pre>
>> @@ -189,7 +189,7 @@ void TemplateTable::patch_bytecode(Bytecodes::Code bc, Register bc_reg,
>> // additional, required work.
>> assert(byte_no == f1_byte || byte_no == f2_byte, "byte_no out of range");
>> assert(load_bc_into_bc_reg, "we use bc_reg as temp");
>> - __ load_field_entry(temp_reg, bc_reg);
>> + __ load_field_entry(temp_reg, bc_reg, 1 /*bcp_offset*/, false /*needLoadLoad*/);
>> if (byte_no == f1_byte) {
>> __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::get_code_offset())));
>> } else {
>> __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::put_code_offset())));
>> }
>> // Load-acquire the bytecode to match store-release in ResolvedFieldEntry::fill_in()
>> __ ldarb(temp_reg, temp_reg);
>> </pre>
>
> I believe patch_bytecode only needs a LoadStore between reading the cache bytecode and patching at bcp[0], so I would agree the LoadLoad in load_field_entry is not needed here. But the reason is not because we have already load-acquire. The reason is because there are not two loads that need ordering. There is only a load and a store. We can't replace the load-acquire with a LoadLoad, for example.
The LoadLoad in load_field_entry is needed for the second thread that (t2-1) loads the patched bytecode, and then (t2-2) loads the pointer to the cpCache entry for field, then (t2-3) loads the values from the cpCache entry pointer. The LoadLoad is between t2-2 and t2-3.
The ldarb in patch_bytecode does make the LoadLoad in load_field_entry redundant as pointed out by @sunny868 but seems harmless in terms of performance and correctness, ie. not needing a special parameter.
The reason that it's in load_field_entry is that there are other callers who do not load acquire the get_code and put_code fields. So it's safer and less error prone to forget with it in load_field_entry.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1583783084
More information about the hotspot-compiler-dev
mailing list