RFR: 8365926: RISC-V: Performance regression in renaissance (chi-square) [v2]
Hamlin Li
mli at openjdk.org
Thu Sep 11 09:03:58 UTC 2025
On Wed, 3 Sep 2025 09:47:07 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>>> But the AbstractICache::invalidate_range is not documented to guarantee to have this effect.
>>
>> what "not documented" here mean? By reading the code, seems `AbstractICache::invalidate_range` will delegate to `icache_flush` in riscv which will do the fence and flush.
>>
>> BTW, here are some comments from hotspot/share/runtime/icache.hpp,
>>
>> // Default implementation is in icache.cpp, and can be hidden per-platform.
>> // Most platforms must provide only ICacheStubGenerator::generate_icache_flush().
>>
>>
>>> If someone executes the new instruction when changed to jalr(3), we did want them to call the new location we stored to the stub(1). By saying 1 happens before 3, we convey our intent.
>>> Aarch64 also have this.
>>
>> Make sense!
>> In worst condition, what will happen if we remove the 2 release here and just count on `fence rw, rw` in `AbstractICache::invalidate_range`? Seems we're fine based on your latter comment.
>> I suppose these extra 2 releases bring some performance penalty? If this is true, I'm not sure if it's worth to treat such a rare condition in such a proper way.
>
>> > But the AbstractICache::invalidate_range is not documented to guarantee to have this effect.
>>
>> what "not documented" here mean? By reading the code, seems `AbstractICache::invalidate_range` will delegate to `icache_flush` in riscv which will do the fence and flush.
>>
>> BTW, here are some comments from hotspot/share/runtime/icache.hpp,
>>
>> ```
>> // Default implementation is in icache.cpp, and can be hidden per-platform.
>> // Most platforms must provide only ICacheStubGenerator::generate_icache_flush().
>> ```
>
> Yes, and it doesn't say this method also provide a release fence or anything like that.
> I other general code we seem to needed, I can remove release(4) for a comment if you like.
>
>>
>> > If someone executes the new instruction when changed to jalr(3), we did want them to call the new location we stored to the stub(1). By saying 1 happens before 3, we convey our intent.
>> > Aarch64 also have this.
>>
>> Make sense! In worst condition, what will happen if we remove the 2 release here and just count on `fence rw, rw` in `AbstractICache::invalidate_range`? Seems we're fine based on your latter comment. I suppose these extra 2 releases bring some performance penalty? If this is true, I'm not sure if it's worth to treat such a rare condition in such a proper way.
>
> Yes, we should be fine, but there is no reason to not store them in 'wish' order.
> No there is no perfomance differences, this code is not executed often and the call to invalidate_range is so slow that anything else don't matter. You are talking about removing a few cycles from something that take tens of thousands of cycles.
I think we'd better to remove the code which is not necessary, in the sense of performance and readability.
If needed, we can add some comments here instead.
Otherwise the change looks good to me. Thanks!
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26944#discussion_r2339594340
More information about the hotspot-compiler-dev
mailing list