RFR: 8279508: Auto-vectorize Math.round API [v15]
Quan Anh Mai
duke at openjdk.java.net
Tue Mar 22 03:17:32 UTC 2022
On Tue, 22 Mar 2022 02:52:07 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>>> A read from the constant table will incur at minimum an L1I access penalty to reach the code blob, and in the worst case more if the data is not present in the first-level cache
>>
>> But your approach comes at the cost of frontend bandwidth and port contention, which imo matter more than latency in this case, since a constant load does not prolong any dependency chain. A load has very good throughput, so it usually performs well unless it depends on its input (the memory location or the registers used for address calculation). Thanks
>
> Thanks for going into detail. A multi-cycle memory load will also defer dispatch of its dependent instructions to the execution ports. Port congestion becomes the bottleneck when multiple ready instructions cannot be issued because of a lack of execution resources or throughput constraints imposed by the instructions, but a single-cycle dependency chain may still beat the latency incurred by pending memory operations.
I think I get it now, thanks a lot for your detailed explanation.
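
For anyone following the thread without the PR open, here is a minimal sketch of the kind of loop this change is meant to auto-vectorize. The class name `RoundLoop` and the helper `roundAll` are made up for illustration and do not come from the PR. Whether C2 actually vectorizes the loop depends on the JDK build and the target CPU.

```java
import java.util.Arrays;

public class RoundLoop {
    // Hypothetical helper: rounds every element of src into dst.
    // A simple counted loop over arrays like this is the usual shape
    // that a JIT auto-vectorizer looks for.
    static void roundAll(float[] src, int[] dst) {
        for (int i = 0; i < src.length; i++) {
            // Math.round(float) rounds to the nearest int, ties toward positive infinity.
            dst[i] = Math.round(src[i]);
        }
    }

    public static void main(String[] args) {
        float[] src = {1.4f, 1.5f, -1.5f, 2.5f, -2.6f};
        int[] dst = new int[src.length];
        roundAll(src, dst);
        System.out.println(Arrays.toString(dst));
    }
}
```

The rounding constants such a loop needs can either be loaded from a constant table or materialized with extra instructions, which is the trade-off discussed above.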
-------------
PR: https://git.openjdk.java.net/jdk/pull/7094
More information about the core-libs-dev mailing list