RFR: 8312569: RISC-V: Missing intrinsics for Math.ceil, floor, rint

Wed Aug 2 09:56:42 UTC 2023

On Wed, 26 Jul 2023 09:56:03 GMT, Ilya Gavrilin <duke at openjdk.org> wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4283:
>> 
>>> 4281:   // generating constant (tmp2)
>>> 4282:   // tmp2 = 100...0000
>>> 4283:   addi(mask, zr, 1);
>> 
>> There are other ways to [implement these functions with less instructions](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1AB9U8lJL6yAngGVG6AMKpaAVxYM9DgDJ4GmADl3ACNMYhAAVgBmUgAHVAVCWwZnNw89eMSbAV9/IJZQ8OiLTCtshiECJmICVPdPLhKy5MrqglzAkLDImIUqmrr0xr62jvzCnoBKC1RXYmR2DgBSACYov2Q3LABqJajHPvxBADoEPewljQBBVfWGTdcdvccWJgIEU/PLm%2BuCAE9YpgsFRtolgP50NtkAhqttgqgXHsAELfb7oWbBejfMyYOjbCDo1yYzDbVSTVEAdhR12220JxO2r2AuyiABFtlQmMEFPiycjUTTtngQRAmSzHPiuHJvN5xc9tqZWUjvMYALJXAIAFWMrIAkgBxUyTcmC2lLKnfWlWoWCABskmMBCF/NNVvp9G2/gA7sZVC6fldrbSmAoWNsAG4uN50EkQVYrKjIcMEY60Y6Q1YRDSkXYrCKNbbEVyxePbEC5lZ7VnEUsQPCTMsVqi1snkqLUwNBkNhyMGGweuMrBNJlPoVO5rM5zMFosloeN%2BNV5vziDe30N8vxmsr%2BttjuWoMAegAVFDcbQIABaDQNqtV7bXnOGSF%2BbbARhhMTbL2YMAcWi0NsYhekwfw8jC4Yku8JIKKwmACgAnKCeDgkBPKzMQNqxK4Ka0seh4HtaqgsuyaCxGBKEMPia6qDmfLtoRuwUqyAqdsQmAEHMVF%2Bgx1zmixfHXO68HXKYVC0AimEEhiHp8nxFqCsJjJMMy96ctyvJ7qxtLCviYryhAUoynK%2BwKjqypqhq2p6oaxjGox5odkGN
 oEPajrOrxnbWkpNH%2BoxwahhGUb9rG8aJsmqbphO2YTjO6AMKWm5DlW24rPi9YLkOy5pRArb%2Bs53ZBX2Mb4mFI7puOmYxdOObEPFiUVkutY0RuFapel9mef52wnhyEkkFeN4kfej5AQwL5Ue%2B/jEF%2BP5/gBQG0CBYHbBBUEIDBcGIchqEhtsGFYThxx4QRrq0sR95kRRqGrpgPq0aSWmuvx2mFhxXGkn5zGsWiMkiVcpjELM434kpck3ApnZKfpbIclyPK5U9AY6SKMMSoZ0qys88qKhZ6pajqBpGiaXlMU5QZ%2BK5DpOng%2BVBj5d2%2Bn5Z1oT2wUlYOw4RWmGZ5tVeYziwLANYubLtXWrVhS2nXk9ahW9tGA5lRFY6AVVU4C7VQsi8lbLZdRjNkplKzi7uzOk7SvVA6442DbebIjTFz42m%2BH4zYBc3/oBwGgeBTCQds0GgltglIWCVF7QdfjYbh2z4d1F2w1d4cG/ddFI9aL2CWx73ENxX0CVcHDTLQnARLwngcFopCoJwEoKBhCy5lEPCkAQmjF9MADWkTZqXHCSLwLASBo2aV9XtccLwCggNm7dV8XpBwLASBoCwsQxmQFAQGvG/0OExDhval7IJshjAKQWDhngCwAGp4HdADygKV63NC0AQYQzxAwQd6QwR%2BGqH8TgrcAHMGIH8R%2BwRtCYGsCA3ga82CCEfgwWgwCF6X0wMEVwwBHBiFoDPbgvAsCvCMOIDB%2BB2LWDwJBQh1dMCqFgThRYrdKalD/rQPAwQZoQOcFgP%2BBBiB4GHkQ6Y4kVIKHvk/F%2B8CZCCBEGIdgUg5HyCUGoP%2BuhGgGCMCAUwxhzCcO5PAaYqBYjlEIZeR%2BUReCoEgsQIRWAZ6QGmJYWB5R7DjUGJ4KI2YfB%2BE6AUboXAVhxASEkAQXiMhhPKGMLoRQmhuJaP0WoLh6ggB8Qk6hAhWg1FiYE8IwSLDJMicMZJe
 SJgRBcY3JRJcy4Vz/pPbYAAlXUQhHCXlvoWI%2BkhgDIChNo5kEBBE2y7g2CAuBCAkGblwSYvB55aGNKQDaTAsDhAgN3Xu%2BhOCD1IOPGxnBp6zzbh3RZ/cVj1IwZPOZJzph2MSHYSQQA). Also, given you're not checking for NaN (and neither is it done for other architectures), I assume that this is done before this is called?
>
> Hi, thanks for your review. 
> 
> - About NaN, INF and other special values:
> 
> According to RISC-V ISA paragraph 8.7 for fcvt.l instruction (Table 8.4) we have some special return values: 
> 1) if we exceed minimum input value, or  got -INF it returns (-2^63)
> 2) if we exceed maximum input value, or got +INF or NaN it returns (2^63-1)
> 
> (Also, if we exceed maximum/minimum input values on input we have already integer value, 
> because according to IEEE754 double-precision f.p. format all doubles more +/- 2^52 are already integer values)
> 
> So we need to check if we got -2^63 or 2^63-1 after double->long int conversion,
> comment lines from [src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:4288](https://github.com/openjdk/jdk/pull/14991/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR4288-R4291) describes how do we change result (converted_dbl -> converted_dbl_masked) and constant (mask) to check this values. I have tried to check for NaN and +-inf with just one conditional branch.
> If we got -2^63 or 2^63-1 we return input value, therefore NaN -> NaN; +/- INF -> +/- INF; double that is already integer stays same as required by the ceil/floor/rint function descriptions.
> 
> - About case when we can use less instructions:
> 
> Of course, we can use a bit less instructions, but the main goal during intrinsic writing was minimizing count of expensive instructions (so instead of flt.d was used integer instructions on converted_dbl etc.)
> We already have some cases when less expensive instructions were chosen instead of reducing their number. For example: https://bugs.openjdk.org/browse/JDK-8297359

Generally speaking, I'm a bit concerned we are over-optimizing for HiFive Unmatched (or equivalent "small" boards). I understand this is what we have today, and that there is no great way to project performance otherwise (maybe [llvm-mca](https://www.llvm.org/docs/CommandGuide/llvm-mca.html) but I haven't played with it), I just wish there were already broadly available high-performance RISC-V board (soon...)

Happy to go with that approach for now, and we can revisit the specific instruction sequence when we have more powerful boards to benchmark with.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14991#discussion_r1281677022