RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v2]

Fri Jul 5 13:48:25 UTC 2024

On Thu, 4 Jul 2024 17:49:56 GMT, Camel Coder <duke at openjdk.org> wrote:

> For decode, I'm not really happy with any implementation. Yours uses multiple `vluxei8` + `vlsege4` + `vssege3`, the others from base64simd use LMUL=8 `vrgather.vv`, which will take `LMUL^2=8^2=64` times the number of cycles a LMUL=1 `vrgather.vv` takes (on sane implementations, [see my reasoning](https://gitlab.com/riseproject/riscv-optimization-guide/-/issues/1#note_1977583125)). As I said, I'm fairly certain LMUL=1 `vrgather.vv` will have to be relatively fast, so if I had to choose, I'd prefer [my implementation](https://godbolt.org/z/hrs61x9aP) that uses LMUL=1 `vrgather.vv`s + `vlsege4` + `vssege3`, but using `vsseg*` is not ideal. (Note that gcc currently chokes on the register allocation, so you should use clang for now.)
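
For readers following the instruction choices above, the following is a rough scalar Java model of the decode data flow being compared. It is not the JDK intrinsic and not either RVV implementation linked above. It has three steps. A deinterleave stands in for the 4-field segment load (`vlsege4`). A per-byte table lookup stands in for what `vrgather.vv` or `vluxei8` do in the vector versions. A 3-field interleave stands in for the segment store (`vssege3`). The class name `Base64DecodeModel`, the 256-entry reverse table, and the simplifications (standard alphabet, input length a multiple of 4, `=` padding treated as invalid) are assumptions made for illustration. Note that a full reverse table does not fit in a single vector register (e.g. with VLEN=128 one LMUL=1 register holds 16 bytes, so a 128-byte table needs LMUL=8). That is why the choice between one LMUL=8 `vrgather.vv`, indexed loads, or several LMUL=1 gathers matters for performance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Base64DecodeModel {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical 256-entry reverse table: ASCII byte -> 6-bit value, or -1 if invalid.
    private static final byte[] LUT = new byte[256];
    static {
        Arrays.fill(LUT, (byte) -1);
        for (int i = 0; i < ALPHABET.length(); i++) {
            LUT[ALPHABET.charAt(i)] = (byte) i;
        }
    }

    /** Decodes src (length a multiple of 4, no '=' padding); returns null on invalid input. */
    public static byte[] decode(byte[] src) {
        int groups = src.length / 4;

        // Models the 4-field segment load: split the input into 4 streams,
        // one per position inside each 4-character quad.
        byte[][] lanes = new byte[4][groups];
        for (int g = 0; g < groups; g++) {
            for (int k = 0; k < 4; k++) {
                lanes[k][g] = src[4 * g + k];
            }
        }

        // Models the gather: a per-element table lookup on every stream,
        // accumulating an "any invalid" flag (a vector version would reduce a mask).
        boolean invalid = false;
        for (int k = 0; k < 4; k++) {
            for (int g = 0; g < groups; g++) {
                byte v = LUT[lanes[k][g] & 0xFF];
                invalid |= v < 0;
                lanes[k][g] = v;
            }
        }
        if (invalid) {
            return null;
        }

        // Combine 4 sextets into 3 bytes lane-wise with shifts and ORs, then
        // model the 3-field segment store by interleaving the results into memory.
        byte[] out = new byte[3 * groups];
        for (int g = 0; g < groups; g++) {
            int a = lanes[0][g], b = lanes[1][g], c = lanes[2][g], d = lanes[3][g];
            out[3 * g]     = (byte) ((a << 2) | (b >> 4));
            out[3 * g + 1] = (byte) ((b << 4) | (c >> 2));
            out[3 * g + 2] = (byte) ((c << 6) | d);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] decoded = decode("TWFueSBo".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints "Many h"
    }
}
```

The model keeps the three stages separate to mirror the vector pipeline. In the scalar world, a single loop over each quad would obviously be simpler. Its output can be cross-checked against `java.util.Base64.getDecoder()` for padding-free input.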

I imported [your implementation](https://godbolt.org/z/hrs61x9aP) into the JDK, but compared to my current decode implementation, it causes a significant performance regression.
Let's continue the discussion of decode in https://github.com/openjdk/jdk/pull/20026.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19973#issuecomment-2210907011