RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6]
Fei Yang
fyang at openjdk.org
Mon Aug 5 03:00:42 UTC 2024
On Fri, 26 Jul 2024 08:10:01 GMT, Hamlin Li <mli at openjdk.org> wrote:
>> Hi,
>> Can you help to review the patch?
>>
>> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it.
>>
>> Thanks.
>>
>> ## Test
>> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
>>
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is combination of m2+m1, it have best performance in all source size.
>>
>> ### K230
>>
>> this implementation (m2+m1)
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>>
>> </google-sheets-html-origin>
>>
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 4...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
>
> - merge master
> - Merge branch 'master' into baes64-encode-integrated
> - move label
> - refine code
> - use pure scalar version when rvv is not supported
> - clean code
> - Initial commit
Some minor comments.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5201:
> 5199:
> 5200: Register codec = c_rarg6;
> 5201: Register length = c_rarg7; // total length of src data in bytes
Seems better to keep these aligned:
Register src = c_rarg0;
Register soff = c_rarg1;
Register send = c_rarg2;
Register dst = c_rarg3;
Register doff = c_rarg4;
Register isURL = c_rarg5;
Register codec = c_rarg6;
Register length = c_rarg7; // total length of src data in bytes
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5204:
> 5202:
> 5203: Label ProcessData, Exit, Greater;
> 5204:
Better to build a frame for proper stackwalking of RuntimeStub frame. Like:
__ enter(); // Required for proper stackwalking of RuntimeStub frame
// Your code here.
__ leave(); // Required for proper stackwalking of RuntimeStub frame
__ ret();
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5206:
> 5204:
> 5205: RegSet saved_regs;
> 5206: __ push_reg(saved_regs, sp);
But `saved_regs` is empty?
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5211:
> 5209: __ sub(length, send, soff);
> 5210: __ mv(t0, 3);
> 5211: __ div(length, length, t0);
Could this `div` be avoided? Maybe we could simply check and update `length` at the byte granunarity.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6224:
> 6222: if (UseBASE64Intrinsics) {
> 6223: StubRoutines::_base64_encodeBlock = generate_base64_encodeBlock();
> 6224: }
Nit: put a new line after this if block.
-------------
PR Review: https://git.openjdk.org/jdk/pull/19973#pullrequestreview-2217446699
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703473913
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703084532
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703451782
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703483528
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703082172
More information about the hotspot-dev
mailing list