RFR: 8378698: Optimize Base64.Encoder#encodeToString
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes

-------------

Commit messages:
 - 8378698: Optimize Base64.Encoder#encodeToString

Changes: https://git.openjdk.org/jdk/pull/29920/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8378698
  Stats: 5 lines in 1 file changed: 3 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/29920.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29920/head:pull/29920

PR: https://git.openjdk.org/jdk/pull/29920
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Hmm, I checked the code of that deprecated constructor. If the compiler can recognize `count == ascii.length` then it should be able to fold this code; maybe the compiler is too stupid. Have you verified the performance results in a benchmark? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3960288227
On Wed, 25 Feb 2026 15:58:22 GMT, Chen Liang <liach@openjdk.org> wrote:
Hmm, I checked the code of that deprecated constructor. If the compiler can recognize `count == ascii.length` then it should be able to fold this code; maybe the compiler is too stupid. Have you verified the performance results in a benchmark?
I did not run benchmarks, just saw that the constructor copies the array (obviously since it's public), and that we use JLA in other places to avoid the copy.
Turns out this is necessary per https://bugs.openjdk.org/browse/JDK-8364418
Yep I saw that as well, which is why I thought this would be beneficial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3961737142
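For readers following along: a public String constructor has to copy the byte array it is given, because the caller could otherwise mutate the array afterwards and change a supposedly immutable String. A minimal sketch of that behaviour using only public API (the class and method names are illustrative and not taken from the patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DefensiveCopyDemo {
    // Builds a String from an encoded array, then overwrites the array.
    // Because the public constructor copies its input, the String is unaffected.
    static String mutateAfterConstruct() {
        byte[] encoded = Base64.getEncoder()
                .encode("hello".getBytes(StandardCharsets.US_ASCII));
        String s = new String(encoded, StandardCharsets.ISO_8859_1);
        encoded[0] = (byte) '!'; // mutate the caller-owned array
        return s;                // still the original Base64 text
    }

    public static void main(String[] args) {
        System.out.println(mutateAfterConstruct()); // prints "aGVsbG8="
    }
}
```

In `encodeToString` the freshly encoded array never escapes to the caller, which is presumably why skipping the copy is considered safe there.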
On Wed, 25 Feb 2026 20:08:19 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Hmm, I checked the code of that deprecated constructor. If the compiler can recognize `count == ascii.length` then it should be able to fold this code; maybe the compiler is too stupid. Have you verified the performance results in a benchmark?
I did not run benchmarks, just saw that the constructor copies the array (obviously since it's public), and that we use JLA in other places to avoid the copy.
Turns out this is necessary per https://bugs.openjdk.org/browse/JDK-8364418
Yep I saw that as well, which is why I thought this would be beneficial.
@kilink Are you going to update the PR with the performance data and a summary of the testing that you have done? Are the benchmarks for Base64.Encoder in test/micro sufficient or will this PR propose to add more benchmarks? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3964695824
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Turns out this is necessary per https://bugs.openjdk.org/browse/JDK-8364418 ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29920#pullrequestreview-3855286622
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
@kilink Your change (at version d007ca46ac6910e2cda7b569716148e1fb30af14) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3961771818
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
/integrate
Please be aware of point 7 of "Life of a PR" (https://openjdk.org/guide/#life-of-a-pr): allow 24 hours for sufficient reviews before typing the `/integrate` command. Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3961821373
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Ran tier 1-3 on linux x64 on Oracle's CI, all passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3962213304
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Seconding Alan's sentiment here: contributing performance improvements without validation data is inadvisable.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3965614026
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
src/java.base/share/classes/java/util/Base64.java line 351:
349:      */
350:     public String encodeToString(byte[] src) {
351:         byte[] encoded = encode(src);
Consider adding a comment here to make it clear that `encoded` is guaranteed to be ASCII-only. Otherwise, using `uncheckedNewStringWithLatin1Bytes` would not be safe. Better to make this contract explicit with a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29920#discussion_r2858168126
On Thu, 26 Feb 2026 10:17:54 GMT, Eirik Bjørsnøs <eirbjo@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
src/java.base/share/classes/java/util/Base64.java line 351:
349:      */
350:     public String encodeToString(byte[] src) {
351:         byte[] encoded = encode(src);
Consider adding a comment here to make it clear that `encoded` is guaranteed to be ASCII-only.
Otherwise, using `uncheckedNewStringWithLatin1Bytes` would not be safe. Better to make this contract explicit with a comment.
`encode` is already a well-behaved platform API, and this method already states:
In other words, an invocation of this method has exactly the same effect as invoking `new String(encode(src), StandardCharsets.ISO_8859_1)`.
And this is exactly compatible with `uncheckedNewStringWithLatin1Bytes`. Let's not add redundant comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/29920#discussion_r2859802886
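The two properties under discussion here (the encoder output is ASCII-only, and `encodeToString` matches the documented `new String(encode(src), StandardCharsets.ISO_8859_1)`) can be checked from outside the JDK with public API. A small sketch, with hypothetical class and helper names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodeToStringContract {
    // True if every byte is in the 7-bit ASCII range (no high bit set).
    static boolean allAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    // Compares encodeToString against the equivalence stated in its Javadoc.
    static boolean sameAsDocumented(byte[] src) {
        Base64.Encoder enc = Base64.getEncoder();
        String expected = new String(enc.encode(src), StandardCharsets.ISO_8859_1);
        return enc.encodeToString(src).equals(expected);
    }

    public static void main(String[] args) {
        byte[] src = new byte[256];
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) i; // cover every possible input byte value
        }
        System.out.println(allAscii(Base64.getEncoder().encode(src)));
        System.out.println(sameAsDocumented(src));
    }
}
```

This only spot-checks one input; it is not a substitute for the existing Base64 tests.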
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
I put together a JMH benchmark that tests `encodeToString` on various size inputs (100, 1000, 10000), and here are the results running locally:

Baseline:

Benchmark                              (inputSize)   Mode  Cnt      Score      Error   Units
Base64Encode.testBase64EncodeToString          100  thrpt   12  25830.222 ± 1515.781  ops/ms
Base64Encode.testBase64EncodeToString         1000  thrpt   12   6864.086 ±  361.713  ops/ms
Base64Encode.testBase64EncodeToString        10000  thrpt   12    413.448 ±   83.529  ops/ms

This patch:

Benchmark                              (inputSize)   Mode  Cnt      Score      Error   Units
Base64Encode.testBase64EncodeToString          100  thrpt   12  31614.860 ± 2434.919  ops/ms
Base64Encode.testBase64EncodeToString         1000  thrpt   12   9134.750 ±  315.348  ops/ms
Base64Encode.testBase64EncodeToString        10000  thrpt   12    706.133 ±  163.285  ops/ms

Looks like an improvement across the board just looking at throughput. I didn't run an allocation rate benchmark, but it seems fairly clear to me it would show an improvement. I can clean up the benchmark and add it to this PR if desired.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3968707787
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Yes, please add this new benchmark method to `Base64Encode` benchmark. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3968757175
On Thu, 26 Feb 2026 19:33:29 GMT, Chen Liang <liach@openjdk.org> wrote:
Yes, please add this new benchmark method to `Base64Encode` benchmark.
Okay, it was a little easier to just add a new benchmark class rather than rework the existing one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3969030054
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:

  Add JMH benchmark for encodeToString

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29920/files
  - new: https://git.openjdk.org/jdk/pull/29920/files/d007ca46..458236ba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=00-01

  Stats: 54 lines in 1 file changed: 54 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/29920.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29920/head:pull/29920

PR: https://git.openjdk.org/jdk/pull/29920
On Thu, 26 Feb 2026 20:28:12 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Add JMH benchmark for encodeToString
Apparently the new benchmark takes 33 mins to be fully run. I recommend reducing the iterations as done in https://bugs.openjdk.org/browse/JDK-8287810. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3969620680
On Thu, 26 Feb 2026 22:32:23 GMT, Chen Liang <liach@openjdk.org> wrote:
Apparently the new benchmark takes 33 mins to be fully run. I recommend reducing the iterations as done in https://bugs.openjdk.org/browse/JDK-8287810.
Whoops, I just used the defaults. I've changed it to be more in line with the other benchmarks. I've rerun it with the new settings and allocation information included, and this is what I get locally on an M1 Macbook.

make test TEST="micro:Base64EncodeToString" MICRO='RESULTS_FORMAT=JSON;OPTIONS=-prof gc'

baseline:

Benchmark                                                   (inputSize)   Mode  Cnt      Score      Error   Units
Base64EncodeToString.testEncodeToString                              10  thrpt   10  37077.450 ± 6720.486  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate                10  thrpt   10   3109.573 ±  566.545  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm           10  thrpt   10     88.000 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                     10  thrpt   10    169.000             counts
Base64EncodeToString.testEncodeToString:gc.time                      10  thrpt   10    102.000                 ms
Base64EncodeToString.testEncodeToString                             100  thrpt   10  44531.727 ± 2228.099  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate               100  thrpt   10  13928.036 ±  698.474  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm          100  thrpt   10    328.000 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                    100  thrpt   10    233.000             counts
Base64EncodeToString.testEncodeToString:gc.time                     100  thrpt   10    151.000                 ms
Base64EncodeToString.testEncodeToString                            1000  thrpt   10   8071.402 ±  334.834  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate              1000  thrpt   10  20996.105 ±  871.475  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm         1000  thrpt   10   2728.000 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                   1000  thrpt   10    298.000             counts
Base64EncodeToString.testEncodeToString:gc.time                    1000  thrpt   10    191.000                 ms
Base64EncodeToString.testEncodeToString                           10000  thrpt   10    605.186 ±   19.722  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate             10000  thrpt   10  15424.548 ±  502.632  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm        10000  thrpt   10  26728.006 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                  10000  thrpt   10    275.000             counts
Base64EncodeToString.testEncodeToString:gc.time                   10000  thrpt   10    176.000                 ms

With patch:

Benchmark                                                   (inputSize)   Mode  Cnt      Score      Error   Units
Base64EncodeToString.testEncodeToString                              10  thrpt   10  74105.364 ± 2901.628  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate                10  thrpt   10   3957.248 ±  154.970  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm           10  thrpt   10     56.000 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                     10  thrpt   10    201.000             counts
Base64EncodeToString.testEncodeToString:gc.time                      10  thrpt   10    139.000                 ms
Base64EncodeToString.testEncodeToString                             100  thrpt   10  56614.125 ± 3686.741  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate               100  thrpt   10   9497.667 ±  618.182  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm          100  thrpt   10    176.000 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                    100  thrpt   10    173.000             counts
Base64EncodeToString.testEncodeToString:gc.time                     100  thrpt   10    122.000                 ms
Base64EncodeToString.testEncodeToString                            1000  thrpt   10  10738.733 ± 1765.562  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate              1000  thrpt   10  14090.662 ± 2316.935  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm         1000  thrpt   10   1376.000 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                   1000  thrpt   10    200.000             counts
Base64EncodeToString.testEncodeToString:gc.time                    1000  thrpt   10    150.000                 ms
Base64EncodeToString.testEncodeToString                           10000  thrpt   10    940.693 ±   91.054  ops/ms
Base64EncodeToString.testEncodeToString:gc.alloc.rate             10000  thrpt   10  11998.463 ± 1161.641  MB/sec
Base64EncodeToString.testEncodeToString:gc.alloc.rate.norm        10000  thrpt   10  13376.004 ±    0.001    B/op
Base64EncodeToString.testEncodeToString:gc.count                  10000  thrpt   10    158.000             counts
Base64EncodeToString.testEncodeToString:gc.time                   10000  thrpt   10    136.000                 ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3970052161
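The gc.alloc.rate.norm figures are consistent with exactly one fewer byte array being allocated per call. A rough back-of-the-envelope check follows. It assumes a typical 64-bit HotSpot layout with a 16-byte array header, a 24-byte String object and 8-byte alignment; those sizes are assumptions, not something stated in the thread, and the class and method names are made up for illustration:

```java
public class Base64AllocEstimate {
    // Assumed object sizes (64-bit HotSpot, compressed oops); not from the thread.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long STRING_OBJECT_BYTES = 24;

    // Padded Base64 output length: 4 characters per started group of 3 input bytes.
    static long encodedLength(int srcLen) {
        return 4L * ((srcLen + 2) / 3);
    }

    // Heap footprint of a byte[] of the given length, rounded up to 8 bytes.
    static long byteArrayBytes(long len) {
        return (ARRAY_HEADER_BYTES + len + 7) & ~7L;
    }

    // Estimated bytes allocated per encodeToString call.
    // copiesArray = true models the old path (encoded array plus a copy inside String).
    static long bytesPerOp(int srcLen, boolean copiesArray) {
        long array = byteArrayBytes(encodedLength(srcLen));
        return (copiesArray ? 2 : 1) * array + STRING_OBJECT_BYTES;
    }

    public static void main(String[] args) {
        for (int size : new int[] {10, 100, 1000, 10000}) {
            System.out.println(size + ": baseline " + bytesPerOp(size, true)
                    + " B/op, patched " + bytesPerOp(size, false) + " B/op");
        }
    }
}
```

Under these assumptions the estimate gives 88 vs 56, 328 vs 176, 2728 vs 1376 and 26728 vs 13376 B/op, which lines up with the measured values.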
On Thu, 26 Feb 2026 20:28:12 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Add JMH benchmark for encodeToString
My results with this benchmark (yes, it took me two runs of 33:25 each!)

Without patch:

Benchmark                                (inputSize)   Mode  Cnt      Score      Error   Units
Base64EncodeToString.testEncodeToString           10  thrpt   25  60318.739 ±  912.624  ops/ms
Base64EncodeToString.testEncodeToString          100  thrpt   25  34639.846 ±  564.989  ops/ms
Base64EncodeToString.testEncodeToString         1000  thrpt   25   4760.465 ±   66.707  ops/ms
Base64EncodeToString.testEncodeToString        10000  thrpt   25    416.867 ±    6.590  ops/ms

With patch:

Benchmark                                (inputSize)   Mode  Cnt      Score      Error   Units
Base64EncodeToString.testEncodeToString           10  thrpt   25  73585.221 ± 1250.220  ops/ms
Base64EncodeToString.testEncodeToString          100  thrpt   25  37766.561 ±  991.396  ops/ms
Base64EncodeToString.testEncodeToString         1000  thrpt   25   8198.071 ±  193.539  ops/ms
Base64EncodeToString.testEncodeToString        10000  thrpt   25    725.249 ±   11.494  ops/ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3969795570
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:

  Reduce warmup and measurement iterations for benchmark

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29920/files
  - new: https://git.openjdk.org/jdk/pull/29920/files/458236ba..a2fa6d14

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=01-02

  Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/29920.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29920/head:pull/29920

PR: https://git.openjdk.org/jdk/pull/29920
On Fri, 27 Feb 2026 00:32:07 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Reduce warmup and measurement iterations for benchmark
test/micro/org/openjdk/bench/java/util/Base64EncodeToString.java line 9:
7:  * published by the Free Software Foundation.  Oracle designates this
8:  * particular file as subject to the "Classpath" exception as provided
9:  * by Oracle in the LICENSE file that accompanied this code.

Suggestion:

 * published by the Free Software Foundation.

Note that tests do not have this exception.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/29920#discussion_r2861996941
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:

  Update license for benchmark

  Co-authored-by: Chen Liang <liach@openjdk.org>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29920/files
  - new: https://git.openjdk.org/jdk/pull/29920/files/a2fa6d14..4e82915a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29920&range=02-03

  Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/29920.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29920/head:pull/29920

PR: https://git.openjdk.org/jdk/pull/29920
On Fri, 27 Feb 2026 03:43:11 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Update license for benchmark
Co-authored-by: Chen Liang <liach@openjdk.org>
Let's wait for at least another reviewer before integration. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29920#pullrequestreview-3864791469
On Fri, 27 Feb 2026 03:43:11 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Update license for benchmark
Co-authored-by: Chen Liang <liach@openjdk.org>
Looks good. Thanks for the JMH test. ------------- Marked as reviewed by rriggs (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29920#pullrequestreview-3867870340
On Fri, 27 Feb 2026 03:43:11 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Update license for benchmark
Co-authored-by: Chen Liang <liach@openjdk.org>
@kilink Your change (at version 4e82915aa112483ba5de7c357bba16ca44630551) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3973890660
On Fri, 27 Feb 2026 03:43:11 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Update license for benchmark
Co-authored-by: Chen Liang <liach@openjdk.org>
I will wait till the end of Monday and sponsor if there is no objection coming up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3980100121
On Sun, 1 Mar 2026 14:33:21 GMT, Chen Liang <liach@openjdk.org> wrote:
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Update license for benchmark
Co-authored-by: Chen Liang <liach@openjdk.org>
I will wait till the end of Monday and sponsor if there is no objection coming up.
@liach would you be able to sponsor this change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3999466455
On Fri, 27 Feb 2026 03:43:11 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
Patrick Strawderman has updated the pull request incrementally with one additional commit since the last revision:
Update license for benchmark
Co-authored-by: Chen Liang <liach@openjdk.org>
Sure. Ran the benchmark, problem and solution are both valid. We can roll back in the worst case if this proves problematic, even though I think it's unlikely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29920#issuecomment-3999950418
On Wed, 25 Feb 2026 15:37:02 GMT, Patrick Strawderman <duke@openjdk.org> wrote:
Avoid a byte array copy in encodeToString by using JavaLangAccess#uncheckedNewStringWithLatin1Bytes
This pull request has now been integrated.

Changeset: 08c8520b
Author:    Patrick Strawderman <patrick@kilink.net>
Committer: Chen Liang <liach@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/08c8520b39083ec6354dc5df2f18c1f4c3588053
Stats:     60 lines in 2 files changed: 58 ins; 1 del; 1 mod

8378698: Optimize Base64.Encoder#encodeToString

Reviewed-by: liach, rriggs

-------------

PR: https://git.openjdk.org/jdk/pull/29920
participants (7)
- Alan Bateman
- Chen Liang
- duke
- Eirik Bjørsnøs
- Patrick Strawderman
- Roger Riggs
- Viktor Klang