RFC: 8357183: Improving efficiency of Writer::append(CharSequence) and Writer::append(CharSequence, int, int) / Sub Task of 8356679: Using CharSequence::getChars internally

Sat May 24 14:37:15 UTC 2025

Chen,

thank you for your approval for a DRAFT PR.

Following your proposal, I have published a DRAFT PR at 
https://github.com/openjdk/jdk/pull/25432 so everybody can see what the 
actual intended code change looks like, and better focus on that actual 
code change and its risks and benefits. The intention of that DRAFT PR 
explicitly is NOT to request code reviews, but solely to serve as a 
publicly visible piece of code to support the ongoing discussion here on 
this mailing list.

TARGET VERSION: 26

As we are near to JDK 25 GA, the target is inclusion into JDK 26 or 
later, so no need to rush. Nevertheless, I would be happy if everybody 
comments ASAP (at least briefly) to keep this discussion alive.

PRELIMINARY RISK ANALYSIS: LOW

Given the changes at hand now, I have performed a brief corpus analysis 
using Github Copilot (using the public Code Repositories of OpenJDK 
itself, but also the repositories of Apache Foundation and Eclipse 
Foundation) to see if the proposed change implies REAL risk due to its 
change of behavior. The result was that Github Copilot was UNABLE to 
find at least one single Java class that would fail after the proposed 
change of this draft PR. While many classes extend Writer directly or 
indirectly, only few have (nor their super- nor subclasses) an 
implementation of the now skipped methods write(String) or write(String, 
int, int). Those that do have (nor their super- nor subclasses) are NOT 
receiving append(CharSequence) or append(CharSequence, int, int) calls, 
or they (or their super- or subclasses) implement write(String) or 
write(String, int, int) as an alias to append(CharSequence) or 
append(CharSequence, int, int) - hence they already explicitly do what 
the DRAFT PR proposes to do implicitly in future. So at least for this 
brief analysis, the ACTUAL risk to break existing code seems to be 
rather small actually (in contrast to the ASSUMED risk looking at 
"append" being there since JDK 5). NB: Having said that, one such class 
WAS actually found by Github Copilot, but I explicitly removed it from 
the search result, as I already knew it is still working correctly, as 
it is an internal "sun" class of OpenJDK which I have already visited 
manually in preparation of this PR.

NEXT STEPS

* Please anybody take note of the code change found in the DRAFT PR at 
https://github.com/openjdk/jdk/pull/25432 in the next few weeks to see 
if anything prevents us from turning this PR from the DRAFT state into 
the RFR state.

* Unless this discussion points us towards dropping the DRAFT PR, in the 
next weeks eventually I will open a CSR for approval of the behavior 
change, basing on the result of the attended brief corpus analysis.

Thanks everybody for sharing your comments!

-Markus

Am 18.05.2025 um 22:50 schrieb Chen Liang:
> Hi Markus,
> I think you can submit a draft PR - while some people here like Alan 
> Bateman use draft PRs to indicate "not ready for any review" and 
> discussions are not forwarded to the mailing lists, they are still 
> very useful for looking at the change set.
>
> Note that we are a few short weeks from RDP1 of 25 - while I also wish 
> improvements can be integrated into the JDK, there is no guarantee 
> that they will make 25, and we shouldn't rush changes in order to 
> catch up with a release train.
>
> Regards, Chen
> ------------------------------------------------------------------------
> *From:* core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of 
> Markus KARG <markus at headcrashing.eu>
> *Sent:* Sunday, May 18, 2025 10:09 AM
> *To:* core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
> *Subject:* Re: RFC: 8356679: Using CharSequence::getChars internally
> Roger,
>
> thank you for your comments.
>
> Following your advice I have splitted the larger work of JDK-8356679
> into sub tasks.
>
> I would like to start with a first PR implementing the *foundational*
> work, i. e. optimizing Writer::append for efficiency (JDK-8357183). For
> convenience, attached below is a copy of the description.
>
> Comments Welcome!
>
> If everybody is fine with this, I would be happy to get some +1 to
> publish the PR for JDK-8357183!
>
> Am 14.05.2025 um 15:57 schrieb Roger Riggs:
> > Hi Markus,
> >
> > Starting out with the common case is a good idea for the first PR.
> >
> > I much prefer a PR with a single goal and that comes to a conclusion
> > and does not add new features or changes after the PR is submitted.
> > I tend to lose interest in PRs with lots of churn, it means I have to
> > re-review the bulk of it when there is a change and may wait days to
> > let it settle down before coming back to it.
> >
> > I tend to think the PR was not really ready to be reviewed if simple
> > issues and corrections have to be made frequently.
> > Do your own checking for typos and copyrights and simple refactoring
> > before opening the PR.
> > Quality before quantity or speed.
> >
> > I'm fine with separate Jira issues that clearly delineate a specific
> > scope and goal.
> >
> > The title of this issue (8356679) doesn't identify the real goal.
> > It seems to be to improve performance or memory usage, not just to use
> > a new API.
> >
> > These are my personal opinions about contributions and process.
> >
> > Regards, Roger
> >
> >
> > On 5/14/25 6:48 AM, Markus KARG wrote:
> >> Many of the modified classes derive from a common super class and
> >> share one needed common change (which is one of the points which are
> >> easy to see once you see all of those classes in a single PR, but
> >> hard to explain in plaint-text pre-PR mailing list threads), so at
> >> least those need to be discussed *together*. But to spare JBS and
> >> PRs, I can open the PR with just the first set of changes, and once
> >> we agree that this set is fine, I can push the next commit *in the
> >> same PR*. Otherwise we would need endless JBS, mailing list threads,
> >> and PRs, just to fixe a dozen internal code lines.
> >>
> >> Having said that, does the current state of this thread count as
> >> "reached common agreement to file a PR" or do I still have to wait
> >> until more people chime in?
> >>
> >> -Markus
> >>
> >>
> >> Am 13.05.2025 um 15:10 schrieb Roger Riggs:
> >>> Hi Markus,
> >>>
> >>> A main point was to avoid trying to do everything at once.
> >>> The PR comments become hard to follow and intermingled and it takes
> >>> longer to get agreement because of the thrash in the PR.
> >>>
> >>> Roger
> >>>
> >>> On 5/13/25 5:05 AM, Markus KARG wrote:
> >>>> Thank you, Roger.
> >>>>
> >>>> Actually the method helps in the "toString()" variants, too, as in
> >>>> some places we could *get rid* of "toString()" (which is more work
> >>>> than "just" a buffer due to the added compression complexity).
> >>>>
> >>>> In fact, I already took the time to rewrite *all* of them while
> >>>> waiting for the approval of this list posting. In *all* cases
> >>>> *less* buffering / copying is needed, and *less* "toString()"
> >>>> conversion (which is a copy under the hood) is needed. So if I
> >>>> would be allowed to show the code as a PR, it would be much easier
> >>>> to explain and discuss.
> >>>>
> >>>> A PR is the best place to discuss "how to code would change". In
> >>>> the worst case, let's drop it if we see that it is actually a bad
> >>>> thing.
> >>>>
> >>>> -Markus
> >>>>
> >>>>
> >>>> Am 12.05.2025 um 20:18 schrieb Roger Riggs:
> >>>>> Hi Markus,
> >>>>>
> >>>>> On the surface, its looks constructive.
> >>>>> I suspect that many of these cases will turn into discussions
> >>>>> about the right/best/better way to buffer the characters.
> >>>>> The getChars method only helps when extracting to a char array,
> >>>>> many of the current implementations create strings as the
> >>>>> intermediary. The advantage of the 1 character at a time technique
> >>>>> is not needing a (separated allocated) buffer.
> >>>>> Consider taking a few at a time before launching into the whole set.
> >>>>>
> >>>>> $.02, Roger
> >>>>>
> >>>>> On 5/11/25 2:45 AM, Markus KARG wrote:
> >>>>>> Dear Core Libs Team,
> >>>>>>
> >>>>>> I am hereby requesting comments on JDK-8356679.
> >>>>>>
> >>>>>> I would like to invest some time and set up a PR implementing
> >>>>>> Chen Liangs's proposal laid out in
> >>>>>> https://bugs.openjdk.org/browse/JDK-8356679. For your
> >>>>>> convenience, the text of that JBS is copied below. According to
> >>>>>> the Developer's Guide I do need to get broad agreement BEFORE
> >>>>>> filing a PR. Therefore, I kindly ask everybody to briefly show
> >>>>>> consent, so I may file a PR.
> >>>>>>
> >>>>>> Thanks
> >>>>>> -Markus
> >>>>>>
> >>>>>>
> >>>>>> Copy from https://bugs.openjdk.org/browse/JDK-8356679:
> >>>>>>
> >>>>>> Recently OpenJDK adopted the new method
> >>>>>> CharSequence::getChars(int, int, char[], int) for inclusion in
> >>>>>> Java 25. As a bulk reader method, it allows potentially improved
> >>>>>> efficiency over the previously available char-by-char reader
> >>>>>> method CharSequence::charAt(int).
> >>>>>>
> >>>>>> Chen Liang suggested on March 23rd on the core-lib-dev mailing
> >>>>>> list to use the new method within the internal source code of
> >>>>>> OpenJDK for the implementation of Appendables (see
> >>>>>> 
> https://mail.openjdk.org/pipermail/core-libs-dev/2025-March/141521.html).
> >>>>>> The idea behind this is that the implementations might be more
> >>>>>> efficient then.
> >>>>>>
> >>>>>> A quick analysis of the OpenJDK source code identified (at least)
> >>>>>> the following classes which could potentially run more efficient
> >>>>>> when using CharSequence::getChars internally, thanks to bulk
> >>>>>> reading and / or prevention of internal copies / toString()
> >>>>>> conversions:
> >>>>>> * java.io.Writer
> >>>>>> * java.io.StringWriter
> >>>>>> * java.io.PrintWriter
> >>>>>> * java.io.BufferedWriter
> >>>>>> * java.io.CharArrayWriter
> >>>>>> * java.io.FileWriter
> >>>>>> * java.io.OutputStreamWriter
> >>>>>> * sun.nio.cs.StreamEncoder
> >>>>>> * java.io.PrintStream
> >>>>>> * java.nio.CharBuffer
> >>>>>>
> >>>>>> In the sense of "eat your own dog food", it makes sense to
> >>>>>> implement Chen's idea in (at least) those classes. Possibly more
> >>>>>> classes could get identified when taking a deeper look. Besides
> >>>>>> the potential efficiency improvements, it would be a good show
> >>>>>> case for the usage of the new API.
> >>>>>>
> >>>>>> The risk of this change should be low, as test coverage exists,
> >>>>>> and as the intended changes are solely internal to the
> >>>>>> implementation. No API will get changed. In some cases the
> >>>>>> JavaDocs will get slightly adapted where it currently exposes the
> >>>>>> actual implementation (to not lie in future).
> >>>>>>
> >>>>>
> >>>
> >
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20250524/400a15a6/attachment-0001.htm>