RFC: 8356679: Using CharSequence::getChars internally

Sun May 18 15:12:05 UTC 2025

Sorry, the posting below was cross-posted by accident.

-Markus

Am 18.05.2025 um 17:09 schrieb Markus KARG:
> Roger,
>
> thank you for your comments.
>
> Following your advice I have splitted the larger work of JDK-8356679 
> into sub tasks.
>
> I would like to start with a first PR implementing the *foundational* 
> work, i. e. optimizing Writer::append for efficiency (JDK-8357183). 
> For convenience, attached below is a copy of the description.
>
> Comments Welcome!
>
> If everybody is fine with this, I would be happy to get some +1 to 
> publish the PR for JDK-8357183!
>
> Am 14.05.2025 um 15:57 schrieb Roger Riggs:
>> Hi Markus,
>>
>> Starting out with the common case is a good idea for the first PR.
>>
>> I much prefer a PR with a single goal and that comes to a conclusion 
>> and does not add new features or changes after the PR is submitted.
>> I tend to lose interest in PRs with lots of churn, it means I have to 
>> re-review the bulk of it when there is a change and may wait days to 
>> let it settle down before coming back to it.
>>
>> I tend to think the PR was not really ready to be reviewed if simple 
>> issues and corrections have to be made frequently.
>> Do your own checking for typos and copyrights and simple refactoring 
>> before opening the PR.
>> Quality before quantity or speed.
>>
>> I'm fine with separate Jira issues that clearly delineate a specific 
>> scope and goal.
>>
>> The title of this issue (8356679) doesn't identify the real goal.
>> It seems to be to improve performance or memory usage, not just to 
>> use a new API.
>>
>> These are my personal opinions about contributions and process.
>>
>> Regards, Roger
>>
>>
>> On 5/14/25 6:48 AM, Markus KARG wrote:
>>> Many of the modified classes derive from a common super class and 
>>> share one needed common change (which is one of the points which are 
>>> easy to see once you see all of those classes in a single PR, but 
>>> hard to explain in plaint-text pre-PR mailing list threads), so at 
>>> least those need to be discussed *together*. But to spare JBS and 
>>> PRs, I can open the PR with just the first set of changes, and once 
>>> we agree that this set is fine, I can push the next commit *in the 
>>> same PR*. Otherwise we would need endless JBS, mailing list threads, 
>>> and PRs, just to fixe a dozen internal code lines.
>>>
>>> Having said that, does the current state of this thread count as 
>>> "reached common agreement to file a PR" or do I still have to wait 
>>> until more people chime in?
>>>
>>> -Markus
>>>
>>>
>>> Am 13.05.2025 um 15:10 schrieb Roger Riggs:
>>>> Hi Markus,
>>>>
>>>> A main point was to avoid trying to do everything at once.
>>>> The PR comments become hard to follow and intermingled and it takes 
>>>> longer to get agreement because of the thrash in the PR.
>>>>
>>>> Roger
>>>>
>>>> On 5/13/25 5:05 AM, Markus KARG wrote:
>>>>> Thank you, Roger.
>>>>>
>>>>> Actually the method helps in the "toString()" variants, too, as in 
>>>>> some places we could *get rid* of "toString()" (which is more work 
>>>>> than "just" a buffer due to the added compression complexity).
>>>>>
>>>>> In fact, I already took the time to rewrite *all* of them while 
>>>>> waiting for the approval of this list posting. In *all* cases 
>>>>> *less* buffering / copying is needed, and *less* "toString()" 
>>>>> conversion (which is a copy under the hood) is needed. So if I 
>>>>> would be allowed to show the code as a PR, it would be much easier 
>>>>> to explain and discuss.
>>>>>
>>>>> A PR is the best place to discuss "how to code would change". In 
>>>>> the worst case, let's drop it if we see that it is actually a bad 
>>>>> thing.
>>>>>
>>>>> -Markus
>>>>>
>>>>>
>>>>> Am 12.05.2025 um 20:18 schrieb Roger Riggs:
>>>>>> Hi Markus,
>>>>>>
>>>>>> On the surface, its looks constructive.
>>>>>> I suspect that many of these cases will turn into discussions 
>>>>>> about the right/best/better way to buffer the characters.
>>>>>> The getChars method only helps when extracting to a char array, 
>>>>>> many of the current implementations create strings as the 
>>>>>> intermediary. The advantage of the 1 character at a time 
>>>>>> technique is not needing a (separated allocated) buffer.
>>>>>> Consider taking a few at a time before launching into the whole set.
>>>>>>
>>>>>> $.02, Roger
>>>>>>
>>>>>> On 5/11/25 2:45 AM, Markus KARG wrote:
>>>>>>> Dear Core Libs Team,
>>>>>>>
>>>>>>> I am hereby requesting comments on JDK-8356679.
>>>>>>>
>>>>>>> I would like to invest some time and set up a PR implementing 
>>>>>>> Chen Liangs's proposal laid out in 
>>>>>>> https://bugs.openjdk.org/browse/JDK-8356679. For your 
>>>>>>> convenience, the text of that JBS is copied below. According to 
>>>>>>> the Developer's Guide I do need to get broad agreement BEFORE 
>>>>>>> filing a PR. Therefore, I kindly ask everybody to briefly show 
>>>>>>> consent, so I may file a PR.
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Markus
>>>>>>>
>>>>>>>
>>>>>>> Copy from https://bugs.openjdk.org/browse/JDK-8356679:
>>>>>>>
>>>>>>> Recently OpenJDK adopted the new method 
>>>>>>> CharSequence::getChars(int, int, char[], int) for inclusion in 
>>>>>>> Java 25. As a bulk reader method, it allows potentially improved 
>>>>>>> efficiency over the previously available char-by-char reader 
>>>>>>> method CharSequence::charAt(int).
>>>>>>>
>>>>>>> Chen Liang suggested on March 23rd on the core-lib-dev mailing 
>>>>>>> list to use the new method within the internal source code of 
>>>>>>> OpenJDK for the implementation of Appendables (see 
>>>>>>> https://mail.openjdk.org/pipermail/core-libs-dev/2025-March/141521.html). 
>>>>>>> The idea behind this is that the implementations might be more 
>>>>>>> efficient then.
>>>>>>>
>>>>>>> A quick analysis of the OpenJDK source code identified (at 
>>>>>>> least) the following classes which could potentially run more 
>>>>>>> efficient when using CharSequence::getChars internally, thanks 
>>>>>>> to bulk reading and / or prevention of internal copies / 
>>>>>>> toString() conversions:
>>>>>>> * java.io.Writer
>>>>>>> * java.io.StringWriter
>>>>>>> * java.io.PrintWriter
>>>>>>> * java.io.BufferedWriter
>>>>>>> * java.io.CharArrayWriter
>>>>>>> * java.io.FileWriter
>>>>>>> * java.io.OutputStreamWriter
>>>>>>> * sun.nio.cs.StreamEncoder
>>>>>>> * java.io.PrintStream
>>>>>>> * java.nio.CharBuffer
>>>>>>>
>>>>>>> In the sense of "eat your own dog food", it makes sense to 
>>>>>>> implement Chen's idea in (at least) those classes. Possibly more 
>>>>>>> classes could get identified when taking a deeper look. Besides 
>>>>>>> the potential efficiency improvements, it would be a good show 
>>>>>>> case for the usage of the new API.
>>>>>>>
>>>>>>> The risk of this change should be low, as test coverage exists, 
>>>>>>> and as the intended changes are solely internal to the 
>>>>>>> implementation. No API will get changed. In some cases the 
>>>>>>> JavaDocs will get slightly adapted where it currently exposes 
>>>>>>> the actual implementation (to not lie in future).
>>>>>>>
>>>>>>
>>>>
>>