RFC: 8356679: Using CharSequence::getChars internally
Markus KARG
markus at headcrashing.eu
Sun May 18 15:09:43 UTC 2025
Roger,
thank you for your comments.
Following your advice I have splitted the larger work of JDK-8356679
into sub tasks.
I would like to start with a first PR implementing the *foundational*
work, i. e. optimizing Writer::append for efficiency (JDK-8357183). For
convenience, attached below is a copy of the description.
Comments Welcome!
If everybody is fine with this, I would be happy to get some +1 to
publish the PR for JDK-8357183!
Am 14.05.2025 um 15:57 schrieb Roger Riggs:
> Hi Markus,
>
> Starting out with the common case is a good idea for the first PR.
>
> I much prefer a PR with a single goal and that comes to a conclusion
> and does not add new features or changes after the PR is submitted.
> I tend to lose interest in PRs with lots of churn, it means I have to
> re-review the bulk of it when there is a change and may wait days to
> let it settle down before coming back to it.
>
> I tend to think the PR was not really ready to be reviewed if simple
> issues and corrections have to be made frequently.
> Do your own checking for typos and copyrights and simple refactoring
> before opening the PR.
> Quality before quantity or speed.
>
> I'm fine with separate Jira issues that clearly delineate a specific
> scope and goal.
>
> The title of this issue (8356679) doesn't identify the real goal.
> It seems to be to improve performance or memory usage, not just to use
> a new API.
>
> These are my personal opinions about contributions and process.
>
> Regards, Roger
>
>
> On 5/14/25 6:48 AM, Markus KARG wrote:
>> Many of the modified classes derive from a common super class and
>> share one needed common change (which is one of the points which are
>> easy to see once you see all of those classes in a single PR, but
>> hard to explain in plaint-text pre-PR mailing list threads), so at
>> least those need to be discussed *together*. But to spare JBS and
>> PRs, I can open the PR with just the first set of changes, and once
>> we agree that this set is fine, I can push the next commit *in the
>> same PR*. Otherwise we would need endless JBS, mailing list threads,
>> and PRs, just to fixe a dozen internal code lines.
>>
>> Having said that, does the current state of this thread count as
>> "reached common agreement to file a PR" or do I still have to wait
>> until more people chime in?
>>
>> -Markus
>>
>>
>> Am 13.05.2025 um 15:10 schrieb Roger Riggs:
>>> Hi Markus,
>>>
>>> A main point was to avoid trying to do everything at once.
>>> The PR comments become hard to follow and intermingled and it takes
>>> longer to get agreement because of the thrash in the PR.
>>>
>>> Roger
>>>
>>> On 5/13/25 5:05 AM, Markus KARG wrote:
>>>> Thank you, Roger.
>>>>
>>>> Actually the method helps in the "toString()" variants, too, as in
>>>> some places we could *get rid* of "toString()" (which is more work
>>>> than "just" a buffer due to the added compression complexity).
>>>>
>>>> In fact, I already took the time to rewrite *all* of them while
>>>> waiting for the approval of this list posting. In *all* cases
>>>> *less* buffering / copying is needed, and *less* "toString()"
>>>> conversion (which is a copy under the hood) is needed. So if I
>>>> would be allowed to show the code as a PR, it would be much easier
>>>> to explain and discuss.
>>>>
>>>> A PR is the best place to discuss "how to code would change". In
>>>> the worst case, let's drop it if we see that it is actually a bad
>>>> thing.
>>>>
>>>> -Markus
>>>>
>>>>
>>>> Am 12.05.2025 um 20:18 schrieb Roger Riggs:
>>>>> Hi Markus,
>>>>>
>>>>> On the surface, its looks constructive.
>>>>> I suspect that many of these cases will turn into discussions
>>>>> about the right/best/better way to buffer the characters.
>>>>> The getChars method only helps when extracting to a char array,
>>>>> many of the current implementations create strings as the
>>>>> intermediary. The advantage of the 1 character at a time technique
>>>>> is not needing a (separated allocated) buffer.
>>>>> Consider taking a few at a time before launching into the whole set.
>>>>>
>>>>> $.02, Roger
>>>>>
>>>>> On 5/11/25 2:45 AM, Markus KARG wrote:
>>>>>> Dear Core Libs Team,
>>>>>>
>>>>>> I am hereby requesting comments on JDK-8356679.
>>>>>>
>>>>>> I would like to invest some time and set up a PR implementing
>>>>>> Chen Liangs's proposal laid out in
>>>>>> https://bugs.openjdk.org/browse/JDK-8356679. For your
>>>>>> convenience, the text of that JBS is copied below. According to
>>>>>> the Developer's Guide I do need to get broad agreement BEFORE
>>>>>> filing a PR. Therefore, I kindly ask everybody to briefly show
>>>>>> consent, so I may file a PR.
>>>>>>
>>>>>> Thanks
>>>>>> -Markus
>>>>>>
>>>>>>
>>>>>> Copy from https://bugs.openjdk.org/browse/JDK-8356679:
>>>>>>
>>>>>> Recently OpenJDK adopted the new method
>>>>>> CharSequence::getChars(int, int, char[], int) for inclusion in
>>>>>> Java 25. As a bulk reader method, it allows potentially improved
>>>>>> efficiency over the previously available char-by-char reader
>>>>>> method CharSequence::charAt(int).
>>>>>>
>>>>>> Chen Liang suggested on March 23rd on the core-lib-dev mailing
>>>>>> list to use the new method within the internal source code of
>>>>>> OpenJDK for the implementation of Appendables (see
>>>>>> https://mail.openjdk.org/pipermail/core-libs-dev/2025-March/141521.html).
>>>>>> The idea behind this is that the implementations might be more
>>>>>> efficient then.
>>>>>>
>>>>>> A quick analysis of the OpenJDK source code identified (at least)
>>>>>> the following classes which could potentially run more efficient
>>>>>> when using CharSequence::getChars internally, thanks to bulk
>>>>>> reading and / or prevention of internal copies / toString()
>>>>>> conversions:
>>>>>> * java.io.Writer
>>>>>> * java.io.StringWriter
>>>>>> * java.io.PrintWriter
>>>>>> * java.io.BufferedWriter
>>>>>> * java.io.CharArrayWriter
>>>>>> * java.io.FileWriter
>>>>>> * java.io.OutputStreamWriter
>>>>>> * sun.nio.cs.StreamEncoder
>>>>>> * java.io.PrintStream
>>>>>> * java.nio.CharBuffer
>>>>>>
>>>>>> In the sense of "eat your own dog food", it makes sense to
>>>>>> implement Chen's idea in (at least) those classes. Possibly more
>>>>>> classes could get identified when taking a deeper look. Besides
>>>>>> the potential efficiency improvements, it would be a good show
>>>>>> case for the usage of the new API.
>>>>>>
>>>>>> The risk of this change should be low, as test coverage exists,
>>>>>> and as the intended changes are solely internal to the
>>>>>> implementation. No API will get changed. In some cases the
>>>>>> JavaDocs will get slightly adapted where it currently exposes the
>>>>>> actual implementation (to not lie in future).
>>>>>>
>>>>>
>>>
>
More information about the core-libs-dev
mailing list