I'd like to add no-argument overloads to CharSequence, String, and StringBuilder (JDK-8364007)

Wed Aug 20 21:48:56 UTC 2025

Hi,

This seems like a reasonable idea.
For CharSequence, I would add them as default methods on CharSequence
and include the API Character.codePointCount(csq, begin, end).
The char array version will still need to be in Character.
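
A minimal sketch of the default-method approach, under stated assumptions: because java.lang.CharSequence cannot be changed from user code, a hypothetical nested interface CodePointCounting extends CharSequence and stands in for it, and Text is a throwaway implementation. Neither name is part of the JDK or the PR; only Character.codePointCount(CharSequence, int, int) is an existing API.

```java
public class DefaultMethodSketch {

    // Hypothetical stand-in for java.lang.CharSequence; NOT the JDK's API.
    interface CodePointCounting extends CharSequence {

        // Proposed no-argument form: counts code points in the whole sequence.
        default int codePointCount() {
            return codePointCount(0, length());
        }

        // Ranged form, delegating to the existing static API.
        default int codePointCount(int beginIndex, int endIndex) {
            return Character.codePointCount(this, beginIndex, endIndex);
        }
    }

    // Minimal implementation wrapping a String, for demonstration only.
    static final class Text implements CodePointCounting {
        private final String s;

        Text(String s) {
            this.s = s;
        }

        @Override
        public int length() {
            return s.length();
        }

        @Override
        public char charAt(int index) {
            return s.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return s.subSequence(start, end);
        }

        @Override
        public String toString() {
            return s;
        }
    }

    public static void main(String[] args) {
        Text t = new Text("a\uD83D\uDE00b"); // 'a', U+1F600 as a surrogate pair, 'b'
        System.out.println(t.length());          // 4 UTF-16 code units
        System.out.println(t.codePointCount());  // 3 code points
    }
}
```

With this shape, every implementation gets the whole-sequence count for free, while a class with a faster path could still override the default.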

Regards, Roger

On 8/11/25 7:37 PM, Uchino Tatsunori wrote:
> Dear Chen-san,
>
> The beginIndex there is just a mistake; it should have been removed.
>
> String.codePointCount()
>
> is the correct suggestion, as you can imagine. I am sorry for the 
> confusion.
>
> Regards,
>
> Tatsunori Uchino
>
> 2025/08/12 7:29 Chen Liang <chen.l.liang at oracle.com>:
>
>     Hi Uchino, I think your request is sensible in general.
>
>     Do you intend to require a beginIndex for the codePointCount for
>     String? I think a no-arg version suffices.
>
>     Also forwarding this to i18n-dev as it is the locale-related list.
>
>     P.S. When you reply, make sure you click "Reply all" so that all the
>     recipients of this mail get your reply. Otherwise, the
>     reply is only sent to me, and others on the list won't see it.
>
>     Regards, Chen
>     ------------------------------------------------------------------------
>     *From:* core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf
>     of Uchino Tatsunori <tats.u at live.jp>
>     *Sent:* Monday, August 11, 2025 6:54 AM
>     *To:* core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
>     *Subject:* I'd like to add no-argument overloads to CharSequence,
>     String, and StringBuilder (JDK-8364007)
>     Dear core-libs developers,
>
>     I'd like to add the following overloads:
>
>     • Character.codePointCount(CharSequence seq)
>     • Character.codePointCount(char[] a)
>     • String.codePointCount(int beginIndex)
>     • StringBuffer.codePointCount()
>     • StringBuilder.codePointCount()
>
>     and have created a patch (https://github.com/openjdk/jdk/pull/26461).
>
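For reference, each proposed overload can be expressed with the ranged APIs that already exist. The sketch below is not the PR's implementation; CodePointCounts and its static methods are hypothetical helpers whose names only mirror the proposal. Note that the existing char[] overload takes an offset and a count rather than begin and end indices.

```java
public class CodePointCounts {

    // Mirrors the proposed Character.codePointCount(CharSequence seq).
    static int codePointCount(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    // Mirrors the proposed Character.codePointCount(char[] a).
    // The existing overload's parameters are (array, offset, count).
    static int codePointCount(char[] a) {
        return Character.codePointCount(a, 0, a.length);
    }

    // Mirrors the proposed String.codePointCount().
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Mirrors the proposed StringBuilder.codePointCount().
    static int codePointCount(StringBuilder sb) {
        return sb.codePointCount(0, sb.length());
    }

    // Mirrors the proposed StringBuffer.codePointCount().
    static int codePointCount(StringBuffer sb) {
        return sb.codePointCount(0, sb.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 as a surrogate pair, 'b'
        System.out.println(codePointCount(s));                     // 3
        System.out.println(codePointCount(s.toCharArray()));       // 3
        System.out.println(codePointCount(new StringBuilder(s)));  // 3
        System.out.println(codePointCount(new StringBuffer(s)));   // 3
        System.out.println(codePointCount((CharSequence) s));      // 3
    }
}
```

Overload resolution picks the most specific helper for each argument type, so the String and StringBuilder calls do not fall back to the CharSequence version.
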
>     Why:
>
>     Similar overloads that take an index range were already added by
>     JSR 204 (JDK-4985217): begin and end indices for CharSequence,
>     String, StringBuilder, and StringBuffer, and an offset and count for
>     char[]. They appear to have been designed with versatility as the
>     priority. They make specifying indices mandatory, which has the
>     following disadvantages:
>
>     1. The string expression has to be written twice. Unlike C#, Java
>     has no equivalent of extension methods.
>     2. Unnecessary bounds checks are performed.
>     3. Most user code wants the number of code points in the
>     entire string.
>     4. Some other languages can count code points with a single
>     function call and no extra arguments (e.g. len() in Python 3).
>
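Disadvantage 1 can be illustrated as follows. displayName is a hypothetical helper standing in for any non-trivial string expression; the last variant uses CharSequence.codePoints() (available since Java 8), which avoids indices but goes through an IntStream.

```java
public class RepeatedExpression {

    // Stands in for any non-trivial string expression (illustrative only).
    static String displayName(String first, String last) {
        return first + " " + last;
    }

    public static void main(String[] args) {
        // Ranged API: the same expression appears twice and is evaluated twice.
        int a = displayName("Taro", "\uD83D\uDE00")
                .codePointCount(0, displayName("Taro", "\uD83D\uDE00").length());

        // Common workaround: store the expression in a local variable first.
        String name = displayName("Taro", "\uD83D\uDE00");
        int b = name.codePointCount(0, name.length());

        // Existing index-free alternative: CharSequence.codePoints() as an IntStream.
        long c = displayName("Taro", "\uD83D\uDE00").codePoints().count();

        System.out.println(a + " " + b + " " + c); // 6 6 6
    }
}
```

"Taro " plus one emoji is 7 UTF-16 code units but 6 code points, so all three variants agree on 6.
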
>     Examples for point 3:
>
>     • VARCHAR in MySQL and PostgreSQL counts length in code points:
>     VARCHAR(20) means a limit of 20 code points, not 20 UTF-16 code
>     units (20 chars in Java).
>     • NIST Special Publication 800-63B stipulates that password
>     length must be counted in code points. (Quote from
>     https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers
>     : "For purposes of the above length requirements, each Unicode
>     code point SHALL be counted as a single character.")
>
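A sketch of the two use cases above using today's ranged API (Java 11+ for String.repeat). fitsVarchar and meetsMinimumLength are hypothetical names, and the limits are example parameters rather than values taken from MySQL, PostgreSQL, or NIST; under the proposal the count would simply be s.codePointCount(). The VARCHAR check assumes a database that counts characters as code points, as described above.

```java
public class CodePointLimits {

    // True if s fits a VARCHAR(limit) column, assuming characters are code points.
    static boolean fitsVarchar(String s, int limit) {
        return s.codePointCount(0, s.length()) <= limit;
    }

    // True if the password has at least minLength code points.
    static boolean meetsMinimumLength(String password, int minLength) {
        return password.codePointCount(0, password.length()) >= minLength;
    }

    public static void main(String[] args) {
        // Ten U+1F600 emoji: 10 code points but 20 UTF-16 code units.
        String emoji = "\uD83D\uDE00".repeat(10);
        System.out.println(emoji.length());          // 20
        System.out.println(fitsVarchar(emoji, 10));  // true
        System.out.println(emoji.length() <= 10);    // false: counting chars would wrongly reject it

        // Four emoji: 8 chars, but only 4 code points.
        String pw = "\uD83D\uDE00".repeat(4);
        System.out.println(meetsMinimumLength(pw, 8)); // false
    }
}
```

Counting with String.length() would get both checks wrong for text outside the Basic Multilingual Plane, which is the mistake the no-argument overloads aim to make less likely.
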
>     I would like to get agreement on these changes and would like to
>     know what I have to do outside of GitHub (e.g. how to submit CSRs).
>     If you have a GitHub account, it would be helpful if you could
>     reply to the PR. If not, you can reply directly to this email.
>
>     Best Regards,
>
>     Tatsunori Uchino
>     https://github.com/tats-u/
>
>