I'd like add no-argument overloads to CharSequence, String, and StringBuilder (JDK-8364007)
Roger Riggs
roger.riggs at oracle.com
Wed Aug 20 21:48:56 UTC 2025
Hi,
This seems like a reasonable idea.
For CharSequence, I would add them as default methods on CharSequence
and include the API Character.codePointCount(csq, begin, end).
The char array version will still need to be in Character.
Regards, Roger
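
Roger's suggestion of default methods could look roughly like the sketch below. It is only an illustration under stated assumptions: `CodePointCountable` and `Text` are hypothetical names invented here so the example compiles on a current JDK, and the method shapes are one possible reading of the suggestion, not the contents of the actual patch. Only the calls into `Character` and `String` are existing JDK APIs.

```java
// Hypothetical stand-in for CharSequence. At the time of this thread the real
// java.lang.CharSequence does not declare codePointCount methods.
interface CodePointCountable extends CharSequence {

    // Ranged variant, delegating to the existing static API in Character.
    default int codePointCount(int beginIndex, int endIndex) {
        return Character.codePointCount(this, beginIndex, endIndex);
    }

    // No-argument variant: counts the code points of the whole sequence.
    default int codePointCount() {
        return Character.codePointCount(this, 0, length());
    }
}

// Hypothetical minimal implementation that wraps a String.
final class Text implements CodePointCountable {
    private final String value;

    Text(String value) {
        this.value = value;
    }

    @Override
    public int length() {
        return value.length();
    }

    @Override
    public char charAt(int index) {
        return value.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return value.subSequence(start, end);
    }

    @Override
    public String toString() {
        return value;
    }
}

class CodePointCountSketch {
    public static void main(String[] args) {
        // 'a', then U+1F600 encoded as a surrogate pair, then 'b'.
        Text t = new Text("a\uD83D\uDE00b");
        System.out.println(t.length());         // 4 UTF-16 code units
        System.out.println(t.codePointCount()); // 3 code points

        // Today's spelling on String: the expression appears twice.
        String s = t.toString();
        System.out.println(s.codePointCount(0, s.length())); // 3

        // The char[] form stays in Character; its arguments are offset and count.
        char[] a = s.toCharArray();
        System.out.println(Character.codePointCount(a, 0, a.length)); // 3
    }
}
```

The no-argument default simply forwards to the ranged static method, so an implementation such as String or StringBuilder could still override it with a faster version.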
On 8/11/25 7:37 PM, Uchino Tatsunori wrote:
> Dear Chen-san,
>
> The beginIndex there is just a mistake that should have been removed.
>
> String.codePointCount()
>
> is the correct suggestion, as you can imagine. I am sorry for the
> confusion.
>
> Regards,
>
> Tatsunori Uchino
>
> 2025/08/12 7:29 Chen Liang <chen.l.liang at oracle.com>:
>
> Hi Uchino, I think your request is sensible in general.
>
> Do you intend to require a beginIndex for the codePointCount for
> String? I think a no-arg version suffices.
>
> Also forwarding this to i18n-dev as it is the locale-related list.
>
> P.S. When you reply, make sure you click "Reply all" so all the
> recipients of this mail get your reply. Otherwise, the
> reply is only sent to me, and others on the list won't see your reply.
>
> Regards, Chen
> ------------------------------------------------------------------------
> *From:* core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf
> of Uchino Tatsunori <tats.u at live.jp>
> *Sent:* Monday, August 11, 2025 6:54 AM
> *To:* core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
> *Subject:* I'd like add no-argument overloads to CharSequence,
> String, and StringBuilder (JDK-8364007)
> Dear core-libs developers,
>
> I'd like to add the following overloads:
>
> • Character.codePointCount(CharSequence seq)
> • Character.codePointCount(char[] a)
> • String.codePointCount(int beginIndex)
> • StringBuffer.codePointCount()
> • StringBuilder.codePointCount()
>
> and created a patch (https://github.com/openjdk/jdk/pull/26461).
>
> Why:
>
> Similar overloads taking start and end indices have existed since
> JSR 204 (JDK-4985217). They appear to have been designed with
> versatility as the priority. They make specifying the indices
> mandatory, which has the following disadvantages:
>
> 1. The string expression has to be written twice. Unlike C#, Java
> has no equivalent of extension methods.
> 2. Unnecessary boundary checks are mixed in.
> 3. Most userland code tries to calculate the number of code
> points in the entire string.
> 4. Some other languages can count the number of code points with a
> single function and no extra arguments (e.g. len() in Python 3).
>
> For 3., e.g.:
>
> • VARCHAR in MySQL & PostgreSQL counts characters in units of
> code points, e.g. VARCHAR(20) means that the limit is
> 20 code points, not 20 UTF-16 code units (20 chars in Java).
> • NIST Special Publication 800-63B stipulates that the password
> length must be counted in units of code points. (Quote from
> https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers
> : "For purposes of the above length requirements, each Unicode
> code point SHALL be counted as a single character.")
>
> I would like to get agreement on these changes and would like to
> know what I have to do outside of GitHub (e.g. how to submit CSRs).
> If you have a GitHub account, it would be helpful if you could
> reply to the PR. If not, you can reply directly to this email.
>
> Best Regards,
>
> Tatsunori Uchino
> https://github.com/tats-u/
>
>