<i18n dev> I'd like add no-argument overloads to CharSequence, String, and StringBuilder (JDK-8364007)

Mon Aug 11 22:29:46 UTC 2025

Hi Uchino, I think your request is sensible in general.

Do you intend to require a beginIndex for the codePointCount for String? I think a no-arg version suffices.

Also forwarding this to i18n-dev as it is the locale-related list.

P.S. When you reply, make sure you click "Reply all" so that all recipients of this mail get your reply. Otherwise, the reply is sent only to me, and others on the list won't see it.

Regards, Chen
________________________________
From: core-libs-dev <core-libs-dev-retn at openjdk.org> on behalf of Uchino Tatsunori <tats.u at live.jp>
Sent: Monday, August 11, 2025 6:54 AM
To: core-libs-dev at openjdk.org <core-libs-dev at openjdk.org>
Subject: I'd like add no-argument overloads to CharSequence, String, and StringBuilder (JDK-8364007)

Dear core-libs developers,

I'd like to add the following overloads:

• Character.codePointCount(CharSequence seq)
• Character.codePointCount(char[] a)
• String.codePointCount(int beginIndex)
• StringBuffer.codePointCount()
• StringBuilder.codePointCount()

and created a patch (https://github.com/openjdk/jdk/pull/26461).
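
For readers who want to see what the proposed overloads would presumably compute, here is a minimal sketch written against the existing index-taking APIs. The class and helper names are hypothetical, and the assumption that String.codePointCount(int beginIndex) counts from beginIndex to the end of the string is mine; the PR is the authoritative source for the actual semantics.

```java
public class CodePointCountSketch {
    // Hypothetical stand-in for the proposed Character.codePointCount(CharSequence seq).
    static int countAll(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    // Hypothetical stand-in for the proposed Character.codePointCount(char[] a).
    static int countAll(char[] a) {
        return Character.codePointCount(a, 0, a.length);
    }

    // Hypothetical stand-in for the proposed String.codePointCount(int beginIndex),
    // assuming it counts from beginIndex to the end of the string.
    static int countFrom(String s, int beginIndex) {
        return s.codePointCount(beginIndex, s.length());
    }

    public static void main(String[] args) {
        // "a", U+1F600 (a surrogate pair in UTF-16), "b": 4 chars, 3 code points.
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                          // UTF-16 code units
        System.out.println(countAll(s));                         // code points, whole string
        System.out.println(countAll(s.toCharArray()));           // same, from a char[]
        System.out.println(countFrom(s, 1));                     // code points from index 1
        System.out.println(countAll(new StringBuilder(s)));      // StringBuilder is a CharSequence
    }
}
```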

Why:

JSR 204 (JDK-4985217) already added similar overloads that take start and end indices. They appear to have been designed with versatility as the priority. Because they make the indices mandatory, they have the following disadvantages:

1. The string expression has to be written twice. Unlike C#, Java has no equivalent of extension methods.
2. Unnecessary boundary checks are mixed in.
3. Most userland code tries to calculate the number of code points in the entire string.
4. Some other languages can count the number of code points with a single function call and no extra arguments (e.g. len() in Python 3).
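
To illustrate point 1 with today's API: when the string comes from an expression rather than a variable, the caller either introduces a temporary or falls back to a stream. The sketch below uses only existing JDK methods; the class name and the displayName helper are hypothetical and exist only for illustration.

```java
import java.util.List;

public class RepeatedExpressionSketch {
    // Hypothetical helper standing in for any non-trivial string expression.
    static String displayName(List<String> parts) {
        return String.join(" ", parts);
    }

    // Existing API: the string is needed twice (receiver and length()),
    // so a temporary variable is required to avoid evaluating the expression twice.
    static int countWithTemporary(List<String> parts) {
        String name = displayName(parts);
        return name.codePointCount(0, name.length());
    }

    // Existing workaround without a temporary: stream the code points and count them.
    // This avoids the repetition but goes through an IntStream and returns a long.
    static int countWithStream(List<String> parts) {
        return (int) displayName(parts).codePoints().count();
    }

    public static void main(String[] args) {
        // U+1F600 is one code point but two chars in UTF-16.
        List<String> parts = List.of("Tatsunori", "\uD83D\uDE00");
        System.out.println(countWithTemporary(parts));
        System.out.println(countWithStream(parts));
    }
}
```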

For 3., e.g.:

• VARCHAR in MySQL and PostgreSQL measures length in code points. For example, VARCHAR(20) means that the limit is 20 code points, not 20 UTF-16 code units (20 chars in Java).
• NIST Special Publication 800-63B stipulates that password length must be counted in code points. (Quote from https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers : "For purposes of the above length requirements, each Unicode code point SHALL be counted as a single character.")
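
As a rough illustration of both cases with the existing API, the sketch below validates lengths in code points rather than chars. The class name, method names, and the minimum of 8 are assumptions chosen for the example, not values taken from the patch.

```java
public class CodePointLengthChecks {
    // Illustrative minimum password length, counted in code points (assumed value).
    static final int MIN_PASSWORD_CODE_POINTS = 8;

    // Counts each Unicode code point as one character, as the quoted NIST text requires.
    static boolean meetsMinimumLength(String password) {
        return password.codePointCount(0, password.length()) >= MIN_PASSWORD_CODE_POINTS;
    }

    // Checks a value against a VARCHAR(limit)-style column whose limit is in code points.
    static boolean fitsVarchar(String value, int limit) {
        return value.codePointCount(0, value.length()) <= limit;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600: one code point, two chars

        // Four emoji: length() is 8, but there are only 4 code points.
        String fourEmoji = emoji.repeat(4);
        System.out.println(fourEmoji.length() >= MIN_PASSWORD_CODE_POINTS); // char-based check passes
        System.out.println(meetsMinimumLength(fourEmoji));                  // code-point check fails

        // Twenty emoji: 40 chars, yet exactly 20 code points, so it fits VARCHAR(20).
        String twentyEmoji = emoji.repeat(20);
        System.out.println(twentyEmoji.length() <= 20);   // char-based check rejects it
        System.out.println(fitsVarchar(twentyEmoji, 20)); // code-point check accepts it
    }
}
```

In both checks the whole string is measured, which is the case the no-argument overloads are meant to shorten.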

I would like to get agreement on these changes, and I would like to know what I have to do outside of GitHub (e.g., how to submit CSRs). If you have a GitHub account, it would be helpful if you could reply on the PR. If not, you can reply directly to this email.

Best Regards,

Tatsunori Uchino
https://github.com/tats-u/