<i18n dev> RFR: 8364007: Add overload without arguments to codePointCount in String etc.
Tatsunori Uchino
duke at openjdk.org
Sat Jul 26 07:30:54 UTC 2025
On Thu, 24 Jul 2025 22:07:38 GMT, Mikhail Yankelevich <myankelevich at openjdk.org> wrote:
>> Adds `codePointCount()` overloads to `String`, `Character`, `(Abstract)StringBuilder`, and `StringBuffer` to make it possible to conveniently retrieve the length of a string as code points without extra boundary checks.
>>
>> ```java
>> if (superTremendouslyLongExpressionYieldingAString().codePointCount() > limit) {
>>     throw new Exception("exceeding length");
>> }
>> ```
>>
>> Is a CSR required for this change?
>
> src/java.base/share/classes/java/lang/Character.java line 9969:
>
>> 9967: int n = length;
>> 9968: for (int i = 0; i < length; ) {
>> 9969: if (isHighSurrogate(seq.charAt(i++)) && i < length &&
>
> IMO this is quite hard to read, especially with `i++` inside the `if` condition. What do you think about changing it to this?
> ```java
> for (int i = 1; i < length-1; i++) {
>     if (isHighSurrogate(seq.charAt(i)) &&
>             isLowSurrogate(seq.charAt(i + 1))) {
>         n--;
>         i++;
>     }
> }
> ```
>
> edit: fixed a typo in my example
To begin with, it yields an _incorrect_ result for sequences whose first character is a supplementary character, because the loop starts at index 1 and never examines a surrogate pair at indices 0 and 1:
```
jshell> int len(CharSequence seq) {
...>     final int length = seq.length();
...>     int n = length;
...>     for (int i = 1; i < length-1; i++) {
...>         if (isHighSurrogate(seq.charAt(i)) &&
...>                 isLowSurrogate(seq.charAt(i + 1))) {
...>             n--;
...>             i++;
...>         }
...>     }
...>     return n;
...> }
|  created method len(CharSequence), however, it cannot be invoked until method isHighSurrogate(char), and method isLowSurrogate(char) are declared

jshell> boolean isHighSurrogate(char ch) {
...>     return 0xd800 <= ch && ch <= 0xdbff;
...> }
|  created method isHighSurrogate(char)

jshell> boolean isLowSurrogate(char ch) {
...>     return 0xdc00 <= ch && ch <= 0xdfff;
...> }
|  created method isLowSurrogate(char)

jshell> len("𠮷");
$5 ==> 2

jshell> len("OK👍");
$6 ==> 3

jshell> len("👍👍");
$7 ==> 3
```
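The jshell session above re-declares the surrogate checks by hand. As a cross-check against the real JDK methods, here is a minimal standalone sketch (the class and method names are illustrative, not from the PR). It runs the loop proposed in review and a variant that starts at index 0, and compares both with the existing `Character.codePointCount(CharSequence, int, int)` as the reference. Under these assumptions, changing only the starting index appears to be enough for the more readable form to agree with the reference, including for unpaired surrogates.

```java
// Sketch: compare the loop proposed in review (starting at index 1) with a
// variant starting at index 0, using Character.codePointCount(CharSequence,
// int, int) from the JDK as the reference result.
public class CodePointLoops {

    // The loop as proposed in review: the scan starts at index 1,
    // so a surrogate pair at indices 0 and 1 is never examined.
    static int proposedLoop(CharSequence seq) {
        final int length = seq.length();
        int n = length;
        for (int i = 1; i < length - 1; i++) {
            if (Character.isHighSurrogate(seq.charAt(i))
                    && Character.isLowSurrogate(seq.charAt(i + 1))) {
                n--;
                i++;
            }
        }
        return n;
    }

    // Hypothetical variant for illustration: same shape, but starting at 0.
    static int fixedLoop(CharSequence seq) {
        final int length = seq.length();
        int n = length;
        for (int i = 0; i < length - 1; i++) {
            if (Character.isHighSurrogate(seq.charAt(i))
                    && Character.isLowSurrogate(seq.charAt(i + 1))) {
                n--;  // the pair at (i, i + 1) counts as one code point
                i++;  // skip the low surrogate of the pair
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\uD842\uDFB7",             // U+20BB7
            "OK\uD83D\uDC4D",           // "OK" followed by U+1F44D
            "\uD83D\uDC4D\uD83D\uDC4D", // U+1F44D twice
        };
        for (String s : samples) {
            int expected = Character.codePointCount(s, 0, s.length());
            System.out.printf("expected=%d proposed=%d fixed=%d%n",
                    expected, proposedLoop(s), fixedLoop(s));
        }
        // Prints:
        // expected=1 proposed=2 fixed=1
        // expected=3 proposed=3 fixed=3
        // expected=2 proposed=3 fixed=2
    }
}
```

The samples are the same strings as in the jshell session, written as `\u` escapes so the source compiles regardless of file encoding.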
I would rather not change this loop on its own unless the existing overload `int codePointCount(CharSequence seq, int beginIndex, int endIndex)`, which uses the same loop structure, is also going to be changed.
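For context, the proposed no-argument overload is meant to return the same value as the existing three-argument overload applied to the whole sequence. The sketch below approximates that with a hypothetical static helper, `codePointCountOf`, because the new instance methods are not in any released JDK. It illustrates the intended semantics only, not the PR's implementation (whose loop is quoted above).

```java
// Hypothetical helper approximating the proposed no-argument overload:
// it counts code points over the whole sequence by delegating to the
// existing Character.codePointCount(CharSequence, int, int).
public class WholeSequenceCount {

    static int codePointCountOf(CharSequence seq) {
        // 0 and seq.length() are always a valid range, so the
        // existing overload's bounds check cannot fail here.
        return Character.codePointCount(seq, 0, seq.length());
    }

    public static void main(String[] args) {
        String s = "OK\uD83D\uDC4D"; // "OK" followed by U+1F44D
        // Append an unpaired high surrogate; it counts as one code point.
        StringBuilder sb = new StringBuilder(s).append('\uD800');
        System.out.println(codePointCountOf(s));    // 3
        System.out.println(codePointCountOf(sb));   // 4
        // Cross-check with the stream-based count available today.
        System.out.println(s.codePoints().count()); // 3
    }
}
```

Because the helper takes a `CharSequence`, one method covers `String`, `StringBuilder`, and `StringBuffer` alike, which is roughly the convenience the PR adds as instance methods.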
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r2232751973