RFR: 8305486: Add split() variants that keep the delimiters to String and j.u.r.Pattern [v2]
Roger Riggs
rriggs at openjdk.org
Mon Apr 10 15:24:49 UTC 2023
On Fri, 7 Apr 2023 14:11:36 GMT, Raffaello Giulietti <rgiulietti at openjdk.org> wrote:
>> Add `split()` overloads to `String` and `java.util.regex.Pattern` that, in addition to the substrings returned by current `split()` variants, also return the delimiters matching the regular expression.
>
> Raffaello Giulietti has updated the pull request incrementally with one additional commit since the last revision:
>
> 8305486: Add split() variants that keep the delimiters to String and j.u.r.Pattern
> Restored original JavaDoc to existing methods.
> Replaced proposed parametrized split(*, boolean) methods with suggested splitWithDelimiters(*) variants.
> Adjusted tests accordingly.
src/java.base/share/classes/java/lang/String.java line 3237:
> 3235: * the array are in the order in which they occur in this string. If the
> 3236: * expression does not match any part of the input then the resulting array
> 3237: * has just one element, namely this string.
This paragraph duplicates part of the next paragraph.
src/java.base/share/classes/java/lang/String.java line 3253:
> 3251: * string then an empty leading substring is included at the beginning
> 3252: * of the resulting array. A zero-width match at the beginning however
> 3253: * never produces such empty leading substring.
Is there any case in which it is ambiguous whether the first element is a substring or a delimiter?
I'm not sure which patterns can match with zero width, but consider a regex such as "^|\s", where one alternative is zero-width and the other is not.
A case like that could make the API hard to use.
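To make the question concrete, here is a minimal sketch of a delimiter-keeping split built on `Matcher.find()`. The class `SplitKeepingDelimiters` and its method `splitKeeping` are hypothetical names used for illustration only. This is not the implementation in the PR, and it ignores the `limit` parameter and the removal of trailing empty strings. It assumes the rule stated in the quoted JavaDoc: a zero-width match at the beginning produces no leading element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitKeepingDelimiters {

    // Hypothetical helper (not the code under review): returns substrings
    // and the delimiters between them, in order of occurrence.
    static List<String> splitKeeping(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0; // start of the substring not yet emitted
        while (m.find()) {
            // Assumption: a zero-width match at the very beginning is skipped,
            // so it yields neither an empty substring nor an empty delimiter.
            if (index == 0 && m.start() == 0 && m.end() == 0) {
                continue;
            }
            parts.add(input.substring(index, m.start())); // substring
            parts.add(m.group());                         // delimiter
            index = m.end();
        }
        parts.add(input.substring(index)); // remainder after the last match
        return parts;
    }

    public static void main(String[] args) {
        // "^" matches with zero width at index 0, "\s" matches one character.
        System.out.println(splitKeeping("a b", "^|\\s"));  // prints [a,  , b]
        System.out.println(splitKeeping(" a b", "^|\\s")); // prints [ a,  , b]
    }
}
```

Under this rule, even indices always hold substrings and odd indices hold delimiters. In the second call, however, the leading space ends up inside the first substring, because `^` wins the alternation at index 0 and that zero-width match is skipped.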
src/java.base/share/classes/java/lang/String.java line 3302:
> 3300: * <tr><!-- o -->
> 3301: * <th scope="row" style="font-weight:normal; text-align:right; padding-right:1em">0</th>
> 3302: * <td>{@code { "b", "o", "", "o", ":::and::f", "o", "", "o" }}</td></tr>
These cases might be a bit easier to understand if the regex matched characters that looked more like delimiters, perhaps just ":".
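As a rough illustration of how such rows might read with ":" as the delimiter, the sketch below uses only the existing `String.split(String)` together with a lookaround regex, a common workaround for keeping delimiters. It does not use the methods proposed in the PR, and the class name `ColonSplitDemo` is made up for this example.

```java
import java.util.Arrays;

public class ColonSplitDemo {

    // Existing behavior: the ":" delimiters are dropped from the result.
    static String[] withoutDelimiters(String s) {
        return s.split(":");
    }

    // Workaround: split at the zero-width positions just after and just
    // before each ":", so every ":" becomes its own array element.
    static String[] withDelimiters(String s) {
        return s.split("(?<=:)|(?=:)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(withoutDelimiters("boo:and:foo")));
        // prints [boo, and, foo]
        System.out.println(Arrays.toString(withDelimiters("boo:and:foo")));
        // prints [boo, :, and, :, foo]
    }
}
```

With a delimiter that looks like a delimiter, it is easy to see at a glance which elements are substrings and which are separators.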
src/java.base/share/classes/java/lang/String.java line 3324:
> 3322: * the result threshold, as described above
> 3323: *
> 3324: * @return the array of strings computed by splitting this string
Add a mention of the delimiters in the result.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13305#discussion_r1161745116
PR Review Comment: https://git.openjdk.org/jdk/pull/13305#discussion_r1161756041
PR Review Comment: https://git.openjdk.org/jdk/pull/13305#discussion_r1161773124
PR Review Comment: https://git.openjdk.org/jdk/pull/13305#discussion_r1161790058