Small survey about JDK-8280101: in String.split grouped regex should keep the delimiter

Jim Laskey james.laskey at oracle.com
Fri Mar 31 13:25:34 UTC 2023


D’oh - I did not know that. Good idiom to have in your pocket.
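
For reference, a minimal sketch of the idiom in question. The class and method names are made up for illustration; only String.split(String) and String.split(String, int) are real API. With a negative limit, split keeps trailing empty strings, so N delimiters yield N + 1 elements:

```java
import java.util.Arrays;

// Illustration only: SplitLimitDemo and its methods are hypothetical names.
final class SplitLimitDemo {

    // Default split: trailing empty strings are removed from the result.
    static String[] splitDefault(String text) {
        return text.split("\\n");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepingTrailing(String text) {
        return text.split("\\n", -1);
    }

    public static void main(String[] args) {
        String text = "a\nb\nc\n"; // three delimiters, the last one at the end

        // 3 elements: [a, b, c]
        System.out.println(Arrays.toString(splitDefault(text)));
        // 4 elements: [a, b, c, ] (the last element is the empty string)
        System.out.println(Arrays.toString(splitKeepingTrailing(text)));
    }
}
```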

> On Mar 31, 2023, at 10:20 AM, Raffaello Giulietti <raffaello.giulietti at oracle.com> wrote:
> 
> IIUC, Case 1 could be covered already today with split("\\n", -1). That returns N+1 substrings when the delimiter occurs N times.
> 
> The proposed overload in String would be
>    String[] split(String regex, int limit, boolean withDelimiters)
> with the existing split(regex, limit) simply invoking split(regex, limit, false)
> 
> 
> 
> On 2023-03-31 14:47, Jim Laskey wrote:
>> [IMHO] I think you’ll find that split isn’t everyone’s favourite method for lots of reasons. For example, more often than not I would like to have split guarantee N + 1 elements in the result, where N is the number of delimiters found. Case 1: text blocks split on newlines. Case 2: string templates split on embedded expressions. The problem with split (as I see it) is that when the last delimiter is at the end of the string, then the result contains N elements. There are use cases where this is useful, just not my cases.
>> What you are describing is beyond split and best done by something else, more in the parsing domain, maybe even a new method “cleave”.
>> Cheers,
>> — Jim
>>> On Mar 31, 2023, at 9:18 AM, Raffaello Giulietti <raffaello.giulietti at oracle.com> wrote:
>>> 
>>> Hi,
>>> 
>>> JBS issue JDK-8280101 [0] proposes to add functionality to String.split() to behave more like the Perl equivalent. Rather than returning only the substrings resulting from the split, the Perl implementation can return an alternation of the substrings and the matched delimiters when the delimiter pattern is grouped. Because of the non-negligible behavioral change this would imply in the JDK implementation and the impact on existing client code, it cannot be done as proposed by the issue reporter.
>>> 
>>> However, since implementing the requested behavior outside the JDK is rather tricky, it would make sense to add an overload of String.split() that returns the result described in the JBS issue, that is, an alternation of substrings and delimiters. As a consequence, a similar overload would be needed in java.util.regex.Pattern as well, where the bulk of the implementation underlying String.split() is located. Further, an overload of Pattern.splitAsStream() is probably needed as well. Note that both String and Pattern are final classes, so the overloads are safe to add.
>>> 
>>> As mentioned, the reason to add these overloads to the JDK is that it is somewhat complicated to implement that behavior outside class Pattern. The implementation of the extensions in the JDK, on the contrary, looks rather simple. But before preparing a PR and a CSR, I'd like to gather more opinions.
>>> 
>>> WDYT?
>>> 
>>> 
>>> Greetings
>>> Raffaello
>>> 
>>> ----
>>> 
>>> [0] https://bugs.openjdk.org/browse/JDK-8280101
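
To make the requested result concrete, here is a rough sketch of what "an alternation of substrings and delimiters" could look like when written outside the JDK with Pattern and Matcher. The helper name splitKeepingDelimiters is hypothetical and is not the proposed overload. The sketch also makes simplifying assumptions: it ignores the limit parameter, skips zero-length matches, and always keeps the trailing substring even when it is empty. Matching split's own rules for those cases is presumably part of what makes a faithful outside implementation tricky.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: not a JDK class, and not the proposed API.
final class SplitWithDelimitersSketch {

    // Returns substrings and matched delimiters in alternation:
    // [s0, d0, s1, d1, ..., sN], i.e. 2N + 1 elements for N non-empty matches.
    static List<String> splitKeepingDelimiters(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // assumption: zero-length matches are not delimiters
            }
            parts.add(input.substring(start, m.start())); // text before the delimiter
            parts.add(m.group());                         // the delimiter itself
            start = m.end();
        }
        parts.add(input.substring(start)); // trailing text, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        // Substrings "a", "b", "c" alternate with the delimiters ", " and ";".
        System.out.println(splitKeepingDelimiters("a, b;c", "[,;]\\s*"));
    }
}
```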


