Small survey about JDK-8280101: in String.split grouped regex should keep the delimiter

Roger Riggs roger.riggs at oracle.com
Fri Mar 31 13:53:37 UTC 2023


Hi Raffaello,

It sounds useful to return the delimiters in a new API.
It might be interesting to guarantee that the returned array holds n 
strings and n-1 delimiters, filling in an empty string at the beginning 
or end if the input starts or ends with a delimiter.
This is similar to the construction of TemplatedStrings (JEP 430), which 
has a predictable number of strings (n+1) and expression values (n).
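
As a rough illustration of that n strings / n-1 delimiters shape, here is
a minimal sketch written outside the JDK. The class and method names
(SplitWithDelimiters, splitWithDelimiters) are hypothetical and not a
proposed API, and skipping zero-length matches is an assumption of the
sketch, not a statement about how a JDK overload would behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the JDK.
final class SplitWithDelimiters {
    private SplitWithDelimiters() {}

    // Returns strings and delimiters alternately: s0, d0, s1, d1, ..., sn-1.
    // The result always has odd size: n strings and n-1 delimiters.
    static List<String> splitWithDelimiters(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // assumption of this sketch: ignore zero-length matches
            }
            parts.add(input.substring(last, m.start())); // may be "" at the start
            parts.add(m.group());                        // the matched delimiter
            last = m.end();
        }
        parts.add(input.substring(last));                // may be "" at the end
        return parts;
    }
}
```

With this sketch, splitWithDelimiters("a,b;c", "[,;]") yields
[a, ",", b, ";", c], and splitWithDelimiters(",a,", ",") yields
["", ",", a, ",", ""], so strings always sit at even indexes and
delimiters at odd ones.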

An overload of Pattern.splitAsStream() that alternates would be hard to 
use, since within the stream it would be hard to distinguish the 
delimiters from the strings. Someday, there might be support in streams 
for grouping.
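
To make the stream concern concrete: in an array the index parity tells
strings from delimiters, but stream elements carry no position. One
possible workaround, sketched here purely as an assumption, is to tag each
element. The names TaggedSplit, Part and parts are hypothetical and not
part of any proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch for illustration only; not part of the JDK.
final class TaggedSplit {
    private TaggedSplit() {}

    // A stream element that records whether it is a delimiter or a string.
    static final class Part {
        private final String text;
        private final boolean delimiter;

        Part(String text, boolean delimiter) {
            this.text = text;
            this.delimiter = delimiter;
        }

        String text() { return text; }
        boolean isDelimiter() { return delimiter; }
    }

    // Builds the alternating sequence eagerly, then exposes it as a stream.
    // Zero-length matches are ignored (an assumption of this sketch).
    static Stream<Part> parts(String input, String regex) {
        List<Part> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int last = 0;
        while (m.find()) {
            if (m.start() == m.end()) {
                continue;
            }
            result.add(new Part(input.substring(last, m.start()), false));
            result.add(new Part(m.group(), true));
            last = m.end();
        }
        result.add(new Part(input.substring(last), false));
        return result.stream();
    }

    public static void main(String[] args) {
        // Keep only the delimiters; without the tag this filter is not possible.
        List<String> delimiters = parts("a,b;c", "[,;]")
                .filter(Part::isDelimiter)
                .map(Part::text)
                .collect(Collectors.toList());
        System.out.println(delimiters); // prints [,, ;]
    }
}
```

The cost of this design is an extra wrapper object per element, which is
one reason a plain alternating Stream<String> looks tempting but is
awkward to consume.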

Regards, Roger




On 3/31/23 8:18 AM, Raffaello Giulietti wrote:
> Hi,
>
> JBS issue JDK-8280101 [0] proposes to add functionality to 
> String.split() to behave more like the Perl equivalent. Rather than 
> returning only the substrings resulting from the split, the Perl 
> implementation can return an alternation of the substrings and the 
> matched delimiters when the delimiter pattern is grouped. Because of 
> the non-negligible behavioral change this would imply in the JDK 
> implementation and the impact on existing client code, it cannot be 
> done as proposed by the issue reporter.
>
> However, since implementing the requested behavior outside the JDK is 
> rather tricky, it would make sense to add an overload of 
> String.split() that returns the result described in the JBS issue, 
> that is, an alternation of substrings and delimiters. As a 
> consequence, a similar overload would be needed in 
> java.util.regex.Pattern as well, where the bulk of the implementation 
> underlying String.split() is located. Further, an overload of 
> Pattern.splitAsStream() is probably needed as well. Note that both 
> String and Pattern are final classes, so the overloads are safe to add.
>
> As mentioned, the reason to add these overloads to the JDK is that 
> it is somewhat complicated to implement that behavior outside class 
> Pattern. The implementation of the extensions in the JDK, on the 
> contrary, looks rather simple. But before preparing a PR and a CSR, 
> I'd like to gather more opinions.
>
> WDYT?
>
>
> Greetings
> Raffaello
>
> ----
>
> [0] https://bugs.openjdk.org/browse/JDK-8280101


