Small survey about JDK-8280101: in String.split grouped regex should keep the delimiter

Raffaello Giulietti raffaello.giulietti at oracle.com
Fri Mar 31 12:18:06 UTC 2023


Hi,

JBS issue JDK-8280101 [0] proposes to add functionality to
String.split() to make it behave more like its Perl equivalent. Rather
than returning only the substrings resulting from the split, the Perl
implementation can return an alternating sequence of the substrings and
the matched delimiters when the delimiter pattern is enclosed in a
capturing group. Because of the non-negligible behavioral change this
would imply in the JDK implementation, and the impact on existing client
code, it cannot be done as proposed by the issue reporter.
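
To make the difference concrete, here is a small illustration of what
String.split() does today with a grouped delimiter pattern. The class
and method names are made up for this example. In Perl,
split /([,;])/, "a,b;c" yields the five fields a , b ; c, whereas Java
ignores the group and drops the delimiters:

```java
import java.util.Arrays;

// Hypothetical demo class, only for illustration.
class GroupedSplitToday {
    // Current JDK behavior: the capturing group has no effect on the
    // result, only the substrings between delimiters are returned.
    static String[] splitToday(String input) {
        return input.split("([,;])");
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(Arrays.toString(splitToday("a,b;c")));
    }
}
```

Existing code may well rely on the delimiters being absent here, which
is why changing the result of the current method is not an option.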

However, since implementing the requested behavior outside the JDK is
rather tricky, it would make sense to add an overload of String.split()
that returns the result described in the JBS issue, that is, an
alternating sequence of substrings and delimiters. As a consequence, a
similar overload would be needed in java.util.regex.Pattern as well,
where the bulk of the implementation underlying String.split() is
located. Further, an overload of Pattern.splitAsStream() is probably
needed as well. Note that both String and Pattern are final classes, so
the overloads are safe to add.

As mentioned, the reason to add these overloads to the JDK is that it
is somewhat complicated to implement that behavior outside class Pattern.
The implementation of the extensions in the JDK, on the contrary, looks 
rather simple. But before preparing a PR and a CSR, I'd like to gather 
more opinions.
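
For reference, a rough sketch of what client code has to do today with
the public Matcher API. This is only an approximation under stated
assumptions, not the proposed implementation: the helper name
splitKeepingDelimiters is hypothetical, the whole match is used as the
delimiter, empty matches are simply skipped, and the limit parameter and
the removal of trailing empty strings done by split() are not
reproduced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the JDK.
class SplitKeepingDelimiters {
    // Returns substrings and matched delimiters in alternating order.
    static List<String> splitKeepingDelimiters(Pattern pattern, CharSequence input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0; // end of the previous delimiter
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // simplification: ignore empty matches
            }
            parts.add(input.subSequence(last, m.start()).toString());
            parts.add(m.group()); // the delimiter itself
            last = m.end();
        }
        // Remainder after the last delimiter (may be empty).
        parts.add(input.subSequence(last, input.length()).toString());
        return parts;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("[,;]");
        // prints [a, ,, b, ;, c]
        System.out.println(splitKeepingDelimiters(p, "a,b;c"));
    }
}
```

Getting the corner cases (empty matches, limit, trailing empty strings)
to agree exactly with split() is the part that is awkward to do from
the outside.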

WDYT?


Greetings
Raffaello

----

[0] https://bugs.openjdk.org/browse/JDK-8280101

