RFR: JDK-8027645: Pattern.split() with positive lookahead
Xueming Shen
xueming.shen at oracle.com
Thu Nov 7 18:59:36 UTC 2013
Hi,
As suggested in the bug report [1], the spec of j.u.Pattern.split()
does not clearly specify the expected behavior for scenarios such as
a zero-width match found at the beginning of the input string
(for example, whether or not an empty leading string should be included in
the resulting array). Worse, the implementation is also inconsistent
across inputs (for example, "Abc".split(...) vs "AbcEfg".split(...)).
The spec is also unclear about the expected behavior
when the input string has length 0 [2].
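To make the inconsistency concrete, here is a simplified Java model of a split loop that behaves this way. It assumes a loop that emits one substring per match and falls back to returning the whole input when the cursor never advanced past position 0. It is an illustration, not the actual JDK source, and `LegacySplitModel`/`legacySplit` are hypothetical names. A lone zero-width match at position 0 leaves the cursor at 0, so "Abc" is treated as "no match". With a second match, the leading empty string from the first match survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LegacySplitModel {

    // Hypothetical, simplified model (NOT the JDK source) of a split loop
    // that yields "Abc" -> [Abc] but "AbcEfg" -> [, Abc, Efg].
    static String[] legacySplit(Pattern p, CharSequence input) {
        int index = 0;
        List<String> list = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            // every match, including a zero-width one at position 0,
            // contributes the text between the previous match and this one
            list.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }
        // a lone zero-width match at 0 leaves index == 0, so it is
        // indistinguishable from "no match at all"
        if (index == 0) {
            return new String[] {input.toString()};
        }
        list.add(input.subSequence(index, input.length()).toString());
        // drop trailing empty strings (limit == 0 behavior)
        int size = list.size();
        while (size > 0 && list.get(size - 1).isEmpty()) {
            size--;
        }
        return list.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern upper = Pattern.compile("(?=\\p{Lu})");
        System.out.println(Arrays.toString(legacySplit(upper, "Abc")));    // [Abc]
        System.out.println(Arrays.toString(legacySplit(upper, "AbcEfg"))); // [, Abc, Efg]
        System.out.println(legacySplit(upper, "").length);                 // 1 (just "")
    }
}
```

In this model, an empty input also comes back as a one-element array holding "", which is the zero-length-input case referenced in [2].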
As a reference, Perl's split() function has a clear, explicit spec
for the above scenarios [3].
So the proposed change here is to update the spec and implementation of
Pattern.split() to clearly specify the above scenarios, as Perl does:
(1) A zero-length input sequence always results in a zero-length array
(instead of a String[] containing only an empty string).
(2) An empty leading substring is included at the beginning of the resulting
array when there is a positive-width match at the beginning of the input
sequence. A zero-width match at the beginning, however, never produces such
an empty leading substring.
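A minimal sketch of rules (1) and (2) in Java, assuming limit == 0 semantics (trailing empty strings removed, as split(input) already does). `ProposedSplit`/`proposedSplit` are hypothetical names and this is not the code in the webrev. The only differences from a plain match loop are the early return for empty input and skipping a zero-width match at position 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProposedSplit {

    // Hypothetical helper following rules (1) and (2); limit == 0 assumed.
    static String[] proposedSplit(Pattern p, CharSequence input) {
        // rule (1): a zero-length input yields a zero-length array
        if (input.length() == 0) {
            return new String[0];
        }
        List<String> list = new ArrayList<>();
        Matcher m = p.matcher(input);
        int index = 0;
        while (m.find()) {
            // rule (2): a zero-width match at position 0 never produces
            // an empty leading substring, so skip it
            if (m.start() == 0 && m.end() == 0) {
                continue;
            }
            // a positive-width match at position 0 still yields "" here
            list.add(input.subSequence(index, m.start()).toString());
            index = m.end();
        }
        // text after the last match (the whole input if nothing matched)
        list.add(input.subSequence(index, input.length()).toString());
        // drop trailing empty strings (limit == 0 behavior)
        int size = list.size();
        while (size > 0 && list.get(size - 1).isEmpty()) {
            size--;
        }
        return list.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern upper = Pattern.compile("(?=\\p{Lu})");
        Pattern comma = Pattern.compile(",");
        System.out.println(proposedSplit(upper, "").length);                 // 0
        System.out.println(Arrays.toString(proposedSplit(upper, "Abc")));    // [Abc]
        System.out.println(Arrays.toString(proposedSplit(upper, "AbcEfg"))); // [Abc, Efg]
        System.out.println(Arrays.toString(proposedSplit(comma, ",a,b")));   // [, a, b]
    }
}
```

With these rules "Abc" and "AbcEfg" are handled consistently, while a leading comma (a positive-width match) still produces the empty leading element, as in Perl.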
webrev:
http://cr.openjdk.java.net/~sherman/8027645/webrev/
Thanks!
-Sherman
[1] https://bugs.openjdk.java.net/browse/JDK-8027645
[2] https://bugs.openjdk.java.net/browse/JDK-6559590
[3] http://perldoc.perl.org/functions/split.html
BTW: the following Perl script was used to verify the Perl behavior:
------------------
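$str = "AbcEfgHij";
# zero-width lookahead: split before every uppercase letter
@substr = split(/(?=\p{Uppercase})/, $str);
# alternative case: positive-width match on a single space
#$str = "abc efg hij";
#@substr = split(/ /, $str);
print "split[sz=", scalar @substr, "]=[", join(",", @substr), "]\n";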
------------------
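For a side-by-side comparison with the Perl output, here is a Java counterpart of the script that prints the same "split[sz=N]=[...]" line using Pattern.split() directly. `SplitCompare`/`describe` are hypothetical names, and `\p{Lu}` is used as the Java stand-in for Perl's `\p{Uppercase}` (they agree on ASCII input like this). What the first line prints depends on the Pattern.split() implementation of the JDK it runs on. The point is only to produce output comparable to the Perl script's.

```java
import java.util.regex.Pattern;

public class SplitCompare {

    // Formats a split result the same way the Perl script's print does.
    static String describe(String[] parts) {
        return "split[sz=" + parts.length + "]=[" + String.join(",", parts) + "]";
    }

    public static void main(String[] args) {
        // zero-width lookahead case; result depends on the JDK's split rules
        String str = "AbcEfgHij";
        System.out.println(describe(Pattern.compile("(?=\\p{Lu})").split(str)));
        // positive-width single-space case
        String spaced = "abc efg hij";
        System.out.println(describe(Pattern.compile(" ").split(spaced))); // split[sz=3]=[abc,efg,hij]
    }
}
```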
More information about the core-libs-dev
mailing list