RFR: JDK-8027645: Pattern.split() with positive lookahead

Xueming Shen xueming.shen at oracle.com
Thu Nov 7 18:59:36 UTC 2013


Hi,

As suggested in the bug report [1], the spec of j.u.regex.Pattern.split()
does not clearly specify the expected behavior when a zero-width match is
found at the beginning of the input string (for example, whether or not an
empty leading string should be included in the resulting array). Worse, the
implementation is not consistent either (for different input cases, such as
"Abc".split(...) vs "AbcEfg".split(...)).

The spec is also not clear about the expected behavior when the input
string has length 0 [2].

As a reference, Perl's split() function has a clear, explicit spec for the
above scenarios [3].

So the proposed change here is to update the spec & impl of Pattern.split() to
specify the above scenarios clearly, as Perl does:

(1) A zero-length input sequence always results in a zero-length resulting
     array (instead of a String[] that contains only an empty string)
(2) An empty leading substring is included at the beginning of the resulting
     array when there is a positive-width match at the beginning of the input
     sequence. A zero-width match at the beginning, however, never produces
     such an empty leading substring.
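
To make case (2) concrete, here is a minimal Java sketch. It assumes a runtime where the rule in (2) is in effect (JDK 8 or later, to the best of my knowledge). The class name SplitLeadingMatchDemo and the helper split() are illustrative only and are not part of the webrev. Case (1) is deliberately not exercised here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative only: this class and its helper are hypothetical names,
// not code from the webrev.
class SplitLeadingMatchDemo {

    // Splits input around matches of regex, the same way
    // String.split(regex) delegates to Pattern.split(input).
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // Zero-width match (lookahead) at index 0: under rule (2),
        // no empty leading substring is produced.
        System.out.println(Arrays.toString(split("(?=\\p{Lu})", "AbcEfgHij")));

        // Positive-width match (",") at index 0: under rule (2),
        // an empty leading substring is included.
        System.out.println(Arrays.toString(split(",", ",a,b")));
    }
}
```

With rule (2) in effect, the first call is expected to yield three elements (Abc, Efg, Hij) and the second to yield an empty string followed by a and b.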

webrev:
http://cr.openjdk.java.net/~sherman/8027645/webrev/

Thanks!
-Sherman

[1] https://bugs.openjdk.java.net/browse/JDK-8027645
[2] https://bugs.openjdk.java.net/browse/JDK-6559590
[3] http://perldoc.perl.org/functions/split.html

btw: the following Perl script was used to verify the Perl behavior
------------------
$str = "AbcEfgHij";
@substr = split(/(?=\p{Uppercase})/, $str);
#$str = "abc efg  hij";
#@substr = split(/ /, $str);
print "split[sz=", scalar @substr, "]=[", join(",", @substr), "]\n";
------------------


