RFR: JDK-8027645: Pattern.split() with positive lookahead
Xueming Shen
xueming.shen at oracle.com
Mon Nov 11 22:05:53 UTC 2013
Alan, Paul,
My apologies, it appears I overlooked the "fastpath" in String.split(String, int),
and the fact that its javadoc also duplicates most of the spec of Pattern.split().
The webrev has been updated to close the loophole.
http://cr.openjdk.java.net/~sherman/8027645/webrev
thanks!
-Sherman
On 11/07/2013 10:59 AM, Xueming Shen wrote:
> Hi,
>
> As suggested in the bug report [1], the spec of j.u.regex.Pattern.split()
> does not clearly specify the expected behavior when a zero-width match is
> found at the beginning of the input string (for example, whether or not an
> empty leading string should be included in the resulting array). Worse, the
> implementation is not consistent either (for different input cases, such as
> "Abc".split(...) vs "AbcEfg".split(...)).
>
> The spec also is not clear regarding what the expected behavior should be
> if the size of the input string is 0 [2].
>
> As a reference, Perl's split() function has a clear, explicit spec for the
> above scenarios [3].
>
> So the proposed change here is to update the spec and impl of Pattern.split()
> to specify the above scenarios clearly, as Perl does:
>
> (1) A zero-length input sequence always results in a zero-length array
> (instead of a String[] that contains only an empty string)
> (2) An empty leading substring is included at the beginning of the resulting
> array, when there is a positive-width match at the beginning of the input
> sequence. A zero-width match at the beginning, however, never produces such
> an empty leading substring.
>
> webrev:
> http://cr.openjdk.java.net/~sherman/8027645/webrev/
>
> Thanks!
> -Sherman
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8027645
> [2] https://bugs.openjdk.java.net/browse/JDK-6559590
> [3] http://perldoc.perl.org/functions/split.html
>
> btw: the following Perl script was used to verify the Perl behavior
> ------------------
> $str = "AbcEfgHij";
> @substr = split(/(?=\p{Uppercase})/, $str);
> #$str = "abc efg hij";
> #@substr = split(/ /, $str);
> print "split[sz=", scalar @substr, "]=[", join(",", @substr), "]\n";
> ------------------
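
For comparison with the Perl script above, here is a minimal Java sketch of
rule (2) as proposed. It assumes a JDK that already contains the change
(JDK 8 or later is the assumption here); the class name SplitDemo and the
helper split(regex, input) are hypothetical and used only for illustration.
\p{Upper} stands in for Perl's \p{Uppercase}, which is equivalent for this
ASCII input. Rule (1) is left out on purpose, because the sketch makes no
assumption about how a given release treats empty input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Hypothetical helper: compiles the regex and splits with no limit,
    // i.e. the same path as Pattern.split(CharSequence).
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // Zero-width match at index 0 (lookahead sees 'A'):
        // under rule (2) no empty leading substring is produced.
        System.out.println(Arrays.toString(split("(?=\\p{Upper})", "AbcEfgHij")));

        // Same pattern on the shorter input from the bug discussion.
        System.out.println(Arrays.toString(split("(?=\\p{Upper})", "Abc")));

        // Positive-width match at index 0 (the leading comma):
        // under rule (2) an empty leading substring is kept.
        System.out.println(Arrays.toString(split(",", ",a,b")));
    }
}
```

On a JDK with the change, the first two calls should yield arrays without a
leading empty string, while the third should start with one.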