8069325: Pattern.splitAsStream does not return input if it is empty and there is no match
Hi,
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8069325-Pattern-splitAsStream-e...
This patch fixes an edge case in Pattern.splitAsStream for matching against an empty input string, which deviated from the behaviour of Pattern.split. When there are no matches, a stream containing the input string should be returned rather than an empty stream.
--
I have kept compatibility with Pattern.split(String), but I noticed another edge case.
What should the following return:
Pattern.compile("").split("")
[] or [""]?
There is a zero-width match at the beginning and an empty remaining segment, both of which should be discarded, so I would expect the result to be [] rather than [""], which is the result currently produced.
If people agree that this is an issue, I suggest we log a new one independent of fixing 8069325.
Thanks, Paul.
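For readers who want to reproduce the mismatch described above, here is a minimal sketch. The class name SplitEmptyInput and the helpers viaSplit and viaStream are made up for illustration and are not part of the patch. It splits the empty string with a pattern that cannot match it and compares the two APIs. Element counts are printed rather than the lists themselves, because a list holding one empty string prints the same way as an empty list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitEmptyInput {
    // Tokens produced by the array-returning Pattern.split.
    static List<String> viaSplit(String regex, String input) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    // Tokens produced by the stream-returning Pattern.splitAsStream.
    static List<String> viaStream(String regex, String input) {
        return Pattern.compile(regex)
                      .splitAsStream(input)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "x" never matches the empty input, so split returns the input itself.
        System.out.println("split:         " + viaSplit("x", "").size() + " element(s)");
        // On a JDK without the fix for 8069325 this is expected to report 0.
        System.out.println("splitAsStream: " + viaStream("x", "").size() + " element(s)");
        System.out.println("consistent:    " + viaSplit("x", "").equals(viaStream("x", "")));
    }
}
```

On a JDK that includes the fix, the last line should print true. On one without it, the stream variant is expected to come back empty.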
On 1/20/15 8:17 AM, Paul Sandoz wrote:
What should the following return:
Pattern.compile("").split("")
[] or [""]?
There is a zero-width match at the beginning and an empty remaining segment, both of which should be discarded, so I would expect the result to be [] rather than [""], which is the result currently produced.
It may depend on how the "trailing empty string" gets interpreted. Is it possible to interpret it this way: the empty string is the result of the "substring between the 0-width match at the beginning and the end of the input sequence", and only anything after that is "trailing"?
It would be clear if the spec explicitly said that the result of splitting an empty input is an empty string.
I would assume someone, most likely a user of String.split(), will get hit by this "incompatible" change.
-Sherman
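As background for the "trailing empty string" wording, the following sketch shows how the limit argument of Pattern.split(CharSequence, int) controls whether trailing empty strings are kept. The class name TrailingEmptyStrings and its helper are illustrative only, and the comma-separated input is an assumed example, not one taken from this thread.

```java
import java.util.regex.Pattern;

class TrailingEmptyStrings {
    // Thin wrapper around Pattern.split(CharSequence, int).
    static String[] split(String regex, String input, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        // limit == 0 (what split(input) uses): trailing empty strings are dropped.
        System.out.println(split(",", "a,b,,", 0).length);   // 2 -> "a", "b"
        // Negative limit: trailing empty strings are kept.
        System.out.println(split(",", "a,b,,", -1).length);  // 4 -> "a", "b", "", ""
    }
}
```

The open question in this thread is whether the single empty string obtained from an empty input counts as "trailing" in this sense.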
On Jan 20, 2015, at 5:35 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
On 1/20/15 8:17 AM, Paul Sandoz wrote:
What should the following return:
Pattern.compile("").split("")
[] or [""]?
There is a zero-width match at the beginning and an empty remaining segment, both of which should be discarded, so I would expect the result to be [] rather than [""], which is the result currently produced.
It may depend on how the "trailing empty string" gets interpreted. Is it possible to interpret it this way: the empty string is the result of the "substring between the 0-width match at the beginning and the end of the input sequence", and only anything after that is "trailing"?
Seems a stretch to me. Consider the following, which returns []:
Pattern.compile("x").split("x");
Replace "x" with "" and intuitively I would expect the same behaviour.
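The comparison above can be tried with a small sketch like the one below. The class name EmptyPatternSplit and its helper are illustrative. Array lengths are printed because Arrays.toString renders both an empty array and an array holding one empty string as "[]".

```java
import java.util.regex.Pattern;

class EmptyPatternSplit {
    // Thin wrapper around Pattern.split(CharSequence).
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // The match consumes the whole input; the empty strings on either
        // side of it are discarded, leaving no elements.
        System.out.println(split("x", "x").length);  // 0
        // The case under discussion: this thread reports [""], i.e. length 1.
        System.out.println(split("", "").length);
    }
}
```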
It would be clear if the spec explicitly said that the result of splitting an empty input is an empty string.
I would assume someone, most likely a user of String.split(), will get hit by this "incompatible" change.
Yeah, there is some risk in that.
Paul.
On Jan 20, 2015, at 5:35 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
On 1/20/15 8:17 AM, Paul Sandoz wrote:
Hi,
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8069325-Pattern-splitAsStream-e...
This patch fixes an edge case in Pattern.splitAsStream for matching against an empty input string, which deviated from the behaviour of Pattern.split. When there are no matches a stream containing the input string should be returned rather than an empty stream.
We got side-tracked by the discussion below. Did you have a chance to look at the patch?
Thanks, Paul.
--
I have kept compatibility with Pattern.split(String), but I noticed another edge case.
What should the following return:
Pattern.compile("").split("")
[] or [""]?
There is a zero-width match at the beginning and an empty remaining segment, both of which should be discarded, so I would expect the result to be [] rather than [""], which is the result currently produced.
It may depend on how the "trailing empty string" gets interpreted. Is it possible to interpret it this way: the empty string is the result of the "substring between the 0-width match at the beginning and the end of the input sequence", and only anything after that is "trailing"?
It would be clear if the spec explicitly said that the result of splitting an empty input is an empty string.
I would assume someone, most likely a user of String.split(), will get hit by this "incompatible" change.
-Sherman
If people agree that this is an issue, I suggest we log a new one independent of fixing 8069325.
Thanks, Paul.
Hi Paul, my apology for taking so long :-)
The change looks fine.
Regarding the edge case "".split(""), I am fine with the idea of discarding the resulting empty string as one trailing empty string.
-Sherman
On Feb 12, 2015, at 11:53 PM, Xueming Shen <xueming.shen@oracle.com> wrote:
Hi Paul, my apology for taking so long :-)
No problem, thanks for putting up with me hassling you :-)
The change looks fine.
Thanks.
Regarding the edge case "".split(""), I am fine with the idea of discarding the resulting empty string as one trailing empty string.
Ok, something to tackle later on a very rainy day. I will log a P5 issue.
Paul.