JDK 9 RFR(s): 8150488: add note to Scanner.findAll() regarding possible infinite streams

Timo Kinnunen timo.kinnunen at gmail.com
Thu Mar 30 15:56:41 UTC 2017


Hi, 

I guess this somewhat contrived example also wouldn’t work? 

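		// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
		String s = "\\b\\w+|\\G|\\B";
		String t = "Matcher m = Pattern.compile(s).matcher(t);\n";
		Matcher m = Pattern.compile(s).matcher(t);
		// Print every match, including the empty ones, in quotes.
		while (m.find()) {
			System.out.println("'" + m.group() + "'");
		}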
		// Outputs:
		// 'Matcher'
		// ''
		// 'm'
		// ''
		// ''
		// ''
		// 'Pattern'
		// ''
		// 'compile'
		// ''
		// 's'
		// ''
		// ''
		// 'matcher'
		// ''
		// 't'
		// ''
		// ''
		// ''
		// ''




From: Xueming Shen
Sent: Thursday, March 30, 2017 05:41
To: core-libs-dev at openjdk.java.net
Subject: Re: JDK 9 RFR(s): 8150488: add note to Scanner.findAll() regarding possible infinite streams

On 3/29/17, 5:56 PM, Stuart Marks wrote:
> Hi all,
>
> Please review these non-normative textual additions to the 
> Scanner.findAll() method docs. These methods were added earlier in JDK 
> 9; there's a small pitfall if the regex can match zero characters.
>
Stuart,

This might make the API itself almost useless in practice. It may be easy to
spot whether a regex can produce a zero-width match when the regex is simple,
but it gets harder and harder, if not impossible, as the regex gets more
complicated, especially when the use site is embedded deep inside a library
implementation.

The alternative is to "fix" it, perhaps the way Matcher.find() does: if the
previous match was a zero-width match (first == last), step the cursor forward
by one before the next attempt. I know Scanner.findPatternInBuffer() sets a new
region every time it is invoked, which complicates things, but I would assume
it is still worth trying, for example by using "hasNextResult"/matcher.end().
Without looking into the code, I'm not sure: does

while (hasNext(pattern)) {
     next(pattern);
}

have the same issue, when pattern matches 0-width?
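
For reference, the Matcher.find() behavior described above can be observed with
a small sketch. The class and method names here are made up for illustration;
only Pattern and Matcher are real API. After an empty match, find() resumes one
position further along, so the loop terminates even though the pattern can
match zero characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the JDK or of the patch under review.
public class ZeroWidthFindDemo {

    // Collects "start:text" for every match that Matcher.find() reports.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "a*" can match the empty string at every position of the input.
        System.out.println(allMatches("a*", "baab"));
    }
}
```

Empty matches still show up in the result, but each one is reported at a
distinct start offset, which is what keeps the loop finite.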

Thanks!
-Sherman




> Thanks,
>
> s'marks
>
>
> # HG changeset patch
> # User smarks
> # Date 1490749958 25200
> #      Tue Mar 28 18:12:38 2017 -0700
> # Node ID 6b43c4698752779793d58813f46d3687c17dde75
> # Parent  fb54b256d751ae3191e9cef42ff9f5630931f047
> 8150488: add note to Scanner.findAll() regarding possible infinite streams
> Reviewed-by: XXX
>
> diff -r fb54b256d751 -r 6b43c4698752 src/java.base/share/classes/java/util/Scanner.java
> --- a/src/java.base/share/classes/java/util/Scanner.java    Mon Mar 27 15:12:01 2017 -0700
> +++ b/src/java.base/share/classes/java/util/Scanner.java    Tue Mar 28 18:12:38 2017 -0700
> @@ -2808,6 +2808,10 @@
>       * }
>       * }</pre>
>       *
> +     * <p>The pattern must always match at least one character. If the pattern
> +     * can match zero characters, the result will be an infinite stream
> +     * of empty matches.
> +     *
>       * @param pattern the pattern to be matched
>       * @return a sequential stream of match results
>       * @throws NullPointerException if pattern is null
> @@ -2829,6 +2833,11 @@
>       *     scanner.findAll(Pattern.compile(patString))
>       * }</pre>
>       *
> +     * @apiNote
> +     * The pattern must always match at least one character. If the pattern
> +     * can match zero characters, the result will be an infinite stream
> +     * of empty matches.
> +     *
>       * @param patString the pattern string
>       * @return a sequential stream of match results
>       * @throws NullPointerException if patString is null
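
As a rough illustration of the pitfall the note describes, a caller who cannot
rule out zero-width matches could bound the stream explicitly. This is only a
sketch of a caller-side workaround, not something the patch proposes: the class
name, the helper boundedFindAll, and its maxMatches parameter are hypothetical;
Scanner.findAll(Pattern) and Stream.limit are the only real API involved.

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; requires JDK 9 or later for Scanner.findAll.
public class FindAllBoundDemo {

    // Returns at most maxMatches matched strings, so a pattern that can
    // match zero characters cannot make the pipeline run forever.
    static List<String> boundedFindAll(String input, String regex, int maxMatches) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll(Pattern.compile(regex))
                     .limit(maxMatches)            // caller-side safety bound
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // A pattern that always consumes at least one character.
        System.out.println(boundedFindAll("foo bar baz", "[a-z]+", 10));
        // A pattern that can match zero characters; the bound caps the result.
        System.out.println(boundedFindAll("bbb", "a*", 5).size());
    }
}
```

The bound only limits the damage; it does not tell the caller whether the
stream was cut short, so it is no substitute for a pattern that always
consumes input.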



