RFR(m) 2: 8072722: add stream support to Scanner
Paul Sandoz
paul.sandoz at oracle.com
Fri Sep 4 08:35:01 UTC 2015
Hi Stuart,
This is looking very good.
Just some comments on the tricky aspect related to late-binding of the Stream to the scanner state:
2652 * <p>This scanner's state should not be modified during execution of the returned
2653 * stream's pipeline. Subsequent calls to any methods on this scanner other than
2654 * {@link #close} and {@link #ioException} may return undefined results or may cause
2655 * undefined effects on the returned stream. The returned stream's source
2656 * {@code Spliterator} is <em>fail-fast</em> and will, on a best-effort
2657 * basis, throw a {@link java.util.ConcurrentModificationException} if such
2658 * modification is detected.
“Subsequent” is a little vague here, does it mean before, during or after stream pipeline execution?
Before:
Most methods on scanner may be called before stream pipeline execution, they just adjust the scanner state from which to start from. If close is called it should result in an ISE on pipeline execution.
During:
Either a CME on a best-effort basis if scanner state is modified by a behavioural parameter, or an ISE if close is called.
After:
The scanner is in an indeterminate state. For further operations it needs to be reset, and if the scanner was closed during execution an ISE will result on further operations.
We should try and succinctly talk about all three cases in the specification.
You might need to double check FindSpliterator for the before and during cases as i don’t think findPatternInBuffer checks if the scanner is closed, i think it needs an ensureOpen call in tryAdvance.
Paul.
On 4 Sep 2015, at 08:17, Stuart Marks <stuart.marks at oracle.com> wrote:
> Please review this update to the Scanner enhancement I proposed a while back. [1]
>
> I've updated based on some discussions with Paul Sandoz. The updates since the previous posting are 1) coordination of spec wording from Matcher; 2) addition of ConcurrentModificationException; 3) updating tests to use the streams testing framework; 4) some javadoc cleanups.
>
> Bug:
>
> https://bugs.openjdk.java.net/browse/JDK-8072722
>
> Webrev:
>
> http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/
>
> Specdiff:
>
> http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html
>
>
> For convenience, I've appended below the description from my earlier post. [1]
>
> s'marks
>
> [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html
>
>
> -------
>
>
> Scanner is essentially a regular expression matcher that matches over arbitrary input (e.g., from a file) instead of a fixed string like Matcher. Scanner will read and buffer additional input as necessary when looking for matches.
>
> This change proposes to add two streams methods:
>
> 1. tokens(), which returns a stream of tokens delimited by the Scanner's delimiter. Scanner's default delimiter is whitespace, so the following will collect a list of whitespace-separated words from a file:
>
> try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
> List<String> words = sc.tokens().collect(toList());
> }
>
> 2. findAll(pattern), which returns a stream of match results generated by searching the input for the given pattern (either a Pattern or a String). For example, the following will extract from a file all words that are surrounded by "_" characters, such as _foo_ :
>
> try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
> return sc.findAll("_([\\w]+)_")
> .map(mr -> mr.group(1))
> .collect(toList());
> }
>
> Implementation notes. A Scanner is essentially already an iterator of tokens, so tokens() pretty much just wraps "this" into a stream. The findAll() methods are a wrapper around repeated calls to findWithinHorizon(pattern, 0) with a bit of refactoring to avoid converting the MatchResult to a String prematurely.
>
> The tests are pretty straightforward, with some additional cleanups, such as using try-with-resources.
>
> -------
More information about the core-libs-dev
mailing list