RFR(m): 8072722: add stream support to Scanner
Stuart Marks
stuart.marks at oracle.com
Tue Aug 4 21:19:29 UTC 2015
Hi all,
Please review this API enhancement that adds streams support to java.util.Scanner.
Scanner is essentially a regular expression matcher that matches over arbitrary
input (e.g., from a file) rather than over a fixed string, as Matcher does. Scanner
will read and buffer additional input as necessary when looking for matches.
This change proposes to add two streams methods:
1. tokens(), which returns a stream of tokens delimited by the Scanner's
delimiter. Scanner's default delimiter is whitespace, so the following will
collect a list of whitespace-separated words from a file:
    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
        List<String> words = sc.tokens().collect(toList());
    }
2. findAll(pattern), which returns a stream of match results generated by
searching the input for the given pattern (either a Pattern or a String). For
example, the following will extract from a file all words that are surrounded by
"_" characters, such as _foo_ :
    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
        return sc.findAll("_([\\w]+)_")
                 .map(mr -> mr.group(1))
                 .collect(toList());
    }
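For trying the two methods without a file, here is a minimal self-contained sketch that runs both over an in-memory String. It assumes a JDK build that includes the proposed tokens() and findAll() methods. The class name, the helper names, and the comma delimiter are illustrative assumptions and not part of the webrev.

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
class ScannerStreamsDemo {
    // tokens() honors whatever delimiter the Scanner is configured with;
    // here a comma with optional surrounding whitespace replaces the default.
    static List<String> commaSeparated(String input) {
        try (Scanner sc = new Scanner(input).useDelimiter("\\s*,\\s*")) {
            return sc.tokens().collect(Collectors.toList());
        }
    }

    // findAll() yields one MatchResult per match; group(1) is the text
    // between the surrounding "_" characters.
    static List<String> underscored(String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll("_([\\w]+)_")
                     .map(mr -> mr.group(1))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(commaSeparated("alpha, beta,gamma"));
        System.out.println(underscored("a _foo_ b _bar_ c"));
    }
}
```

Note that \w also matches "_", so an input such as _foo_bar_ would be reported as the single word foo_bar by this pattern.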
Implementation notes. A Scanner is essentially already an iterator of tokens, so
tokens() pretty much just wraps "this" into a stream. The findAll() methods are
a wrapper around repeated calls to findWithinHorizon(pattern, 0) with a bit of
refactoring to avoid converting the MatchResult to a String prematurely.
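To make the findAll() note concrete, the loop below approximates the same behavior using only the existing public API: repeated calls to findWithinHorizon(pattern, 0), reading each result via match(). This is a rough sketch, not the code in the webrev. The class and method names are hypothetical, it collects eagerly into a list instead of producing a lazy stream, and it assumes the pattern never matches the empty string (a loop like this may not make progress on empty matches).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementation under review.
class FindAllSketch {
    // Returns capture group 1 of every match of regex in input.
    static List<String> findAllGroups(String input, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> groups = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            // A horizon of 0 means the search is unbounded;
            // null is returned once no further match exists.
            while (sc.findWithinHorizon(pattern, 0) != null) {
                // Read the group immediately, before the next search
                // replaces the scanner's current match state.
                groups.add(sc.match().group(1));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(findAllGroups("a _foo_ b _bar_ c", "_(\\w+)_"));
    }
}
```

The group is extracted inside the loop because the result of match() is only meaningful for the most recent scanning operation.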
The tests are pretty straightforward, with some additional cleanups, such as
using try-with-resources.
Bug:
https://bugs.openjdk.java.net/browse/JDK-8072722
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.0/
Specdiff:
http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.0/overview-summary.html
Thanks,
s'marks