RFR(m) 2: 8072722: add stream support to Scanner

Stuart Marks stuart.marks at oracle.com
Fri Sep 4 06:17:59 UTC 2015


Please review this update to the Scanner enhancement I proposed a while back. [1]

I've updated it based on some discussions with Paul Sandoz. The updates since the 
previous posting are 1) coordination of the spec wording with that of Matcher; 
2) addition of ConcurrentModificationException; 3) updating the tests to use the 
streams testing framework; 4) some javadoc cleanups.

Bug:

	https://bugs.openjdk.java.net/browse/JDK-8072722

Webrev:

	http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/

Specdiff:

	http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html


For convenience, I've appended below the description from my earlier post. [1]

s'marks

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html


-------


Scanner is essentially a regular expression matcher that matches over arbitrary 
input (e.g., from a file) instead of a fixed string like Matcher. Scanner will 
read and buffer additional input as necessary when looking for matches.

This change proposes to add two streams methods:

1. tokens(), which returns a stream of tokens delimited by the Scanner's 
delimiter. Scanner's default delimiter is whitespace, so the following will 
collect a list of whitespace-separated words from a file:

     try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
         List<String> words = sc.tokens().collect(toList());
     }

2. findAll(pattern), which returns a stream of match results generated by 
searching the input for the given pattern (either a Pattern or a String). For 
example, the following will extract from a file all words that are surrounded by 
"_" characters, such as _foo_ :

     try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
         return sc.findAll("_(\\w+)_")
                  .map(mr -> mr.group(1))
                  .collect(toList());
     }
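
For anyone who wants to try the two methods without a file, here is a minimal, 
self-contained sketch that runs them over an in-memory string. The class name, 
the helper method names and the sample text are made up for illustration; it 
assumes a JDK that includes tokens() and findAll() as proposed here (they shipped 
in Java 9).

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

public class ScannerStreamsDemo {
    // Hypothetical helper: splits the input on Scanner's default
    // whitespace delimiter and collects the tokens into a list.
    static List<String> words(String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.tokens().collect(Collectors.toList());
        }
    }

    // Hypothetical helper: returns the text captured between "_"
    // characters, e.g. "foo" for "_foo_".
    static List<String> underscored(String input) {
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll("_(\\w+)_")
                     .map(mr -> mr.group(1))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // Illustrative sample input, not taken from the tests in the webrev.
        String text = "plain _foo_ text\nmore _bar_ here";
        System.out.println(words(text));
        System.out.println(underscored(text));
    }
}
```

Note that the stream is consumed inside the try-with-resources block, so the 
Scanner is still open when the terminal operation runs.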

Implementation notes. A Scanner is essentially already an iterator of tokens, so 
tokens() pretty much just wraps "this" into a stream. The findAll() methods are 
a wrapper around repeated calls to findWithinHorizon(pattern, 0) with a bit of 
refactoring to avoid converting the MatchResult to a String prematurely.
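
To make the implementation notes concrete, here is a rough sketch of both ideas 
written against the existing public Scanner API. It is not the code in the webrev: 
tokensSketch and findAllSketch are hypothetical names, the findAll loop is eager 
rather than lazy, it assumes the pattern never matches the empty string (otherwise 
the loop would not advance), and it does none of the 
ConcurrentModificationException checking mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ScannerStreamSketch {
    // Hypothetical stand-in for tokens(): Scanner already implements
    // Iterator<String>, so it can be wrapped directly in a sequential stream.
    static Stream<String> tokensSketch(Scanner sc) {
        Spliterator<String> sp = Spliterators.spliteratorUnknownSize(
                sc, Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.stream(sp, false);
    }

    // Hypothetical stand-in for findAll(): call findWithinHorizon(p, 0)
    // until it returns null, and keep the MatchResult of each hit instead
    // of the String it returns. Assumes p never matches the empty string.
    static List<MatchResult> findAllSketch(Scanner sc, Pattern p) {
        List<MatchResult> results = new ArrayList<>();
        while (sc.findWithinHorizon(p, 0) != null) {
            results.add(sc.match()); // match result of the last scanning operation
        }
        return results;
    }

    public static void main(String[] args) {
        try (Scanner sc = new Scanner("a b c")) {
            tokensSketch(sc).forEach(System.out::println);
        }
        try (Scanner sc = new Scanner("_a_ x _bc_")) {
            for (MatchResult mr : findAllSketch(sc, Pattern.compile("_(\\w+)_"))) {
                System.out.println(mr.group(1));
            }
        }
    }
}
```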

The tests are pretty straightforward, with some additional cleanups, such as 
using try-with-resources.

-------


