RFR(m) 2: 8072722: add stream support to Scanner

Fri Sep 4 21:38:46 UTC 2015

Hi Tagir,

Well spotted! I'll fix this.

Thanks,

s'marks

On 9/3/15 11:57 PM, Tagir F. Valeev wrote:
> Hello!
>
> In tokens() JavaDoc:
>
>       * <pre>{@code
>       * List<String> result = new Scanner("abc,def,,ghi").useDelimiter(",").
>       *     .tokens().collect(Collectors.toList());
>       * }</pre>
>
> Here the dot is duplicated after "useDelimiter()" and before
> "tokens()" on the next line.
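
For reference, here is a minimal sketch of the snippet with the stray dot removed, wrapped in a class so it compiles on its own. The names TokensSketch and commaTokens are placeholders, not taken from the webrev, and the sketch assumes a JDK that already includes the proposed Scanner.tokens() method.

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical wrapper class; only the Scanner call chain comes from the JavaDoc.
class TokensSketch {
    static List<String> commaTokens(String input) {
        // The stream is fully collected before try-with-resources closes the Scanner.
        try (Scanner sc = new Scanner(input)) {
            return sc.useDelimiter(",")
                     .tokens()
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // With "," as the delimiter, the two adjacent commas should
        // produce an empty-string token between "def" and "ghi".
        System.out.println(commaTokens("abc,def,,ghi"));
    }
}
```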
>
> Everything else looks good to me.
>
> With best regards,
> Tagir Valeev.
>
> SM> Please review this update to the Scanner enhancement I proposed a while back. [1]
>
> SM> I've updated based on some discussions with Paul Sandoz. The updates since the
> SM> previous posting are 1) coordination of spec wording from Matcher; 2) addition
> SM> of ConcurrentModificationException; 3) updating tests to use the streams testing
> SM> framework; 4) some javadoc cleanups.
>
> SM> Bug:
>
> SM>         https://bugs.openjdk.java.net/browse/JDK-8072722
>
> SM> Webrev:
>
> SM>         http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/
>
> SM> Specdiff:
>
> SM>         http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html
>
>
> SM> For convenience, I've appended below the description from my earlier post. [1]
>
> SM> s'marks
>
> SM> [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html
>
>
> SM> -------
>
>
> SM> Scanner is essentially a regular expression matcher that matches over arbitrary
> SM> input (e.g., from a file) instead of a fixed string like Matcher. Scanner will
> SM> read and buffer additional input as necessary when looking for matches.
>
> SM> This change proposes to add two streams methods:
>
> SM> 1. tokens(), which returns a stream of tokens delimited by the Scanner's
> SM> delimiter. Scanner's default delimiter is whitespace, so the following will
> SM> collect a list of whitespace-separated words from a file:
>
> SM>      try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
> SM>          List<String> words = sc.tokens().collect(toList());
> SM>      }
>
> SM> 2. findAll(pattern), which returns a stream of match results generated by
> SM> searching the input for the given pattern (either a Pattern or a String). For
> SM> example, the following will extract from a file all words that are surrounded by
> SM> "_" characters, such as _foo_ :
>
> SM>      try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
> SM>          return sc.findAll("_([\\w]+)_")
> SM>                   .map(mr -> mr.group(1))
> SM>                   .collect(toList());
> SM>      }
>
> SM> Implementation notes. A Scanner is essentially already an iterator of tokens, so
> SM> tokens() pretty much just wraps "this" into a stream. The findAll() methods are
> SM> a wrapper around repeated calls to findWithinHorizon(pattern, 0) with a bit of
> SM> refactoring to avoid converting the MatchResult to a String prematurely.
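
The "repeated calls to findWithinHorizon(pattern, 0)" idea can be sketched with the existing Scanner API alone. This is an illustration only, not the code in the webrev: the class and method names (FindAllSketch, findAllGroup1) are made up, it collects eagerly into a list rather than returning a lazy stream, and it assumes the pattern never matches the empty string (otherwise the loop would not advance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper; approximates findAll() using pre-existing Scanner methods.
class FindAllSketch {
    static List<String> findAllGroup1(String input, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> groups = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            // Horizon 0 means "no bound": search the rest of the input.
            // findWithinHorizon returns null once no further match exists.
            while (sc.findWithinHorizon(pattern, 0) != null) {
                // match() exposes the MatchResult of the last successful
                // scan, so capture groups are available, not just the
                // matched String that findWithinHorizon returns.
                groups.add(sc.match().group(1));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(findAllGroup1("_foo_ bar _baz_ qux", "_([\\w]+)_"));
    }
}
```

Going through match() after each call is what keeps the capture groups reachable; the String returned by findWithinHorizon() on its own would only give the whole matched text.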
>
> SM> The tests are pretty straightforward, with some additional cleanups, such as
> SM> using try-with-resources.
>
> SM> -------
>