RFR(m) 2: 8072722: add stream support to Scanner

Chris Hegarty chris.hegarty at oracle.com
Wed Sep 9 08:04:15 UTC 2015


This looks very nice, just a minor spec comment..

> On 9 Sep 2015, at 01:53, Stuart Marks <stuart.marks at oracle.com> wrote:
> 
> 
> 
> On 9/4/15 1:35 AM, Paul Sandoz wrote:
>> Hi Stuart,
>> 
>> This is looking very good.
>> 
>> Just some comments on the tricky aspect related to late-binding of the Stream to the scanner state:
>> 
>> 2652      * <p>This scanner's state should not be modified during execution of the returned
>> 2653      * stream's pipeline. Subsequent calls to any methods on this scanner other than
>> 2654      * {@link #close} and {@link #ioException} may return undefined results or may cause
>> 2655      * undefined effects on the returned stream. The returned stream's source
>> 2656      * {@code Spliterator} is <em>fail-fast</em> and will, on a best-effort
>> 2657      * basis, throw a {@link java.util.ConcurrentModificationException} if such
>> 2658      * modification is detected.
>> 
>> “Subsequent” is a little vague here, does it mean before, during or after stream pipeline execution?
> 
>> Before:
>> 
>> Most methods on scanner may be called before stream pipeline execution, they just adjust the scanner state from which to start from. If close is called it should result in an ISE on pipeline execution.
>> 
>> During:
>> 
>> Either a CME on a best-effort basis if scanner state is modified by a behavioural parameter, or an ISE if close is called.
>> 
>> After:
>> 
>> The scanner is in an indeterminate state. For further operations it needs to be reset, and if the scanner was closed during execution an ISE will result on further operations.
>> 
>> We should try and succinctly talk about all three cases in the specification.
> 
> Mmm, yes, can't get anything vague past you. :-) "Subsequent" should mean anytime after initiation of the pipeline execution. But it would be better to be a bit more explicit. Note that effects go both ways; during pipeline execution, calls to scanner methods might cause the stream to throw CME, and stream operations might cause scanner methods to return unspecified results.
> 
> I think the following covers all of the before, during, and after cases.
> 
> << Scanning starts upon initiation of the terminal stream operation, using the current state of this scanner. Subsequent calls to any methods on this scanner other than {@link #close} and {@link #ioException} may return undefined results or may cause undefined effects on the returned stream. The returned stream's source {@code Spliterator} is <em>fail-fast</em> and will, on a best-effort basis, throw a {@link java.util.ConcurrentModificationException} if any such calls are detected during pipeline execution. >>
> 
> The reset() method will reset the delimiter, locale, and radix, but it can't reset the position in the input, so the scanner effectively cannot be reused after any stream operation has been initiated.
> 
> I'll add the following:
> 
> << After stream execution completes, this scanner is left in an indeterminate state and cannot be reused. >>

I think this note is good, but the webrev/specdiff uses the term ‘pipeline execution’. I think ‘stream execution’ is less likely to cause confusion.

-Chris.


> The closed behavior is separate from CME, so I'll add the following to the text that covers the closing behavior.
> 
> << IllegalStateException is thrown if the scanner has been closed when this method is called, or if this scanner is closed during pipeline execution. >>
> 
> All of the above apply to both the tokens() method and the main findAll() method.
> 
>> You might need to double check FindSpliterator for the before and during cases as i don’t think findPatternInBuffer checks if the scanner is closed, i think it needs an ensureOpen call in tryAdvance.
> 
> Good catch! I've added an ensureOpen() call here and I've also add tests to cover this case.
> 
> Updated webrev:
> 
>  http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/ <http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/>
> 
> Updated specdiff:
> 
> http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html <http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html>
> 
> s'marks
> 
> 
> 
>> Paul.
>> 
>> On 4 Sep 2015, at 08:17, Stuart Marks <stuart.marks at oracle.com> wrote:
>> 
>>> Please review this update to the Scanner enhancement I proposed a while back. [1]
>>> 
>>> I've updated based on some discussions with Paul Sandoz. The updates since the previous posting are 1) coordination of spec wording from Matcher; 2) addition of ConcurrentModificationException; 3) updating tests to use the streams testing framework; 4) some javadoc cleanups.
>>> 
>>> Bug:
>>> 
>>> 	https://bugs.openjdk.java.net/browse/JDK-8072722
>>> 
>>> Webrev:
>>> 
>>> 	http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/
>>> 
>>> Specdiff:
>>> 
>>> 	http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html
>>> 
>>> 
>>> For convenience, I've appended below the description from my earlier post. [1]
>>> 
>>> s'marks
>>> 
>>> [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html
>>> 
>>> 
>>> -------
>>> 
>>> 
>>> Scanner is essentially a regular expression matcher that matches over arbitrary input (e.g., from a file) instead of a fixed string like Matcher. Scanner will read and buffer additional input as necessary when looking for matches.
>>> 
>>> This change proposes to add two streams methods:
>>> 
>>> 1. tokens(), which returns a stream of tokens delimited by the Scanner's delimiter. Scanner's default delimiter is whitespace, so the following will collect a list of whitespace-separated words from a file:
>>> 
>>>    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
>>>        List<String> words = sc.tokens().collect(toList());
>>>    }
>>> 
>>> 2. findAll(pattern), which returns a stream of match results generated by searching the input for the given pattern (either a Pattern or a String). For example, the following will extract from a file all words that are surrounded by "_" characters, such as _foo_ :
>>> 
>>>    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
>>>        return sc.findAll("_([\\w]+)_")
>>>                 .map(mr -> mr.group(1))
>>>                 .collect(toList());
>>>    }
>>> 
>>> Implementation notes. A Scanner is essentially already an iterator of tokens, so tokens() pretty much just wraps "this" into a stream. The findAll() methods are a wrapper around repeated calls to findWithinHorizon(pattern, 0) with a bit of refactoring to avoid converting the MatchResult to a String prematurely.
>>> 
>>> The tests are pretty straightforward, with some additional cleanups, such as using try-with-resources.
>>> 
>>> -------




More information about the core-libs-dev mailing list