RFR(m) 2: 8072722: add stream support to Scanner

Wed Sep 16 04:48:57 UTC 2015

On 9/10/15 2:12 PM, Xueming Shen wrote:
> I think it might be a "nice to have" for a "fail-fast" effort after the
> consumer consumed/accepted the result (the second check), but isn't it a bug
> for the consumer to accept any result if there is CME condition occurred
> already?

I'm not sure which spliterator we're talking about at this point, but the issue
is similar for both. In FindSpliterator, modCount has already been checked
against expectedCount before the consumer's accept() method is called. In
TokenSpliterator, expectedCount is refreshed from modCount immediately before
accept() is called. (This is necessary because advancing this spliterator
itself increments modCount.)

In both spliterators, then, the expectedCount should be equal to the modCount 
immediately prior to the call to accept(). Also in both spliterators, the 
modCount and expectedCount are compared immediately after accept(), and if they 
aren't equal, CME is thrown.
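For illustration, here's a minimal sketch of that check pattern, following the
TokenSpliterator variant (refresh expectedCount just before accept(), compare
just after). TokenSource, its array-backed input, and its tokens() method are
hypothetical stand-ins for Scanner; this is not the actual Scanner code. It
also assumes expectedCount starts at -1 as an "unset" marker, so that nothing
is checked until the first tryAdvance().

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ComodSketch {
    // Hypothetical stand-in for Scanner: consuming a token bumps modCount.
    public static class TokenSource {
        private final String[] tokens;
        private int pos = 0;
        int modCount = 0;

        public TokenSource(String... tokens) { this.tokens = tokens; }

        public boolean hasNext() { return pos < tokens.length; }

        public String next() {
            modCount++;                 // advancing changes the source's state
            return tokens[pos++];
        }

        public Stream<String> tokens() {
            return StreamSupport.stream(new TokenSpliterator(), false);
        }

        class TokenSpliterator extends Spliterators.AbstractSpliterator<String> {
            int expectedCount = -1;     // assumed marker: not yet bound to the source

            TokenSpliterator() {
                super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
            }

            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                // Between elements: was the source touched since the last call?
                if (expectedCount >= 0 && expectedCount != modCount) {
                    throw new ConcurrentModificationException();
                }
                if (!hasNext()) {
                    expectedCount = modCount;
                    return false;
                }
                String token = next();
                expectedCount = modCount;   // refresh: next() itself bumped modCount
                action.accept(token);
                // After accept(): did the pipeline's lambda modify the source?
                if (expectedCount != modCount) {
                    throw new ConcurrentModificationException();
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        TokenSource ok = new TokenSource("a", "b", "c");
        List<String> seen = new ArrayList<>();
        ok.tokens().forEach(seen::add);
        System.out.println(seen);       // [a, b, c]

        TokenSource bad = new TokenSource("a", "b", "c");
        try {
            bad.tokens().forEach(t -> bad.next());   // lambda modifies the source
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("CME");
        }
    }
}
```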

What this guards against is the accept() method -- really, one of the 
application's lambdas that's been passed to a pipeline operation -- modifying 
the state of the scanner. This only really works in a sequential stream, but 
it's all we've got. (In a parallel stream, I think the element is buffered 
somewhere and is handed to another thread. If that other thread attempts to 
modify the scanner's state, all bets are off because of memory visibility issues.)

Anyway, at least for sequential streams, this check does properly guard against 
the case where somebody modifies the scanner's state from within a pipeline 
operation. There are tests for this too; see ScanTest.streamComodTest().
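A quick way to see the guard on a sequential stream, assuming the Java 9+
Scanner.tokens() API and that Scanner.next() bumps the scanner's modification
count (as advancing TokenSpliterator does). ScannerComodDemo and
lambdaModificationDetected are hypothetical names; this is not the actual
ScanTest.streamComodTest().

```java
import java.util.ConcurrentModificationException;
import java.util.Scanner;

public class ScannerComodDemo {
    // Returns true if calling sc.next() from inside a pipeline lambda is
    // detected as a concurrent modification of the scanner.
    static boolean lambdaModificationDetected(String input) {
        Scanner sc = new Scanner(input);
        try {
            // The lambda changes the scanner's state while the sequential
            // pipeline is running, so the post-accept() check should fire.
            sc.tokens().forEach(token -> sc.next());
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        } finally {
            sc.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(lambdaModificationDetected("a b c d"));   // prints true
    }
}
```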

>>>> It'd be better to initialize expectedCount to modCount in the constructor?
>>
>> That's how I had it initially, but at Paul Sandoz' suggestion I delayed the
>> initialization to the first call to tryAdvance(). This allows the Scanner's
>> state to be modified after stream creation but before stream pipeline
>> execution. This is the way that Paul's stream code in Matcher works. I'm not
>> sure how important this is. Having Scanner be gratuitously different from
>> Matcher seems like it would be irritating though.
>
> I noticed the spec says "Scanning starts upon initiation of the terminal
> stream operation, using the current state of this scanner...". I guess it
> means the CME enforcement starts when the stream operation starts (a kind of
> late initialization). But personally I feel it may create an unnecessary
> inconsistency, depending on whether or not there is a state change between
> the creation of the Stream object and the start of the stream operation.
> But I'm not a stream expert :-)

Well, one of my earlier revisions basically said that you can't touch the 
Scanner at all after tokens() or findAll() has been called. This works, but is 
unnecessarily restrictive, and it's inconsistent with Paul's approach with
Matcher.results().

This is pretty easy to see because the constructors for the new spliterators 
simply initialize themselves, but they don't hang onto any state from the 
scanner. The only actual dependence on the state of the scanner starts at the 
first call to tryAdvance(), which is when the first element is actually 
introduced to the stream. It's safe for the application to change the state of
the scanner any time up until that point. It does introduce a little bit of 
complexity in that there's an additional state in the expectedCount checking (as 
we've seen) :-). But it does allow a bit more flexibility with the caller's 
handling of the scanner and a stream derived from it.
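A sketch of the flexibility this buys, again assuming the Java 9+
Scanner.tokens() API: the scanner is advanced after tokens() returns but before
the terminal operation runs, and the stream then picks up from the scanner's
current position. LateBindingDemo and tokensAfterEarlyNext are hypothetical
names.

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LateBindingDemo {
    static List<String> tokensAfterEarlyNext(String input) {
        try (Scanner sc = new Scanner(input)) {
            Stream<String> stream = sc.tokens();  // no scanner state captured yet
            sc.next();                            // allowed: pipeline hasn't started
            // Scanning begins here, using the scanner's current state.
            return stream.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(tokensAfterEarlyNext("a b c"));   // [b, c]
    }
}
```

If expectedCount were instead captured in the spliterator's constructor, the
early sc.next() call would presumably be reported as a concurrent modification
on the first tryAdvance().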

s'marks