RFR(m) 2: 8072722: add stream support to Scanner

Thu Sep 10 20:22:27 UTC 2015

On 9/9/15 3:51 PM, Xueming Shen wrote:
> One more comment regarding the CME check.
>
> 2829                     if (expectedCount != modCount) {
> 2830                         throw new ConcurrentModificationException();
> 2831                     }
>
> While it definitely serves the purpose of "fail-fast", but given it's a
> ordered/sequential
> stream, this condition is always checked in the immediate next round of
> tryAdvance().
> Just doubt it really benefits anyone. It's true that it almost costs nothing
> though.

The stream is initially sequential, but it can be changed to parallel by the 
application. If this occurs, this spliterator would be wrapped by code that 
gathers up elements into batches for parallel processing, making the control 
flow more complicated.

2826                 // assert expectedCount == modCount
2827                 if (findPatternInBuffer(pattern, 0)) { // doesn't increment 
modCount
2828                     cons.accept(matcher.toMatchResult());
2829                     if (expectedCount != modCount) {
2830                         throw new ConcurrentModificationException();
2831                     }

I guess the question is, is it worth having two checks in each call to 
tryAdvance()? The first check has to be there in order to initialize modCount 
the first time (see also below), and since you have modCount, you might as well 
check it if it's nonnegative.

The check after the call to the Consumer might seem unnecessary, but note that 
there is no guarantee that tryAdvance() will be called again if the stream 
decides to terminate early. (Consider short-circuiting behavior of findFirst() 
and friends.) It's easy to see that this "extra" check is a guard against the 
Consumer modifying the scanner's state, which I think is fairly important.

> On 09/09/2015 03:35 PM, Xueming Shen wrote:
>> Hi Stuart,
>>
>> Can't modCount be increased to negative in extreme case?
>>
>> 2705             if (expectedCount>= 0&&  expectedCount != modCount) {
>> 2706                 throw new ConcurrentModificationException();
>> 2707             }

(This is in TokenSpliterator.)

Yes, as tokens are processed, modCount will be incremented continually and 
overflow will eventually occur, causing modCount to wrap around to negative. The 
expectedCount value is continually reinitialized from modCount, so when this 
happens, expectedCount will go negative as well. This will disable the CME check 
at the top of tryAdvance(). If hasNext() is true, though, expectedCount is 
reinitialized from modCount and it's checked unconditionally after the Consumer 
call, so that CME check won't be affected. Similar things can occur if modCount 
has wrapped around to negative before the stream processing starts.

Bottom line is that when modCount is negative, it will bypass half the CME 
checks in TokenSpliterator. But some checking will still be done.

The situation with FindSpliterator is a bit different. The modCount value isn't 
incremented while advancing the spliterator, but there is still the possibility 
that modCount could be negative by the time stream processing starts. If this 
occurs, each call to tryAdvance() will be treated as the initial case and 
expectedCount will be reinitialized from modCount. This means that, as before, 
some CME checking could be bypassed. The check after the Consumer call will 
still work properly though.

Looks like the apparently redundant CME check after the Consumer call is helping 
out after all. :-)

It would be nice if this were cleaner, but it still meets the criterion of "best 
effort". It doesn't appear to fail spuriously, either. I suppose it would halve 
the number of error cases if expectedCount were compared against its sentinel 
value of -1 instead of <= 0, but I haven't thought this through. Is it worth it? 
As things stand, CME checks will be missed only happen after a billion calls on 
the scanner, and this is only significant if the application starts modifying 
the scanner illegally at that point.

>> It'd be better to initialize expectedCount to modCount in constrocutor?

That's how I had it initially, but at Paul Sandoz' suggestion I delayed the 
initialization to the first call to tryAdvance(). This allows the Scanner's 
state to be modified after stream creation but before stream pipeline execution. 
This is the way that Paul's stream code in Matcher works. I'm not sure how 
important this is. Having Scanner be gratuitously different from Matcher seems 
like it would be irritating though.

s'marks