RFR(m) 2: 8072722: add stream support to Scanner

Thu Sep 10 21:12:10 UTC 2015

On 09/10/2015 01:22 PM, Stuart Marks wrote:
>
>
> On 9/9/15 3:51 PM, Xueming Shen wrote:
>> One more comment regarding the CME check.
>>
>> 2829                     if (expectedCount != modCount) {
>> 2830                         throw new ConcurrentModificationException();
>> 2831                     }
>>
>> While it definitely serves the purpose of "fail-fast", but given it's a
>> ordered/sequential
>> stream, this condition is always checked in the immediate next round of
>> tryAdvance().
>> Just doubt it really benefits anyone. It's true that it almost costs nothing
>> though.
>
> The stream is initially sequential, but it can be changed to parallel by the application. If this occurs, this spliterator would be wrapped by code that gathers up elements into batches for parallel processing, making the control flow more complicated.
>
> 2826                 // assert expectedCount == modCount
> 2827                 if (findPatternInBuffer(pattern, 0)) { // doesn't increment modCount
> 2828                     cons.accept(matcher.toMatchResult());
> 2829                     if (expectedCount != modCount) {
> 2830                         throw new ConcurrentModificationException();
> 2831                     }
>
> I guess the question is, is it worth having two checks in each call to tryAdvance()? The first check has to be there in order to initialize modCount the first time (see also below), and since you have modCount, you might as well check it if it's nonnegative.
>
> The check after the call to the Consumer might seem unnecessary, but note that there is no guarantee that tryAdvance() will be called again if the stream decides to terminate early. (Consider short-circuiting behavior of findFirst() and friends.) It's easy to see that this "extra" check is a guard against the Consumer modifying the scanner's state, which I think is fairly important.
>
>> On 09/09/2015 03:35 PM, Xueming Shen wrote:
>>> Hi Stuart,
>>>
>>> Can't modCount be increased to negative in extreme case?
>>>
>>> 2705             if (expectedCount>= 0&&  expectedCount != modCount) {
>>> 2706                 throw new ConcurrentModificationException();
>>> 2707             }
>
> (This is in TokenSpliterator.)
>
> Yes, as tokens are processed, modCount will be incremented continually and overflow will eventually occur, causing modCount to wrap around to negative. The expectedCount value is continually reinitialized from modCount, so when this happens, expectedCount will go negative as well. This will disable the CME check at the top of tryAdvance(). If hasNext() is true, though, expectedCount is reinitialized from modCount and it's checked unconditionally after the Consumer call, so that CME check won't be affected. Similar things can occur if modCount has wrapped around to negative before the stream processing starts.
>
> Bottom line is that when modCount is negative, it will bypass half the CME checks in TokenSpliterator. But some checking will still be done.
>
> The situation with FindSpliterator is a bit different. The modCount value isn't incremented while advancing the spliterator, but there is still the possibility that modCount could be negative by the time stream processing starts. If this occurs, each call to tryAdvance() will be treated as the initial case and expectedCount will be reinitialized from modCount. This means that, as before, some CME checking could be bypassed. The check after the Consumer call will still work properly though.
>
> Looks like the apparently redundant CME check after the Consumer call is helping out after all. :-)
>
> It would be nice if this were cleaner, but it still meets the criterion of "best effort". It doesn't appear to fail spuriously, either. I suppose it would halve the number of error cases if expectedCount were compared against its sentinel value of -1 instead of <= 0, but I haven't thought this through. Is it worth it? As things stand, CME checks will be missed only happen after a billion calls on the scanner, and this is only significant if the application starts modifying the scanner illegally at that point.

I think it might be a "nice to have" for a "fail-fast" effort after the the consumer
consumed/accepted the result (the second check), but isn't it a bug for the consumer
to accept any result if there is CME condition occurred already?

>
>>> It'd be better to initialize expectedCount to modCount in constrocutor?
>
> That's how I had it initially, but at Paul Sandoz' suggestion I delayed the initialization to the first call to tryAdvance(). This allows the Scanner's state to be modified after stream creation but before stream pipeline execution. This is the way that Paul's stream code in Matcher works. I'm not sure how important this is. Having Scanner be gratuitously different from Matcher seems like it would be irritating though.
>

I noticed the spec says "Scanning starts upon initiation of the terminal stream operation, using the
current state of this scanner..." guess it means the "CME" enforcement starts with the "stream
operation" starts (a kinda of later-initialization). But personally feel it may create a unnecessary
inconsistent situation, depends on whether or not there is state change between the creation of
the Stream object and the starting of the stream operation. But I'm not a stream expert :-)

-Sherman