RFR(m) 2: 8072722: add stream support to Scanner
Stuart Marks
stuart.marks at oracle.com
Thu Sep 10 20:22:27 UTC 2015
On 9/9/15 3:51 PM, Xueming Shen wrote:
> One more comment regarding the CME check.
>
> 2829 if (expectedCount != modCount) {
> 2830 throw new ConcurrentModificationException();
> 2831 }
>
> While it definitely serves the purpose of "fail-fast", but given it's a
> ordered/sequential
> stream, this condition is always checked in the immediate next round of
> tryAdvance().
> Just doubt it really benefits anyone. It's true that it almost costs nothing
> though.
The stream is initially sequential, but it can be changed to parallel by the
application. If this occurs, this spliterator would be wrapped by code that
gathers up elements into batches for parallel processing, making the control
flow more complicated.
2826 // assert expectedCount == modCount
2827 if (findPatternInBuffer(pattern, 0)) { // doesn't increment
modCount
2828 cons.accept(matcher.toMatchResult());
2829 if (expectedCount != modCount) {
2830 throw new ConcurrentModificationException();
2831 }
I guess the question is, is it worth having two checks in each call to
tryAdvance()? The first check has to be there in order to initialize modCount
the first time (see also below), and since you have modCount, you might as well
check it if it's nonnegative.
The check after the call to the Consumer might seem unnecessary, but note that
there is no guarantee that tryAdvance() will be called again if the stream
decides to terminate early. (Consider short-circuiting behavior of findFirst()
and friends.) It's easy to see that this "extra" check is a guard against the
Consumer modifying the scanner's state, which I think is fairly important.
> On 09/09/2015 03:35 PM, Xueming Shen wrote:
>> Hi Stuart,
>>
>> Can't modCount be increased to negative in extreme case?
>>
>> 2705 if (expectedCount>= 0&& expectedCount != modCount) {
>> 2706 throw new ConcurrentModificationException();
>> 2707 }
(This is in TokenSpliterator.)
Yes, as tokens are processed, modCount will be incremented continually and
overflow will eventually occur, causing modCount to wrap around to negative. The
expectedCount value is continually reinitialized from modCount, so when this
happens, expectedCount will go negative as well. This will disable the CME check
at the top of tryAdvance(). If hasNext() is true, though, expectedCount is
reinitialized from modCount and it's checked unconditionally after the Consumer
call, so that CME check won't be affected. Similar things can occur if modCount
has wrapped around to negative before the stream processing starts.
Bottom line is that when modCount is negative, it will bypass half the CME
checks in TokenSpliterator. But some checking will still be done.
The situation with FindSpliterator is a bit different. The modCount value isn't
incremented while advancing the spliterator, but there is still the possibility
that modCount could be negative by the time stream processing starts. If this
occurs, each call to tryAdvance() will be treated as the initial case and
expectedCount will be reinitialized from modCount. This means that, as before,
some CME checking could be bypassed. The check after the Consumer call will
still work properly though.
Looks like the apparently redundant CME check after the Consumer call is helping
out after all. :-)
It would be nice if this were cleaner, but it still meets the criterion of "best
effort". It doesn't appear to fail spuriously, either. I suppose it would halve
the number of error cases if expectedCount were compared against its sentinel
value of -1 instead of <= 0, but I haven't thought this through. Is it worth it?
As things stand, CME checks will be missed only happen after a billion calls on
the scanner, and this is only significant if the application starts modifying
the scanner illegally at that point.
>> It'd be better to initialize expectedCount to modCount in constrocutor?
That's how I had it initially, but at Paul Sandoz' suggestion I delayed the
initialization to the first call to tryAdvance(). This allows the Scanner's
state to be modified after stream creation but before stream pipeline execution.
This is the way that Paul's stream code in Matcher works. I'm not sure how
important this is. Having Scanner be gratuitously different from Matcher seems
like it would be irritating though.
s'marks
More information about the core-libs-dev
mailing list