RFR(m) 2: 8072722: add stream support to Scanner

Stuart Marks stuart.marks at oracle.com
Wed Sep 16 23:08:09 UTC 2015



On 9/16/15 8:43 AM, Xueming Shen wrote:

> I'm talking about the check "immediately" prior to the call to accept(). It
> will not function after the modCount tips over to the negative int value,
> because the "expectedCount >=0" check.
>
> Consider the use scenario that the Scanner is on top of an endless input
> stream, you have a token stream on top of it. The check before the
> "accept(token" will not be performed until the expectedCount/modCount tips
> back to positive value again from the negative, then off, then on... During
> the off period (it will take a while from negative back to positive), the
> stream will just work fine to feed the accept() the "next" token even if
> there is another thread keeps "stealing" tokens from the same scanner, if the
> timing is right.  Looks like not really a "fail-fast" in this scenario.

Right, after modCount wraps around to negative, the CME checking becomes 
dysfunctional. It doesn't do any harm, but it ceases to perform proper checking. 
This is kind of a corner case but ... well I admit I did have to do a fair bit 
of puzzling to figure out what the behavior would be and to prove to myself that 
it was benign.

> This can be "easily" addressed, if you have a separate boolean field such as
> "initialized". The code can look like below in tryAdvance(...)

[edited per your subsequent message]

>
>      if (!initialized) {
>          expectedCount = modCount;
>          initialized = true;
>      }
>      if (expectedCount != modCount) {
>          throw new CME();
>      }
>      ...
>
> Well, if you think this is an unlikely use scenario and the intention of the
> check/guard here is mainly to prevent the wrong doing within the pipe
> operation, then it might not worth the extra field, and I'm fine with the
> latest webrev.

The cost of the additional field is negligible. I haven't written out the code 
but I suspect that an explicit "initialized" field will be easier to reason 
about, certainly easier to understand than the behavior that occurs if modCount 
wraps around to negative.

Note that this also applies to Matcher, although it's less likely since 
Matcher's input is a CharSequence instead of an indefinite-sized source such as 
a file or an input stream. In talking to Paul Sandoz about this (author of the 
streams stuff for Matcher) he felt it was important to keep the behaviors of 
Matcher and Scanner consistent.

But this has dragged out somewhat and I don't really want to add Matcher changes 
to this changeset. How about I do the following:

1) I'll push the latest webrev as it stands.

2) I'll file a separate bug to clean up Scanner's and Matcher's spliterators' 
modCount checking to avoid the overflow issue.

3) I'll fix all the spliterators at the same time.

How does that sound?

s'marks



More information about the core-libs-dev mailing list