RFR JDK-8139414: java.util.Scanner hasNext() returns true, next() throws NoSuchElementException
Xueming Shen
xueming.shen at oracle.com
Tue Jun 14 22:34:43 UTC 2016
Thanks Stuart!
The webrev has been updated based on your suggestions.
http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
-Sherman
On 6/14/16, 1:22 PM, Stuart Marks wrote:
> Hi Sherman,
>
> The fix looks good.
>
> It would be helpful if the test for 8072582 generated the string
> instead of using a literal that's more than 1K long. The exact length
> is significant because Scanner's default buffer size is 1024, so the
> delimiter has to straddle the buffer boundary.
>
> The 8139414 test generates its string, which is nicer. In this case
> the test is taken from the bug report, but in my opinion the addition
> of the "boundary" variable (which is the string ";") makes things more
> obscure. I'd suggest inlining it.
>
> For both test cases it might be helpful to have a little utility that
> appends n copies of a char to a StringBuilder.
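A minimal sketch of such a utility, for illustration only. The class name TestStrings and the method name append are hypothetical and are not taken from the webrev.

```java
final class TestStrings {
    private TestStrings() {}

    /** Appends n copies of c to sb and returns sb so calls can be chained. */
    static StringBuilder append(StringBuilder sb, char c, int n) {
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb;
    }

    public static void main(String[] args) {
        // Generate a long input instead of pasting a literal of more than 1K.
        StringBuilder sb = new StringBuilder();
        append(sb, 'a', 1023).append(';');
        append(sb, 'b', 3);
        System.out.println(sb.length()); // 1023 + 1 + 3 = 1027
    }
}
```

Returning the StringBuilder keeps the generated test input readable as one chained expression, with the counts that matter visible at the call site.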
>
> Thanks,
>
> s'marks
>
> On 6/8/16 1:57 PM, Xueming Shen wrote:
>> Hi,
>>
>> Please help review the change for
>>
>> JDK-8139414: java.util.Scanner hasNext() returns true, next() throws
>> NoSuchElementException
>> JDK-8072582: Scanner delimits incorrectly when delimiter spans a
>> buffer boundary
>>
>> issue: https://bugs.openjdk.java.net/browse/JDK-8139414
>> https://bugs.openjdk.java.net/browse/JDK-8072582
>> webrev: http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
>>
>> In both cases the delimiter pattern is a kind of "alternation" regex
>> construct that can match the characters already at the end of the
>> internal buffer as delimiters, AND can extend to match more delimiter
>> characters if more input is available.
>>
>> In JDK-8139414, hasNext() uses hasTokenInBuffer() to find the
>> delimiters "-;". It does not look beyond the buffer boundary to check
>> whether more characters follow, such as a "-" that can also be part of
>> the delimiters. So hasNext() returns true on the assumption that there
>> is a token, because there are more characters after "-;". But
>> getCompleteTokenInBuffer() (used by the next() implementation) does
>> have the logic to check beyond the boundary even when the delimiter
>> pattern already has a match. It matches "-;-" as the delimiters and
>> then finds no "next" token (null) after that.
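The scenario above can be sketched as follows. This is an illustration under assumptions, not the test from the webrev: the delimiter regex "-;(-)?" and the input layout are hypothetical, chosen so that "-;" occupies the last two positions of a 1024-char buffer and one more "-" follows. On a JDK that contains the fix the loop should finish normally; on an affected JDK it may or may not reproduce the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

final class HasNextBoundaryDemo {
    /** Drains the scanner relying only on the hasNext()/next() contract. */
    static List<String> tokens(String input, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                out.add(sc.next()); // must not throw once hasNext() is true
            }
        }
        return out;
    }

    /** Hypothetical input: "-;" lands at chars 1022..1023, then one more "-". */
    static String input() {
        StringBuilder sb = new StringBuilder("--;");
        for (int i = 0; i < 1019; i++) {
            sb.append(i % 10);
        }
        return sb.append("-;-").toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(tokens(input(), "-;(-)?").size() + " tokens");
        } catch (NoSuchElementException e) {
            System.out.println("hasNext() and next() disagree: " + e);
        }
    }
}
```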
>>
>> JDK-8072582 is similar. This time getCompleteTokenInBuffer() does not
>> use the "lookingAt() and beyond" logic for the second delimiters,
>> which causes a problem when the delimiter pattern produces a different
>> match result (beginning position) within the buffer boundary than it
>> does beyond it.
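A hedged sketch of this second case. The delimiter regex "(#,#)|(,)" and the input are assumptions made for illustration: with only "#," visible at the end of a 1024-char buffer the pattern can match "," alone, while with one more char it matches "#,#" starting one position earlier. The check is that no token handed out still contains a delimiter character.

```java
import java.util.Scanner;

final class StraddlingDelimiterDemo {
    /** Returns true if any token returned by the scanner still contains '#'. */
    static boolean leaksDelimiter(String input, String delimiter) {
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                if (sc.next().indexOf('#') >= 0) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Hypothetical input: "#," at chars 1022..1023, second '#' at 1024. */
    static String input() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1022; i++) {
            sb.append('a');
        }
        return sb.append("#,#").append("bbb").toString();
    }

    public static void main(String[] args) {
        System.out.println("token contains '#': "
                + leaksDelimiter(input(), "(#,#)|(,)"));
    }
}
```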
>>
>> The proposed fix is to always check whether there is more input when
>> delimiters are matched at the internal buffer boundary.
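One way to picture "check if there is more input" is with java.util.regex.Matcher.hitEnd(), which reports whether the last match operation ran into the end of the available text. The sketch below is an illustration only: the helper name needsMoreInput and the pattern are hypothetical, and the actual change in the webrev may be structured differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HitEndDemo {
    /**
     * True when a delimiter is found in the buffered text but the matcher
     * hit the end of that text, so more input could change the match.
     */
    static boolean needsMoreInput(Pattern delims, CharSequence buffered) {
        Matcher m = delims.matcher(buffered);
        return m.find() && m.hitEnd();
    }

    public static void main(String[] args) {
        Pattern delims = Pattern.compile("-;(-)?"); // hypothetical delimiter
        // Buffer ends right after "-;": a following "-" could extend the match.
        System.out.println(needsMoreInput(delims, "abc-;"));
        // A non-delimiter char follows "-;": the match cannot grow.
        System.out.println(needsMoreInput(delims, "abc-;x"));
    }
}
```

A caller that sees true here would read more input (if the source is not exhausted) and retry the match before deciding where the delimiter ends.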
>>
>> Thanks,
>> Sherman