RFR JDK-8139414: java.util.Scanner hasNext() returns true, next() throws NoSuchElementException

Tue Jun 14 22:46:42 UTC 2016

Great, the test looks much better now.

Thanks!

s'marks

On 6/14/16 3:34 PM, Xueming Shen wrote:
> Thanks Stuart!
>
> The webrev has been updated based on your suggestions.
>
> http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
>
> -Sherman
>
> On 6/14/16, 1:22 PM, Stuart Marks wrote:
>> Hi Sherman,
>>
>> The fix looks good.
>>
>> It would be helpful if the test for 8072582 generated the string instead of
>> using a literal that's more than 1K long. The exact length is significant
>> because Scanner's default buffer size is 1024, so the delimiter has to
>> straddle the buffer boundary.
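
The boundary arithmetic can be made explicit in a generated input. Here is a minimal sketch that assumes the 1024-char default buffer size cited above. The class `StraddleInput`, its methods, and the `-;-` delimiter used in `main` are hypothetical names chosen for illustration, not taken from the actual test:

```java
import java.util.Arrays;

// Hypothetical helper: builds an input whose delimiter straddles the
// 1024-char buffer boundary (the default Scanner buffer size cited above).
class StraddleInput {
    static final int BUFFER_SIZE = 1024; // assumed to match Scanner's default

    // Returns filler chars + delim + tail, sized so that exactly 'charsBefore'
    // chars of delim fall before index BUFFER_SIZE and the rest after it.
    static String build(char filler, String delim, int charsBefore, String tail) {
        if (charsBefore <= 0 || charsBefore >= delim.length()) {
            throw new IllegalArgumentException("delimiter would not straddle the boundary");
        }
        char[] fill = new char[BUFFER_SIZE - charsBefore];
        Arrays.fill(fill, filler);
        return new String(fill) + delim + tail;
    }

    // True if the span [start, start + length) crosses index BUFFER_SIZE.
    static boolean straddles(int start, int length) {
        return start < BUFFER_SIZE && start + length > BUFFER_SIZE;
    }

    public static void main(String[] args) {
        String text = build('a', "-;-", 2, "tail");
        int start = text.indexOf("-;-");
        System.out.println(start + " " + straddles(start, 3)); // prints "1022 true"
    }
}
```

Generating the input this way keeps the significant number (how many delimiter chars land before index 1024) visible in the test, rather than buried in a 1K literal.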
>>
>> The 8139414 test generates its string, which is nicer. In this case the test
>> is taken from the bug report, but in my opinion the addition of the "boundary"
>> variable (which is the string ";") makes things more obscure. I'd suggest
>> inlining it.
>>
>> For both test cases it might be helpful to have a little utility that appends
>> n copies of a char to a StringBuilder.
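
Here is a sketch of the suggested utility. The name `appendN` and its parameter order are hypothetical, not taken from the actual test:

```java
// Hypothetical version of the suggested utility; the name appendN and the
// parameter order are illustrative, not taken from the actual test.
class AppendN {
    // Appends n copies of c to sb (nothing if n <= 0) and returns sb for chaining.
    static StringBuilder appendN(StringBuilder sb, char c, int n) {
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb;
    }

    public static void main(String[] args) {
        // 1022 filler chars put a 3-char delimiter across index 1024.
        StringBuilder sb = new StringBuilder();
        appendN(sb, 'x', 1022).append("-;-");
        System.out.println(sb.length()); // prints 1025
    }
}
```

Returning the builder lets the call chain into further `append` calls. On JDK 11 and later, `String.repeat(int)` covers the same need.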
>>
>> Thanks,
>>
>> s'marks
>>
>> On 6/8/16 1:57 PM, Xueming Shen wrote:
>>> Hi,
>>>
>>> Please help review the change for
>>>
>>> JDK-8139414: java.util.Scanner hasNext() returns true, next() throws NoSuchElementException
>>> JDK-8072582: Scanner delimits incorrectly when delimiter spans a buffer boundary
>>>
>>> issue: https://bugs.openjdk.java.net/browse/JDK-8139414
>>>        https://bugs.openjdk.java.net/browse/JDK-8072582
>>> webrev: http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
>>>
>>> In both cases the delimiter pattern is a kind of "alternation" regex construct
>>> that can "match" the existing characters at the end of the internal buffer as
>>> delimiters, AND can extend that match to more delimiter characters if more
>>> input is available.
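
The "matches now, but could extend" behavior can be observed directly in `java.util.regex` through `Matcher.hitEnd()`. The sketch below assumes `-;-|-;` as an example alternation delimiter. That pattern is not necessarily the one from either bug report, and `DelimAtEnd` and `delimsAt` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Uses java.util.regex directly; this is not Scanner's internal code.
// "-;-|-;" is an assumed example delimiter: it matches "-;" when the buffer
// ends right after ';', but "-;-" once the next character is visible.
class DelimAtEnd {
    static final Pattern DELIM = Pattern.compile("-;-|-;");

    // Runs lookingAt() for DELIM from 'from' to the end of buf;
    // returns the matcher on success, or null if there is no delimiter there.
    static Matcher delimsAt(CharSequence buf, int from) {
        Matcher m = DELIM.matcher(buf);
        m.region(from, buf.length());
        return m.lookingAt() ? m : null;
    }

    public static void main(String[] args) {
        // "Buffer" cut right after "-;": the match ends at the buffer end,
        // and hitEnd() reports that more input could change the result.
        Matcher cut = delimsAt("tok-;", 3);
        System.out.println(cut.end() + " " + cut.hitEnd()); // prints "5 true"

        // With the next character available, the delimiter is "-;-".
        Matcher more = delimsAt("tok-;-x", 3);
        System.out.println(more.end() + " " + more.hitEnd()); // prints "6 false"
    }
}
```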
>>>
>>> In issue JDK-8139414, hasNext() uses hasTokenInBuffer() to find the delimiters
>>> "-;". It does not look beyond the buffer boundary to check whether there are
>>> more characters, such as "-", that could also be part of the delimiters. So
>>> hasNext() returns true, assuming there is a token because there are more
>>> characters after "-;". But getCompleteTokenInBuffer() (used by the next()
>>> implementation) does check beyond the boundary, even when the delimiter
>>> pattern already has a match. It matches "-;-" as the delimiters and then
>>> finds no "next" token (null) after that.
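
A simplified model of that disagreement makes the sequence concrete. It is not the JDK code. `window` stands in for the internal buffer limit, the method names are hypothetical, and `-;-|-;` is an assumed delimiter that matches `-;` inside the window but `-;-` once the final `-` is visible:

```java
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of the JDK-8139414 disagreement; not the JDK code.
// 'window' stands in for the internal buffer limit; the delimiter
// "-;-|-;" and all names here are assumptions for illustration.
class HasNextNextModel {
    static final Pattern DELIM = Pattern.compile("-;-|-;");

    // hasNext()-style check: skips only the delimiters visible in the window,
    // then assumes a token follows if any input remains, even beyond it.
    static boolean hasNextModel(String input, int window) {
        Matcher m = DELIM.matcher(input);
        m.region(0, Math.min(window, input.length()));
        int pos = m.lookingAt() ? m.end() : 0;
        return pos < input.length();
    }

    // next()-style read: skips delimiters with all input visible, so the
    // delimiter match can extend past the window.
    static String nextModel(String input) {
        Matcher m = DELIM.matcher(input);
        int start = m.lookingAt() ? m.end() : 0;
        if (start == input.length()) {
            throw new NoSuchElementException("only delimiters remain");
        }
        // The token runs to the next delimiter, or to the end of input.
        int end = m.find(start) ? m.start() : input.length();
        return input.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "-;-"; // window of 2 ends after "-;"; the last "-" is beyond it
        System.out.println(hasNextModel(input, 2)); // prints true
        try {
            System.out.println(nextModel(input));
        } catch (NoSuchElementException e) {
            System.out.println("next() threw NoSuchElementException");
        }
    }
}
```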
>>>
>>> Issue 8072582 is similar. This time getCompleteTokenInBuffer() does not use
>>> the "lookingAt() and beyond" logic for the second delimiters (the ones that
>>> end the token), which causes a problem when the delimiter pattern produces a
>>> different match result (a different start position) within the buffer
>>> boundary than beyond it.
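
The start-position problem can be reproduced with `Matcher.find()` alone. The sketch below uses the hypothetical delimiter `-;-|;`: its leftmost match starts one character earlier once the character after the boundary is visible. `DelimStartShift` and its methods are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: "-;-|;" is an assumed delimiter whose leftmost match
// starts earlier once input beyond the buffer boundary is visible.
class DelimStartShift {
    static final Pattern DELIM = Pattern.compile("-;-|;");

    // Token = text before the first delimiter find() locates, or null if none.
    static String tokenBeforeDelim(CharSequence buf) {
        Matcher m = DELIM.matcher(buf);
        return m.find() ? buf.subSequence(0, m.start()).toString() : null;
    }

    // True if the delimiter search touched the end of buf, i.e. more input
    // might have changed where (or whether) the delimiter matched.
    static boolean searchHitEnd(CharSequence buf) {
        Matcher m = DELIM.matcher(buf);
        m.find();
        return m.hitEnd();
    }

    public static void main(String[] args) {
        // Buffer cut after "tok-;": "-;-" cannot complete, so ";" matches at
        // index 4 and the token looks like "tok-".
        System.out.println(tokenBeforeDelim("tok-;") + " " + searchHitEnd("tok-;"));     // prints "tok- true"
        // With the next char visible, "-;-" matches at index 3: token "tok".
        System.out.println(tokenBeforeDelim("tok-;-x") + " " + searchHitEnd("tok-;-x")); // prints "tok false"
    }
}
```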
>>>
>>> The proposed fix is to always check whether there is more input when
>>> matching delimiters at the internal buffer boundary.
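
Here is a minimal model of that rule. Unlike the real `Scanner`, it assumes the whole input is already in a `String` and reveals it `chunk` characters at a time to mimic the fixed-size buffer. `BoundaryAwareTokenizer` is a hypothetical name, and the model assumes a delimiter that never matches the empty string. It uses `Matcher.hitEnd()` as its own "more input could change this match" signal; this is not a claim about how the webrev implements the check:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal model of "check for more input when a delimiter match reaches the
// buffer boundary". Not Scanner's code: 'input' is revealed 'chunk' chars at
// a time to mimic a fixed-size buffer, and 'delim' is assumed never to match
// the empty string.
class BoundaryAwareTokenizer {
    static List<String> tokens(String input, Pattern delim, int chunk) {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        List<String> out = new ArrayList<>();
        int limit = Math.min(chunk, input.length()); // chars "read" so far
        int pos = 0;                                  // current scan position
        while (true) {
            boolean sourceClosed = limit == input.length();
            Matcher m = delim.matcher(input).region(pos, limit);

            // Skip leading delimiters; if the match touched the buffer end,
            // more input might extend it, so read more before trusting it.
            if (m.lookingAt()) {
                if (m.hitEnd() && !sourceClosed) {
                    limit = Math.min(limit + chunk, input.length());
                    continue;
                }
                pos = m.end();
            }
            if (pos == limit) {
                if (sourceClosed) {
                    return out;
                }
                limit = Math.min(limit + chunk, input.length());
                continue;
            }

            // Find the delimiter that ends the token. The same rule applies:
            // if the search touched the buffer end, its start may still move.
            m.region(pos, limit);
            boolean found = m.find();
            if (!sourceClosed && (!found || m.hitEnd())) {
                limit = Math.min(limit + chunk, input.length());
                continue;
            }
            int tokenEnd = found ? m.start() : limit;
            out.add(input.substring(pos, tokenEnd));
            pos = tokenEnd;
        }
    }

    public static void main(String[] args) {
        Pattern delim = Pattern.compile("-;-|-;");
        // The delimiter "-;-" straddles the 8-char "buffer" in both inputs.
        System.out.println(tokens("xxxxxx-;-", delim, 8));  // prints [xxxxxx]
        System.out.println(tokens("xxxxxx-;-y", delim, 8)); // prints [xxxxxx, y]
    }
}
```

Treating every delimiter match that touched the buffer end as provisional is conservative. It can cause extra reads, but in this model a token is emitted only once neither its leading nor its trailing delimiter can change.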
>>>
>>> Thanks,
>>> Sherman
>