RFR JDK-8139414: java.util.Scanner hasNext() returns true, next() throws NoSuchElementException

Tue Jun 14 22:34:43 UTC 2016

Thanks Stuart!

The webrev has been updated based on your suggestions.

http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev

-Sherman

On 6/14/16, 1:22 PM, Stuart Marks wrote:
> Hi Sherman,
>
> The fix looks good.
>
> It would be helpful if the test for 8072582 generated the string 
> instead of using a literal that's more than 1K long. The exact length 
> is significant because Scanner's default buffer size is 1024, so the 
> delimiter has to straddle the buffer boundary.
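Here is a minimal sketch of generating such an input instead of embedding a >1K literal. It assumes Scanner's 1024-char default buffer described above. The delimiter ";-;|-", the class name and the method names are hypothetical and do not come from the bug report or the webrev. The expected token count assumes correct (fixed) Scanner behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

class StraddlingDelimiter {
    // Scanner's default buffer size, as noted above.
    static final int BUFFER_SIZE = 1024;

    // Builds (BUFFER_SIZE - 2) 'a's followed by ";-;b", so the ";-;" delimiter
    // occupies indices 1022..1024 and crosses the 1024-char buffer boundary.
    static String input() {
        return String.join("", Collections.nCopies(BUFFER_SIZE - 2, "a")) + ";-;b";
    }

    // Collects every token the scanner produces with the given delimiter.
    static List<String> tokens(String text, String delimiter) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text).useDelimiter(delimiter)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // ";-;|-" is a hypothetical delimiter, not the one from the bug report.
        List<String> toks = tokens(input(), ";-;|-");
        // With correct behavior: two tokens, 1022 'a's and then "b".
        System.out.println(toks.size() + " " + toks.get(0).length() + " " + toks.get(1));
    }
}
```

Because the length is derived from BUFFER_SIZE, the reason the input is "more than 1K" stays visible in the test itself.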
>
> The 8139414 test generates its string, which is nicer. In this case 
> the test is taken from the bug report, but in my opinion the addition 
> of the "boundary" variable (which is the string ";") makes things more 
> obscure. I'd suggest inlining it.
>
> For both test cases it might be helpful to have a little utility that 
> appends n copies of a char to a StringBuilder.
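Here is a sketch of the kind of utility suggested here. The name appendN is hypothetical. A plain loop works on any JDK, since String.repeat only arrived in JDK 11. The usage writes the ";" inline rather than through a "boundary" variable.

```java
class RepeatChars {
    // Hypothetical helper: appends n copies of c to sb and returns sb for chaining.
    static StringBuilder appendN(StringBuilder sb, char c, int n) {
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb;
    }

    public static void main(String[] args) {
        // 1022 'a's followed by "-;-", with the ";" written inline.
        String input = appendN(new StringBuilder(), 'a', 1022).append("-;-").toString();
        System.out.println(input.length()); // 1025
    }
}
```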
>
> Thanks,
>
> s'marks
>
> On 6/8/16 1:57 PM, Xueming Shen wrote:
>> Hi,
>>
>> Please help review the change for
>>
>> JDK-8139414: java.util.Scanner hasNext() returns true, next() throws NoSuchElementException
>> JDK-8072582: Scanner delimits incorrectly when delimiter spans a buffer boundary
>>
>> issue: https://bugs.openjdk.java.net/browse/JDK-8139414
>>        https://bugs.openjdk.java.net/browse/JDK-8072582
>> webrev: http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
>>
>> In both cases the delimiter pattern is a kind of "alternation" regex
>> construct that can "match" the characters already at the end of the
>> internal buffer as a delimiter, AND can extend that match to more
>> delimiter characters if more input is available.
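Here is a small illustration of that alternation behavior using only java.util.regex. It assumes a hypothetical delimiter "-;-|-;", which is not the pattern from either bug report. Matcher.hitEnd() reports whether the last match attempt read up to the end of the input. That is the signal that more input could change the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationAtBoundary {
    // Hypothetical delimiter: an alternation whose longer branch extends the shorter one.
    static final Pattern DELIM = Pattern.compile("-;-|-;");

    // End index of the delimiter matched at the start of text, or -1 if none.
    static int delimEnd(String text) {
        Matcher m = DELIM.matcher(text);
        return m.lookingAt() ? m.end() : -1;
    }

    // True if the lookingAt() attempt on text ran into the end of the input.
    static boolean reachedEnd(String text) {
        Matcher m = DELIM.matcher(text);
        m.lookingAt();
        return m.hitEnd();
    }

    public static void main(String[] args) {
        // "Buffer" ends right after "-;": the short branch matches, but hitEnd() is true.
        System.out.println(delimEnd("-;") + " " + reachedEnd("-;"));     // 2 true
        // With one more character available, the longer branch matches instead.
        System.out.println(delimEnd("-;-x") + " " + reachedEnd("-;-x")); // 3 false
    }
}
```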
>>
>> In JDK-8139414, hasNext() uses hasTokenInBuffer() to find the delimiter
>> "-;". It does not look beyond the buffer boundary to check whether more
>> characters, such as "-", could also be part of the delimiter. So hasNext()
>> returns true, assuming there is a token because there are more characters
>> after "-;". But getCompleteTokenInBuffer() (used by the next()
>> implementation) does check beyond the boundary even when the delimiter
>> pattern already has a match. It matches "-;-" as the delimiter and then
>> finds no next token (null) after it.
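Here is a sketch of the invariant this bug breaks: whenever hasNext() returns true, next() must succeed. The input shape and the delimiter "-;-|-;" are assumptions modeled on the description above, not the test case from the bug report. The input is 1022 "a"s, then "-;" filling the end of a 1024-char buffer, then one more "-". The expected count assumes a fixed JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

class HasNextNextAgree {
    // Drains the scanner; every hasNext() == true must be followed by a successful next().
    static List<String> drain(Scanner sc) {
        List<String> tokens = new ArrayList<>();
        while (sc.hasNext()) {
            // On a JDK affected by JDK-8139414, this is where next() could throw.
            tokens.add(sc.next());
        }
        return tokens;
    }

    // 1022 'a's, then "-;" at indices 1022..1023, then one more "-" at index 1024.
    static String input() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1022; i++) {
            sb.append('a');
        }
        return sb.append("-;-").toString();
    }

    public static void main(String[] args) {
        // "-;-|-;" is a hypothetical delimiter matching both "-;" and "-;-".
        try (Scanner sc = new Scanner(input()).useDelimiter("-;-|-;")) {
            System.out.println(drain(sc).size()); // one token expected on a fixed JDK
        } catch (NoSuchElementException e) {
            System.out.println("hasNext() and next() disagreed");
        }
    }
}
```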
>>
>> JDK-8072582 is similar. This time getCompleteTokenInBuffer() does not use
>> the "lookingAt() and beyond" logic for the second delimiter, which causes
>> a problem when the delimiter pattern matches at a different start position
>> within the buffer than it does once input beyond the boundary is
>> available.
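Here is a minimal java.util.regex illustration of a delimiter whose start position depends on input past the boundary. It uses a hypothetical pattern ";-;|-". A substring stands in for the truncated buffer contents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimStartShift {
    // Hypothetical delimiter whose leftmost match depends on input past the buffer end.
    static final Pattern DELIM = Pattern.compile(";-;|-");

    // Start index of the first delimiter that find() locates in text, or -1.
    static int delimStart(String text) {
        Matcher m = DELIM.matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String full = "x;-;y";
        String truncated = full.substring(0, 3); // "x;-": pretend the buffer ends here
        // Within the buffer only "-" matches, so the token would wrongly be "x;".
        System.out.println(delimStart(truncated)); // 2
        // With the rest of the input, ";-;" matches earlier and the token is "x".
        System.out.println(delimStart(full));      // 1
    }
}
```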
>>
>> The proposed fix is to always check whether there is more input when
>> matching delimiters at the internal buffer boundary.
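Here is a simplified model of the proposed fix, not the webrev's code. A String stands in for the input source and a growing region stands in for Scanner's buffer. Matcher.hitEnd() is assumed as the "match reached the boundary" signal. The method name findDelimiter and the chunk parameter are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimAtBoundary {
    // Finds the next delimiter in source at or after 'from'. Only source[from, limit)
    // is searched at first (standing in for the buffer). If nothing matched, or the
    // search ran into the end of the buffered text, and more input exists, the buffer
    // is extended by 'chunk' chars (chunk must be positive) and the search repeated.
    // Returns {start, end} of the delimiter, or null if there is none.
    static int[] findDelimiter(Pattern delim, String source, int from, int chunk) {
        int limit = Math.min(from + chunk, source.length());
        while (true) {
            Matcher m = delim.matcher(source).region(from, limit);
            boolean found = m.find();
            boolean moreInput = limit < source.length();
            if ((!found || m.hitEnd()) && moreInput) {
                limit = Math.min(limit + chunk, source.length()); // "read" more input
                continue;
            }
            return found ? new int[] {m.start(), m.end()} : null;
        }
    }

    public static void main(String[] args) {
        Pattern delim = Pattern.compile(";-;|-"); // hypothetical delimiter
        // A 3-char "buffer" sees "x;-", where "-" matches at index 2 but hitEnd() is
        // true, so the buffer is extended and ";-;" is found at index 1 instead.
        int[] r = findDelimiter(delim, "x;-;y", 0, 3);
        System.out.println(r[0] + ".." + r[1]); // 1..4
    }
}
```

In this sketch a match is accepted only once the search stopped short of the end of the buffered text, or no more input exists.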
>>
>> Thanks,
>> Sherman