RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently

Martin Buchholz martinrb at google.com
Wed Dec 18 08:47:05 PST 2013


The latest webrev looks great to me!


On Tue, Dec 17, 2013 at 10:34 PM, Eric Wang <yiming.wang at oracle.com> wrote:

>  Hi Martin,
>
> I have updated the test to add another 2 surrogates, solitary high and
> solitary low, so 4 possibilities are covered.
> The verbose code has been cleaned based on your suggestion, Can you please
> review again?
> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>
> Thanks,
> Eric
>
> On 2013/12/18 3:03, Martin Buchholz wrote:
>
> Thanks, Eric.
>
>  This looks good enough to check in.  Optionally:
>
>  Regarding my comment on what surrogate to test, I would tend to check 4
> possibilities: solitary high surrogate, solitary low, high + low, low + high
>
>
>  The code below is a bit more verbose than necessary.  We can remove f by
> doing
>
>  try {
>   ...
>   throw new ..("should throw ...");
> } catch (MalformedInputException | UnmappableCharacterException success) {
> }
>
>  en.reset();
>
>   69         boolean f = false;  70         try {  71             en.encode(CharBuffer.wrap(surrogate));  72         } catch (MalformedInputException | UnmappableCharacterException ex) {  73             f = true;  74         } finally {  75             en.reset();  76         }  77         if (!f) {  78             throw new RuntimeException("testMalformedSurrogate failed with charset " + cs.name());  79         }
>
>
>
>
>
> On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang <yiming.wang at oracle.com> wrote:
>
>>  Hi Martin,
>>
>> I have updated the test based on your comments and split the old logic
>> into different functions to test legal or illegal surrogate pairs. Can you
>> please help to review it?
>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>>
>> For your last comment "reversed surrogate", I'm not sure whether my
>> understanding is correct. Does that mean reverse legal surrogate e.g.
>> \uD800\uDC00 to \uDC00\uD800 which is kind of malformed surrogate.
>> I have covered this case in the test.
>>
>> Thanks,
>> Eric
>>
>> On 2013/12/14 8:15, Martin Buchholz wrote:
>>
>> I do like this approach.  I think we can have some cleanup to make it a
>> little clearer what is being tested.
>>
>>  Below, it makes no sense to check canEncode when you are creating a
>> decoder.
>>
>>   60             try {  61                 de = charset.newDecoder();  62             } catch (UnsupportedOperationException ex) {  63                 if (canEncode) {  64                     throw ex;  65                 }  66             }
>>
>>
>>  *Once you do this:*
>>
>>   68             if (!canEncode) {  69                 continue;  70             }
>>
>> the loop should no longer test canEncode, but does, obscuring the logic.
>>
>>  Use standard whitespace:
>>
>>  109                 }finally {
>>
>> When encoding legal surrogate pair, should see UnmappableCharacterException,
>> not MalformedInputException.
>> Reverse is true with illegal surrogate pair.
>>
>>  should probably tests isolated, in addition to reversed, surrogates.
>>
>>  Compare my old test FindCanEncodeBugs.
>>
>>
>>
>> On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang <yiming.wang at oracle.com>wrote:
>>
>>>  Hi Martin & All,
>>>
>>> Please review the changes for MalformedSurrogates.
>>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>>>
>>> The logic of changes are
>>> 1. Iterate all charsets of system to check whether the charset supports
>>> encode.
>>> 2. If yes, check whether it supports surrogate.
>>> 3. If yes, check whether it rejects malformed surrogate and can
>>> en(de)code normal surrogate.
>>>
>>> Thanks,
>>> Eric
>>>
>>> On 2013/11/8 0:23, Martin Buchholz wrote:
>>>
>>> I still like my old idea of iterating over all charsets and checking
>>> their reasonableness properties.
>>>
>>>  Probably all charsets bundled with the jdk should reject unpaired
>>> surrogates when encoding.  Check it!  Failure to do so might be considered
>>> a security bug.
>>>
>>>  I would use CodingErrorAction.REPORT  instead of REPLACE and examine
>>> that the resulting exception occurs and has reasonable detail.
>>>
>>>  For properly paired surrogates, check that the charset can encode the
>>> resulting codepoint using canEncode, and if so, check that encoding
>>> succeeds.
>>>
>>>
>>> On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang <yiming.wang at oracle.com>wrote:
>>>
>>>>    Hi Everyone
>>>>
>>>> I am working on bug https://bugs.openjdk.java.net/browse/JDK-8022879.
>>>> The test sun/nio/cs/MalformedSurrogates.java<http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>doesn't run if the system default encoding is UTF-8. But unfortunately,
>>>> UTF-8 is the default charset of most test machines, it means the test get
>>>> few chances to be executed.
>>>> Another defect is the test would failed if the default charset is
>>>> UTF-16 or UTF-32 as the test doesn't take the 2 charsets into consideration.
>>>>
>>>> The idea of fix  is no matter what system charset it is, the test
>>>> should always be executed. Here thanks Martin's suggestion that instead of
>>>> checking byte size, we can use CharsetEncoder.canEncode() and CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>> to check and replace malformed chars.
>>>>
>>>> So the test can be re-designed as below:
>>>>
>>>> 1. To use CharsetEncoder.canEncode() to check whether the string
>>>> includes malformed characters.
>>>> 2. If a string includes malformed characters e.g.
>>>> "abc\uD800\uDB00efgh", then set
>>>> CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE) to replace the
>>>> malformed characters to the replacement "?" when calling
>>>> CharsetEncoder.encode() method.
>>>> 3. Verified by decoding the encoded ByteBuffer to CharBuffer, check
>>>> whether it includes replacement "?" and compare it with old string, if not
>>>> equal, then test passed.
>>>> 4. If a sting doesn't include malformed characters e.g.
>>>> "abc\uD800\uDC00efgh", the CharsetEncoder.encode() converts it to
>>>> ByteBuffer which doesn't include replacement "?"
>>>> 5. Verified by decoding the encoded ByteBuffer to CharBuffer, confirm
>>>> that it doesn't include replacment "?" and compare it with old string, if
>>>> equal, then test passed.
>>>>
>>>> Please let me know if you have any comments or suggestions.
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131218/ad2aaca9/attachment-0001.html 


More information about the nio-dev mailing list