RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently

Wed Dec 18 18:19:06 PST 2013

Hi Martin,

Thanks! Could you sponsor it to check in?

Eric
On 2013/12/19 0:47, Martin Buchholz wrote:
> The latest webrev looks great to me!
>
>
> On Tue, Dec 17, 2013 at 10:34 PM, Eric Wang <yiming.wang at oracle.com 
> <mailto:yiming.wang at oracle.com>> wrote:
>
>     Hi Martin,
>
>     I have updated the test to add another 2 surrogates, solitary high
>     and solitary low, so 4 possibilities are covered.
>     The verbose code has been cleaned based on your suggestion, Can
>     you please review again?
>     http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>     <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
>
>     Thanks,
>     Eric
>
>     On 2013/12/18 3:03, Martin Buchholz wrote:
>>     Thanks, Eric.
>>
>>     This looks good enough to check in.  Optionally:
>>
>>     Regarding my comment on what surrogate to test, I would tend to
>>     check 4 possibilities: solitary high surrogate, solitary low,
>>     high + low, low + high
>>
>>
>>     The code below is a bit more verbose than necessary.  We can
>>     remove f by doing
>>
>>     try {
>>       ...
>>       throw new ..("should throw ...");
>>     } catch (MalformedInputException | UnmappableCharacterException
>>     success) {}
>>
>>     en.reset();
>>        69         boolean f = false;
>>        70         try {
>>        71             en.encode(CharBuffer.wrap(surrogate));
>>        72         } catch (MalformedInputException | UnmappableCharacterException ex) {
>>        73             f = true;
>>        74         } finally {
>>        75             en.reset();
>>        76         }
>>        77         if (!f) {
>>        78             throw new RuntimeException("testMalformedSurrogate failed with charset " +cs.name  <http://cs.name>());
>>        79         }
>>
>>
>>
>>
>>     On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang
>>     <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>
>>         Hi Martin,
>>
>>         I have updated the test based on your comments and split the
>>         old logic into different functions to test legal or illegal
>>         surrogate pairs. Can you please help to review it?
>>         http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>>         <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
>>
>>         For your last comment "reversed surrogate", I'm not sure
>>         whether my understanding is correct. Does that mean reverse
>>         legal surrogate e.g. \uD800\uDC00 to \uDC00\uD800 which is
>>         kind of malformed surrogate.
>>         I have covered this case in the test.
>>
>>         Thanks,
>>         Eric
>>
>>         On 2013/12/14 8:15, Martin Buchholz wrote:
>>>         I do like this approach.  I think we can have some cleanup
>>>         to make it a little clearer what is being tested.
>>>
>>>         Below, it makes no sense to check canEncode when you are
>>>         creating a decoder.
>>>            60             try {
>>>            61                 de = charset.newDecoder();
>>>            62             } catch (UnsupportedOperationException ex) {
>>>            63                 if (canEncode) {
>>>            64                     throw ex;
>>>            65                 }
>>>            66             }
>>>
>>>         *Once you do this:*
>>>            68             if (!canEncode) {
>>>            69                 continue;
>>>            70             }
>>>         the loop should no longer test canEncode, but does,
>>>         obscuring the logic.
>>>
>>>         Use standard whitespace:
>>>           109                 }finally {
>>>         When encoding legal surrogate pair, should see
>>>         UnmappableCharacterException, not MalformedInputException.
>>>         Reverse is true with illegal surrogate pair.
>>>
>>>         should probably tests isolated, in addition to reversed,
>>>         surrogates.
>>>
>>>         Compare my old test FindCanEncodeBugs.
>>>
>>>
>>>
>>>         On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang
>>>         <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>>
>>>             Hi Martin & All,
>>>
>>>             Please review the changes for MalformedSurrogates.
>>>             http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>>>             <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.00/>
>>>
>>>             The logic of changes are
>>>             1. Iterate all charsets of system to check whether the
>>>             charset supports encode.
>>>             2. If yes, check whether it supports surrogate.
>>>             3. If yes, check whether it rejects malformed surrogate
>>>             and can en(de)code normal surrogate.
>>>
>>>             Thanks,
>>>             Eric
>>>
>>>             On 2013/11/8 0:23, Martin Buchholz wrote:
>>>>             I still like my old idea of iterating over all charsets
>>>>             and checking their reasonableness properties.
>>>>
>>>>             Probably all charsets bundled with the jdk should
>>>>             reject unpaired surrogates when encoding.  Check it!
>>>>              Failure to do so might be considered a security bug.
>>>>
>>>>             I would use CodingErrorAction.REPORT  instead of
>>>>             REPLACE and examine that the resulting exception occurs
>>>>             and has reasonable detail.
>>>>
>>>>             For properly paired surrogates, check that the charset
>>>>             can encode the resulting codepoint using canEncode, and
>>>>             if so, check that encoding succeeds.
>>>>
>>>>
>>>>             On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang
>>>>             <yiming.wang at oracle.com
>>>>             <mailto:yiming.wang at oracle.com>> wrote:
>>>>
>>>>                  Hi Everyone
>>>>
>>>>                 I am working on bug
>>>>                 https://bugs.openjdk.java.net/browse/JDK-8022879.
>>>>                 The test sun/nio/cs/MalformedSurrogates.java
>>>>                 <http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>
>>>>                 doesn't run if the system default encoding is
>>>>                 UTF-8. But unfortunately, UTF-8 is the default
>>>>                 charset of most test machines, it means the test
>>>>                 get few chances to be executed.
>>>>                 Another defect is the test would failed if the
>>>>                 default charset is UTF-16 or UTF-32 as the test
>>>>                 doesn't take the 2 charsets into consideration.
>>>>
>>>>                 The idea of fix  is no matter what system charset
>>>>                 it is, the test should always be executed. Here
>>>>                 thanks Martin's suggestion that instead of checking
>>>>                 byte size, we can use CharsetEncoder.canEncode()
>>>>                 and
>>>>                 CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>>                 to check and replace malformed chars.
>>>>
>>>>                 So the test can be re-designed as below:
>>>>
>>>>                 1. To use CharsetEncoder.canEncode() to check
>>>>                 whether the string includes malformed characters.
>>>>                 2. If a string includes malformed characters e.g.
>>>>                 "abc\uD800\uDB00efgh", then set
>>>>                 CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>>                 to replace the malformed characters to the
>>>>                 replacement "?" when calling
>>>>                 CharsetEncoder.encode() method.
>>>>                 3. Verified by decoding the encoded ByteBuffer to
>>>>                 CharBuffer, check whether it includes replacement
>>>>                 "?" and compare it with old string, if not equal,
>>>>                 then test passed.
>>>>                 4. If a sting doesn't include malformed characters
>>>>                 e.g. "abc\uD800\uDC00efgh", the
>>>>                 CharsetEncoder.encode() converts it to ByteBuffer
>>>>                 which doesn't include replacement "?"
>>>>                 5. Verified by decoding the encoded ByteBuffer to
>>>>                 CharBuffer, confirm that it doesn't include
>>>>                 replacment "?" and compare it with old string, if
>>>>                 equal, then test passed.
>>>>
>>>>                 Please let me know if you have any comments or
>>>>                 suggestions.
>>>>
>>>>                 Thanks,
>>>>                 Eric
>>>>
>>>>
>>>
>>>
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131219/a09b9ddb/attachment-0001.html