RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently

Tue Dec 17 22:34:09 PST 2013

Hi Martin,

I have updated the test to add another 2 surrogates, solitary high and 
solitary low, so 4 possibilities are covered.
The verbose code has been cleaned based on your suggestion, Can you 
please review again?
http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/ 
<http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>

Thanks,
Eric
On 2013/12/18 3:03, Martin Buchholz wrote:
> Thanks, Eric.
>
> This looks good enough to check in.  Optionally:
>
> Regarding my comment on what surrogate to test, I would tend to check 
> 4 possibilities: solitary high surrogate, solitary low, high + low, 
> low + high
>
>
> The code below is a bit more verbose than necessary.  We can remove f 
> by doing
>
> try {
>   ...
>   throw new ..("should throw ...");
> } catch (MalformedInputException | UnmappableCharacterException 
> success) {}
>
> en.reset();
>    69         boolean f = false;
>    70         try {
>    71             en.encode(CharBuffer.wrap(surrogate));
>    72         } catch (MalformedInputException | UnmappableCharacterException ex) {
>    73             f = true;
>    74         } finally {
>    75             en.reset();
>    76         }
>    77         if (!f) {
>    78             throw new RuntimeException("testMalformedSurrogate failed with charset " +cs.name  <http://cs.name>());
>    79         }
>
>
>
>
> On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang <yiming.wang at oracle.com 
> <mailto:yiming.wang at oracle.com>> wrote:
>
>     Hi Martin,
>
>     I have updated the test based on your comments and split the old
>     logic into different functions to test legal or illegal surrogate
>     pairs. Can you please help to review it?
>     http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>     <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
>
>     For your last comment "reversed surrogate", I'm not sure whether
>     my understanding is correct. Does that mean reverse legal
>     surrogate e.g. \uD800\uDC00 to \uDC00\uD800 which is kind of
>     malformed surrogate.
>     I have covered this case in the test.
>
>     Thanks,
>     Eric
>
>     On 2013/12/14 8:15, Martin Buchholz wrote:
>>     I do like this approach.  I think we can have some cleanup to
>>     make it a little clearer what is being tested.
>>
>>     Below, it makes no sense to check canEncode when you are creating
>>     a decoder.
>>        60             try {
>>        61                 de = charset.newDecoder();
>>        62             } catch (UnsupportedOperationException ex) {
>>        63                 if (canEncode) {
>>        64                     throw ex;
>>        65                 }
>>        66             }
>>
>>     *Once you do this:*
>>        68             if (!canEncode) {
>>        69                 continue;
>>        70             }
>>     the loop should no longer test canEncode, but does, obscuring the
>>     logic.
>>
>>     Use standard whitespace:
>>       109                 }finally {
>>     When encoding legal surrogate pair, should see
>>     UnmappableCharacterException, not MalformedInputException.
>>     Reverse is true with illegal surrogate pair.
>>
>>     should probably tests isolated, in addition to reversed, surrogates.
>>
>>     Compare my old test FindCanEncodeBugs.
>>
>>
>>
>>     On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang
>>     <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>
>>         Hi Martin & All,
>>
>>         Please review the changes for MalformedSurrogates.
>>         http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>>         <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.00/>
>>
>>         The logic of changes are
>>         1. Iterate all charsets of system to check whether the
>>         charset supports encode.
>>         2. If yes, check whether it supports surrogate.
>>         3. If yes, check whether it rejects malformed surrogate and
>>         can en(de)code normal surrogate.
>>
>>         Thanks,
>>         Eric
>>
>>         On 2013/11/8 0:23, Martin Buchholz wrote:
>>>         I still like my old idea of iterating over all charsets and
>>>         checking their reasonableness properties.
>>>
>>>         Probably all charsets bundled with the jdk should reject
>>>         unpaired surrogates when encoding.  Check it!  Failure to do
>>>         so might be considered a security bug.
>>>
>>>         I would use CodingErrorAction.REPORT  instead of REPLACE and
>>>         examine that the resulting exception occurs and has
>>>         reasonable detail.
>>>
>>>         For properly paired surrogates, check that the charset can
>>>         encode the resulting codepoint using canEncode, and if so,
>>>         check that encoding succeeds.
>>>
>>>
>>>         On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang
>>>         <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>>
>>>              Hi Everyone
>>>
>>>             I am working on bug
>>>             https://bugs.openjdk.java.net/browse/JDK-8022879.
>>>             The test sun/nio/cs/MalformedSurrogates.java
>>>             <http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>
>>>             doesn't run if the system default encoding is UTF-8. But
>>>             unfortunately, UTF-8 is the default charset of most test
>>>             machines, it means the test get few chances to be executed.
>>>             Another defect is the test would failed if the default
>>>             charset is UTF-16 or UTF-32 as the test doesn't take the
>>>             2 charsets into consideration.
>>>
>>>             The idea of fix  is no matter what system charset it is,
>>>             the test should always be executed. Here thanks Martin's
>>>             suggestion that instead of checking byte size, we can
>>>             use CharsetEncoder.canEncode() and
>>>             CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>             to check and replace malformed chars.
>>>
>>>             So the test can be re-designed as below:
>>>
>>>             1. To use CharsetEncoder.canEncode() to check whether
>>>             the string includes malformed characters.
>>>             2. If a string includes malformed characters e.g.
>>>             "abc\uD800\uDB00efgh", then set
>>>             CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>             to replace the malformed characters to the replacement
>>>             "?" when calling CharsetEncoder.encode() method.
>>>             3. Verified by decoding the encoded ByteBuffer to
>>>             CharBuffer, check whether it includes replacement "?"
>>>             and compare it with old string, if not equal, then test
>>>             passed.
>>>             4. If a sting doesn't include malformed characters e.g.
>>>             "abc\uD800\uDC00efgh", the CharsetEncoder.encode()
>>>             converts it to ByteBuffer which doesn't include
>>>             replacement "?"
>>>             5. Verified by decoding the encoded ByteBuffer to
>>>             CharBuffer, confirm that it doesn't include replacment
>>>             "?" and compare it with old string, if equal, then test
>>>             passed.
>>>
>>>             Please let me know if you have any comments or suggestions.
>>>
>>>             Thanks,
>>>             Eric
>>>
>>>
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131218/16780cb3/attachment-0001.html