RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently

Tue Dec 17 11:03:17 PST 2013

Thanks, Eric.

This looks good enough to check in.  Optionally:

Regarding my comment on what surrogate to test, I would tend to check 4
possibilities: solitary high surrogate, solitary low, high + low, low + high

The code below is a bit more verbose than necessary.  We can remove f by
doing

try {
  ...
  throw new ..("should throw ...");
} catch (MalformedInputException | UnmappableCharacterException success) {}

en.reset();

  69         boolean f = false;  70         try {  71
en.encode(CharBuffer.wrap(surrogate));  72         } catch
(MalformedInputException | UnmappableCharacterException ex) {  73
       f = true;  74         } finally {  75             en.reset();
76         }  77         if (!f) {  78             throw new
RuntimeException("testMalformedSurrogate failed with charset " +
cs.name());  79         }

On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang <yiming.wang at oracle.com> wrote:

>  Hi Martin,
>
> I have updated the test based on your comments and split the old logic
> into different functions to test legal or illegal surrogate pairs. Can you
> please help to review it?
> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>
> For your last comment "reversed surrogate", I'm not sure whether my
> understanding is correct. Does that mean reverse legal surrogate e.g.
> \uD800\uDC00 to \uDC00\uD800 which is kind of malformed surrogate.
> I have covered this case in the test.
>
> Thanks,
> Eric
>
> On 2013/12/14 8:15, Martin Buchholz wrote:
>
> I do like this approach.  I think we can have some cleanup to make it a
> little clearer what is being tested.
>
>  Below, it makes no sense to check canEncode when you are creating a
> decoder.
>
>   60             try {  61                 de = charset.newDecoder();  62             } catch (UnsupportedOperationException ex) {  63                 if (canEncode) {  64                     throw ex;  65                 }  66             }
>
>
>  *Once you do this:*
>
>   68             if (!canEncode) {  69                 continue;  70             }
>
> the loop should no longer test canEncode, but does, obscuring the logic.
>
>  Use standard whitespace:
>
>  109                 }finally {
>
> When encoding legal surrogate pair, should see UnmappableCharacterException,
> not MalformedInputException.
> Reverse is true with illegal surrogate pair.
>
>  should probably tests isolated, in addition to reversed, surrogates.
>
>  Compare my old test FindCanEncodeBugs.
>
>
>
> On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang <yiming.wang at oracle.com> wrote:
>
>>  Hi Martin & All,
>>
>> Please review the changes for MalformedSurrogates.
>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>>
>> The logic of changes are
>> 1. Iterate all charsets of system to check whether the charset supports
>> encode.
>> 2. If yes, check whether it supports surrogate.
>> 3. If yes, check whether it rejects malformed surrogate and can
>> en(de)code normal surrogate.
>>
>> Thanks,
>> Eric
>>
>> On 2013/11/8 0:23, Martin Buchholz wrote:
>>
>> I still like my old idea of iterating over all charsets and checking
>> their reasonableness properties.
>>
>>  Probably all charsets bundled with the jdk should reject unpaired
>> surrogates when encoding.  Check it!  Failure to do so might be considered
>> a security bug.
>>
>>  I would use CodingErrorAction.REPORT  instead of REPLACE and examine
>> that the resulting exception occurs and has reasonable detail.
>>
>>  For properly paired surrogates, check that the charset can encode the
>> resulting codepoint using canEncode, and if so, check that encoding
>> succeeds.
>>
>>
>> On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang <yiming.wang at oracle.com> wrote:
>>
>>>    Hi Everyone
>>>
>>> I am working on bug https://bugs.openjdk.java.net/browse/JDK-8022879.
>>> The test sun/nio/cs/MalformedSurrogates.java<http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>doesn't run if the system default encoding is UTF-8. But unfortunately,
>>> UTF-8 is the default charset of most test machines, it means the test get
>>> few chances to be executed.
>>> Another defect is the test would failed if the default charset is UTF-16
>>> or UTF-32 as the test doesn't take the 2 charsets into consideration.
>>>
>>> The idea of fix  is no matter what system charset it is, the test should
>>> always be executed. Here thanks Martin's suggestion that instead of
>>> checking byte size, we can use CharsetEncoder.canEncode() and CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>> to check and replace malformed chars.
>>>
>>> So the test can be re-designed as below:
>>>
>>> 1. To use CharsetEncoder.canEncode() to check whether the string
>>> includes malformed characters.
>>> 2. If a string includes malformed characters e.g. "abc\uD800\uDB00efgh",
>>> then set CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE) to
>>> replace the malformed characters to the replacement "?" when calling
>>> CharsetEncoder.encode() method.
>>> 3. Verified by decoding the encoded ByteBuffer to CharBuffer, check
>>> whether it includes replacement "?" and compare it with old string, if not
>>> equal, then test passed.
>>> 4. If a sting doesn't include malformed characters e.g.
>>> "abc\uD800\uDC00efgh", the CharsetEncoder.encode() converts it to
>>> ByteBuffer which doesn't include replacement "?"
>>> 5. Verified by decoding the encoded ByteBuffer to CharBuffer, confirm
>>> that it doesn't include replacment "?" and compare it with old string, if
>>> equal, then test passed.
>>>
>>> Please let me know if you have any comments or suggestions.
>>>
>>> Thanks,
>>> Eric
>>>
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131217/5f5605f4/attachment-0001.html