RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently

Thu Dec 19 09:02:12 PST 2013

I haven't really been a "sponsor" - that's traditionally been done only by
Sun/Oracle folks.  I'd be more comfortable having a nio owner sponsor it.
But I looked at your change with "sponsor hat" on.
The patch file should be in "hg export" format so that a sponsor can hg
import it.  If the webrev script can't generate such patches, it should be
fixed.
You need to tell us which jdk release you're targeting.  Where do you
expect this change to land?  Presumably jdk9?

On Wed, Dec 18, 2013 at 6:19 PM, Eric Wang <yiming.wang at oracle.com> wrote:

>  Hi Martin,
>
> Thanks! Could you sponsor it to check in?
>
> Eric
>
> On 2013/12/19 0:47, Martin Buchholz wrote:
>
> The latest webrev looks great to me!
>
>
> On Tue, Dec 17, 2013 at 10:34 PM, Eric Wang <yiming.wang at oracle.com>wrote:
>
>>  Hi Martin,
>>
>> I have updated the test to add another 2 surrogates, solitary high and
>> solitary low, so 4 possibilities are covered.
>> The verbose code has been cleaned based on your suggestion, Can you
>> please review again?
>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>>
>> Thanks,
>> Eric
>>
>> On 2013/12/18 3:03, Martin Buchholz wrote:
>>
>> Thanks, Eric.
>>
>>  This looks good enough to check in.  Optionally:
>>
>>  Regarding my comment on what surrogate to test, I would tend to check 4
>> possibilities: solitary high surrogate, solitary low, high + low, low + high
>>
>>
>>  The code below is a bit more verbose than necessary.  We can remove f
>> by doing
>>
>>  try {
>>   ...
>>   throw new ..("should throw ...");
>> } catch (MalformedInputException | UnmappableCharacterException success)
>> {}
>>
>>  en.reset();
>>
>>   69         boolean f = false;  70         try {  71             en.encode(CharBuffer.wrap(surrogate));  72         } catch (MalformedInputException | UnmappableCharacterException ex) {  73             f = true;  74         } finally {  75             en.reset();  76         }  77         if (!f) {  78             throw new RuntimeException("testMalformedSurrogate failed with charset " + cs.name());  79         }
>>
>>
>>
>>
>>
>> On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang <yiming.wang at oracle.com>wrote:
>>
>>>  Hi Martin,
>>>
>>> I have updated the test based on your comments and split the old logic
>>> into different functions to test legal or illegal surrogate pairs. Can you
>>> please help to review it?
>>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>>>
>>> For your last comment "reversed surrogate", I'm not sure whether my
>>> understanding is correct. Does that mean reverse legal surrogate e.g.
>>> \uD800\uDC00 to \uDC00\uD800 which is kind of malformed surrogate.
>>> I have covered this case in the test.
>>>
>>> Thanks,
>>> Eric
>>>
>>> On 2013/12/14 8:15, Martin Buchholz wrote:
>>>
>>> I do like this approach.  I think we can have some cleanup to make it a
>>> little clearer what is being tested.
>>>
>>>  Below, it makes no sense to check canEncode when you are creating a
>>> decoder.
>>>
>>>   60             try {  61                 de = charset.newDecoder();  62             } catch (UnsupportedOperationException ex) {  63                 if (canEncode) {  64                     throw ex;  65                 }  66             }
>>>
>>>
>>>  *Once you do this:*
>>>
>>>   68             if (!canEncode) {  69                 continue;  70             }
>>>
>>> the loop should no longer test canEncode, but does, obscuring the logic.
>>>
>>>  Use standard whitespace:
>>>
>>>  109                 }finally {
>>>
>>> When encoding legal surrogate pair, should see UnmappableCharacterException,
>>> not MalformedInputException.
>>> Reverse is true with illegal surrogate pair.
>>>
>>>  should probably tests isolated, in addition to reversed, surrogates.
>>>
>>>  Compare my old test FindCanEncodeBugs.
>>>
>>>
>>>
>>> On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang <yiming.wang at oracle.com>wrote:
>>>
>>>>  Hi Martin & All,
>>>>
>>>> Please review the changes for MalformedSurrogates.
>>>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>>>>
>>>> The logic of changes are
>>>> 1. Iterate all charsets of system to check whether the charset supports
>>>> encode.
>>>> 2. If yes, check whether it supports surrogate.
>>>> 3. If yes, check whether it rejects malformed surrogate and can
>>>> en(de)code normal surrogate.
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>> On 2013/11/8 0:23, Martin Buchholz wrote:
>>>>
>>>> I still like my old idea of iterating over all charsets and checking
>>>> their reasonableness properties.
>>>>
>>>>  Probably all charsets bundled with the jdk should reject unpaired
>>>> surrogates when encoding.  Check it!  Failure to do so might be considered
>>>> a security bug.
>>>>
>>>>  I would use CodingErrorAction.REPORT  instead of REPLACE and examine
>>>> that the resulting exception occurs and has reasonable detail.
>>>>
>>>>  For properly paired surrogates, check that the charset can encode the
>>>> resulting codepoint using canEncode, and if so, check that encoding
>>>> succeeds.
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang <yiming.wang at oracle.com>wrote:
>>>>
>>>>>    Hi Everyone
>>>>>
>>>>> I am working on bug https://bugs.openjdk.java.net/browse/JDK-8022879.
>>>>>
>>>>> The test sun/nio/cs/MalformedSurrogates.java<http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>doesn't run if the system default encoding is UTF-8. But unfortunately,
>>>>> UTF-8 is the default charset of most test machines, it means the test get
>>>>> few chances to be executed.
>>>>> Another defect is the test would failed if the default charset is
>>>>> UTF-16 or UTF-32 as the test doesn't take the 2 charsets into consideration.
>>>>>
>>>>> The idea of fix  is no matter what system charset it is, the test
>>>>> should always be executed. Here thanks Martin's suggestion that instead of
>>>>> checking byte size, we can use CharsetEncoder.canEncode() and CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>>> to check and replace malformed chars.
>>>>>
>>>>> So the test can be re-designed as below:
>>>>>
>>>>> 1. To use CharsetEncoder.canEncode() to check whether the string
>>>>> includes malformed characters.
>>>>> 2. If a string includes malformed characters e.g.
>>>>> "abc\uD800\uDB00efgh", then set
>>>>> CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE) to replace the
>>>>> malformed characters to the replacement "?" when calling
>>>>> CharsetEncoder.encode() method.
>>>>> 3. Verified by decoding the encoded ByteBuffer to CharBuffer, check
>>>>> whether it includes replacement "?" and compare it with old string, if not
>>>>> equal, then test passed.
>>>>> 4. If a sting doesn't include malformed characters e.g.
>>>>> "abc\uD800\uDC00efgh", the CharsetEncoder.encode() converts it to
>>>>> ByteBuffer which doesn't include replacement "?"
>>>>> 5. Verified by decoding the encoded ByteBuffer to CharBuffer, confirm
>>>>> that it doesn't include replacment "?" and compare it with old string, if
>>>>> equal, then test passed.
>>>>>
>>>>> Please let me know if you have any comments or suggestions.
>>>>>
>>>>> Thanks,
>>>>> Eric
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131219/98eea1e9/attachment-0001.html