RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently
Eric Wang
yiming.wang at oracle.com
Wed Dec 18 18:19:06 PST 2013
Hi Martin,
Thanks! Could you sponsor it to check in?
Eric
On 2013/12/19 0:47, Martin Buchholz wrote:
> The latest webrev looks great to me!
>
>
> On Tue, Dec 17, 2013 at 10:34 PM, Eric Wang <yiming.wang at oracle.com
> <mailto:yiming.wang at oracle.com>> wrote:
>
> Hi Martin,
>
> I have updated the test to add another 2 surrogates, solitary high
> and solitary low, so 4 possibilities are covered.
> The verbose code has been cleaned based on your suggestion, Can
> you please review again?
> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
> <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
>
> Thanks,
> Eric
>
> On 2013/12/18 3:03, Martin Buchholz wrote:
>> Thanks, Eric.
>>
>> This looks good enough to check in. Optionally:
>>
>> Regarding my comment on what surrogate to test, I would tend to
>> check 4 possibilities: solitary high surrogate, solitary low,
>> high + low, low + high
>>
>>
>> The code below is a bit more verbose than necessary. We can
>> remove f by doing
>>
>> try {
>> ...
>> throw new ..("should throw ...");
>> } catch (MalformedInputException | UnmappableCharacterException
>> success) {}
>>
>> en.reset();
>> 69 boolean f = false;
>> 70 try {
>> 71 en.encode(CharBuffer.wrap(surrogate));
>> 72 } catch (MalformedInputException | UnmappableCharacterException ex) {
>> 73 f = true;
>> 74 } finally {
>> 75 en.reset();
>> 76 }
>> 77 if (!f) {
>> 78 throw new RuntimeException("testMalformedSurrogate failed with charset " +cs.name <http://cs.name>());
>> 79 }
>>
>>
>>
>>
>> On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang
>> <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>
>> Hi Martin,
>>
>> I have updated the test based on your comments and split the
>> old logic into different functions to test legal or illegal
>> surrogate pairs. Can you please help to review it?
>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
>> <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
>>
>> For your last comment "reversed surrogate", I'm not sure
>> whether my understanding is correct. Does that mean reverse
>> legal surrogate e.g. \uD800\uDC00 to \uDC00\uD800 which is
>> kind of malformed surrogate.
>> I have covered this case in the test.
>>
>> Thanks,
>> Eric
>>
>> On 2013/12/14 8:15, Martin Buchholz wrote:
>>> I do like this approach. I think we can have some cleanup
>>> to make it a little clearer what is being tested.
>>>
>>> Below, it makes no sense to check canEncode when you are
>>> creating a decoder.
>>> 60 try {
>>> 61 de = charset.newDecoder();
>>> 62 } catch (UnsupportedOperationException ex) {
>>> 63 if (canEncode) {
>>> 64 throw ex;
>>> 65 }
>>> 66 }
>>>
>>> *Once you do this:*
>>> 68 if (!canEncode) {
>>> 69 continue;
>>> 70 }
>>> the loop should no longer test canEncode, but does,
>>> obscuring the logic.
>>>
>>> Use standard whitespace:
>>> 109 }finally {
>>> When encoding legal surrogate pair, should see
>>> UnmappableCharacterException, not MalformedInputException.
>>> Reverse is true with illegal surrogate pair.
>>>
>>> should probably tests isolated, in addition to reversed,
>>> surrogates.
>>>
>>> Compare my old test FindCanEncodeBugs.
>>>
>>>
>>>
>>> On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang
>>> <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>>
>>> Hi Martin & All,
>>>
>>> Please review the changes for MalformedSurrogates.
>>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>>> <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.00/>
>>>
>>> The logic of changes are
>>> 1. Iterate all charsets of system to check whether the
>>> charset supports encode.
>>> 2. If yes, check whether it supports surrogate.
>>> 3. If yes, check whether it rejects malformed surrogate
>>> and can en(de)code normal surrogate.
>>>
>>> Thanks,
>>> Eric
>>>
>>> On 2013/11/8 0:23, Martin Buchholz wrote:
>>>> I still like my old idea of iterating over all charsets
>>>> and checking their reasonableness properties.
>>>>
>>>> Probably all charsets bundled with the jdk should
>>>> reject unpaired surrogates when encoding. Check it!
>>>> Failure to do so might be considered a security bug.
>>>>
>>>> I would use CodingErrorAction.REPORT instead of
>>>> REPLACE and examine that the resulting exception occurs
>>>> and has reasonable detail.
>>>>
>>>> For properly paired surrogates, check that the charset
>>>> can encode the resulting codepoint using canEncode, and
>>>> if so, check that encoding succeeds.
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang
>>>> <yiming.wang at oracle.com
>>>> <mailto:yiming.wang at oracle.com>> wrote:
>>>>
>>>> Hi Everyone
>>>>
>>>> I am working on bug
>>>> https://bugs.openjdk.java.net/browse/JDK-8022879.
>>>> The test sun/nio/cs/MalformedSurrogates.java
>>>> <http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>
>>>> doesn't run if the system default encoding is
>>>> UTF-8. But unfortunately, UTF-8 is the default
>>>> charset of most test machines, it means the test
>>>> get few chances to be executed.
>>>> Another defect is the test would failed if the
>>>> default charset is UTF-16 or UTF-32 as the test
>>>> doesn't take the 2 charsets into consideration.
>>>>
>>>> The idea of fix is no matter what system charset
>>>> it is, the test should always be executed. Here
>>>> thanks Martin's suggestion that instead of checking
>>>> byte size, we can use CharsetEncoder.canEncode()
>>>> and
>>>> CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>> to check and replace malformed chars.
>>>>
>>>> So the test can be re-designed as below:
>>>>
>>>> 1. To use CharsetEncoder.canEncode() to check
>>>> whether the string includes malformed characters.
>>>> 2. If a string includes malformed characters e.g.
>>>> "abc\uD800\uDB00efgh", then set
>>>> CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>>> to replace the malformed characters to the
>>>> replacement "?" when calling
>>>> CharsetEncoder.encode() method.
>>>> 3. Verified by decoding the encoded ByteBuffer to
>>>> CharBuffer, check whether it includes replacement
>>>> "?" and compare it with old string, if not equal,
>>>> then test passed.
>>>> 4. If a sting doesn't include malformed characters
>>>> e.g. "abc\uD800\uDC00efgh", the
>>>> CharsetEncoder.encode() converts it to ByteBuffer
>>>> which doesn't include replacement "?"
>>>> 5. Verified by decoding the encoded ByteBuffer to
>>>> CharBuffer, confirm that it doesn't include
>>>> replacment "?" and compare it with old string, if
>>>> equal, then test passed.
>>>>
>>>> Please let me know if you have any comments or
>>>> suggestions.
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131219/a09b9ddb/attachment-0001.html
More information about the nio-dev
mailing list