RFR for JDK-8022879 TEST_BUG: sun/nio/cs/MalformedSurrogates.java fails intermittently
Eric Wang
yiming.wang at oracle.com
Tue Dec 17 22:34:09 PST 2013
Hi Martin,
I have updated the test to add another 2 surrogates, solitary high and
solitary low, so 4 possibilities are covered.
The verbose code has been cleaned based on your suggestion, Can you
please review again?
http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
<http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
Thanks,
Eric
On 2013/12/18 3:03, Martin Buchholz wrote:
> Thanks, Eric.
>
> This looks good enough to check in. Optionally:
>
> Regarding my comment on what surrogate to test, I would tend to check
> 4 possibilities: solitary high surrogate, solitary low, high + low,
> low + high
>
>
> The code below is a bit more verbose than necessary. We can remove f
> by doing
>
> try {
> ...
> throw new ..("should throw ...");
> } catch (MalformedInputException | UnmappableCharacterException
> success) {}
>
> en.reset();
> 69 boolean f = false;
> 70 try {
> 71 en.encode(CharBuffer.wrap(surrogate));
> 72 } catch (MalformedInputException | UnmappableCharacterException ex) {
> 73 f = true;
> 74 } finally {
> 75 en.reset();
> 76 }
> 77 if (!f) {
> 78 throw new RuntimeException("testMalformedSurrogate failed with charset " +cs.name <http://cs.name>());
> 79 }
>
>
>
>
> On Tue, Dec 17, 2013 at 1:18 AM, Eric Wang <yiming.wang at oracle.com
> <mailto:yiming.wang at oracle.com>> wrote:
>
> Hi Martin,
>
> I have updated the test based on your comments and split the old
> logic into different functions to test legal or illegal surrogate
> pairs. Can you please help to review it?
> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/
> <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/>
>
> For your last comment "reversed surrogate", I'm not sure whether
> my understanding is correct. Does that mean reverse legal
> surrogate e.g. \uD800\uDC00 to \uDC00\uD800 which is kind of
> malformed surrogate.
> I have covered this case in the test.
>
> Thanks,
> Eric
>
> On 2013/12/14 8:15, Martin Buchholz wrote:
>> I do like this approach. I think we can have some cleanup to
>> make it a little clearer what is being tested.
>>
>> Below, it makes no sense to check canEncode when you are creating
>> a decoder.
>> 60 try {
>> 61 de = charset.newDecoder();
>> 62 } catch (UnsupportedOperationException ex) {
>> 63 if (canEncode) {
>> 64 throw ex;
>> 65 }
>> 66 }
>>
>> *Once you do this:*
>> 68 if (!canEncode) {
>> 69 continue;
>> 70 }
>> the loop should no longer test canEncode, but does, obscuring the
>> logic.
>>
>> Use standard whitespace:
>> 109 }finally {
>> When encoding legal surrogate pair, should see
>> UnmappableCharacterException, not MalformedInputException.
>> Reverse is true with illegal surrogate pair.
>>
>> should probably tests isolated, in addition to reversed, surrogates.
>>
>> Compare my old test FindCanEncodeBugs.
>>
>>
>>
>> On Thu, Dec 12, 2013 at 4:51 AM, Eric Wang
>> <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>
>> Hi Martin & All,
>>
>> Please review the changes for MalformedSurrogates.
>> http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/
>> <http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.00/>
>>
>> The logic of changes are
>> 1. Iterate all charsets of system to check whether the
>> charset supports encode.
>> 2. If yes, check whether it supports surrogate.
>> 3. If yes, check whether it rejects malformed surrogate and
>> can en(de)code normal surrogate.
>>
>> Thanks,
>> Eric
>>
>> On 2013/11/8 0:23, Martin Buchholz wrote:
>>> I still like my old idea of iterating over all charsets and
>>> checking their reasonableness properties.
>>>
>>> Probably all charsets bundled with the jdk should reject
>>> unpaired surrogates when encoding. Check it! Failure to do
>>> so might be considered a security bug.
>>>
>>> I would use CodingErrorAction.REPORT instead of REPLACE and
>>> examine that the resulting exception occurs and has
>>> reasonable detail.
>>>
>>> For properly paired surrogates, check that the charset can
>>> encode the resulting codepoint using canEncode, and if so,
>>> check that encoding succeeds.
>>>
>>>
>>> On Thu, Nov 7, 2013 at 7:22 AM, Eric Wang
>>> <yiming.wang at oracle.com <mailto:yiming.wang at oracle.com>> wrote:
>>>
>>> Hi Everyone
>>>
>>> I am working on bug
>>> https://bugs.openjdk.java.net/browse/JDK-8022879.
>>> The test sun/nio/cs/MalformedSurrogates.java
>>> <http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java>
>>> doesn't run if the system default encoding is UTF-8. But
>>> unfortunately, UTF-8 is the default charset of most test
>>> machines, it means the test get few chances to be executed.
>>> Another defect is the test would failed if the default
>>> charset is UTF-16 or UTF-32 as the test doesn't take the
>>> 2 charsets into consideration.
>>>
>>> The idea of fix is no matter what system charset it is,
>>> the test should always be executed. Here thanks Martin's
>>> suggestion that instead of checking byte size, we can
>>> use CharsetEncoder.canEncode() and
>>> CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>> to check and replace malformed chars.
>>>
>>> So the test can be re-designed as below:
>>>
>>> 1. To use CharsetEncoder.canEncode() to check whether
>>> the string includes malformed characters.
>>> 2. If a string includes malformed characters e.g.
>>> "abc\uD800\uDB00efgh", then set
>>> CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
>>> to replace the malformed characters to the replacement
>>> "?" when calling CharsetEncoder.encode() method.
>>> 3. Verified by decoding the encoded ByteBuffer to
>>> CharBuffer, check whether it includes replacement "?"
>>> and compare it with old string, if not equal, then test
>>> passed.
>>> 4. If a sting doesn't include malformed characters e.g.
>>> "abc\uD800\uDC00efgh", the CharsetEncoder.encode()
>>> converts it to ByteBuffer which doesn't include
>>> replacement "?"
>>> 5. Verified by decoding the encoded ByteBuffer to
>>> CharBuffer, confirm that it doesn't include replacment
>>> "?" and compare it with old string, if equal, then test
>>> passed.
>>>
>>> Please let me know if you have any comments or suggestions.
>>>
>>> Thanks,
>>> Eric
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/nio-dev/attachments/20131218/16780cb3/attachment-0001.html
More information about the nio-dev
mailing list