Code Review Request, JDK-8146600 AVA Normalizer.Form issue

Mon Sep 19 02:32:49 UTC 2016

On 9/19/2016 9:46 AM, Wang Weijun wrote:
> I am not sure of this change for several reasons:
>
> 1. I cannot find any mention of normalization in RFC 2253 (or its successors). Do you know of another source?
>
Normalization is not part of RFC 2253.  It is specified in the Unicode
standard:
    http://www.unicode.org/reports/tr15/tr15-23.html

> 2. It's not obvious that "Hello, world!" and "Hello， world!" should be treated as different if NFKD thinks they are the same.
>
ASN.1 and RFC 2253 require UTF-8 encoding.  In UTF-8, "Hello, world!" is
encoded as "Hello%2C%20world%21", while "Hello， world!" is encoded as
"Hello%EF%BC%8C%20world%21".  The encoded bytes are different.
When a certificate is signed, "，" is not converted to ",", so I don't
think it is right to convert it while parsing the field.
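To make the byte-level difference concrete, here is a small Java sketch that prints the UTF-8 bytes of both strings in the percent-encoded style used above. The class and method names are illustrative, not taken from the webrev. The rule (keep ASCII letters and digits, percent-encode every other byte) is an assumption chosen to reproduce those two strings. It is not how values are stored in a DER-encoded certificate.

```java
import java.nio.charset.StandardCharsets;

public class Utf8PercentEncoding {

    // Percent-encodes every UTF-8 byte except ASCII letters and digits.
    // Illustrative only: this is not the certificate's DER encoding.
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean alnum = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9');
            if (alnum) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = "Hello, world!";
        String fullwidth = "Hello\uFF0C world!"; // U+FF0C FULLWIDTH COMMA
        System.out.println(percentEncodeUtf8(ascii));     // Hello%2C%20world%21
        System.out.println(percentEncodeUtf8(fullwidth)); // Hello%EF%BC%8C%20world%21
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);     // 13
        System.out.println(fullwidth.getBytes(StandardCharsets.UTF_8).length); // 15
    }
}
```

The fullwidth comma takes three bytes in UTF-8 (EF BC 8C), while the ASCII comma takes one (2C).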

> 3. Why not NFC? Although I didn't find normalization of X.500 names in RFC 5280, I do see NFV used in several other cases.
>
Actually, I'm not sure why normalization is required here, so I don't
want to change the code too much.  The previous form is NFKD.  Removing
the "compatibility" part of NFKD leaves NFD.

What do you mean by NFV?  Is it a typo?
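One quick way to see how the four forms differ on this input is to normalize the fullwidth-comma string with each of them. This sketch uses only java.text.Normalizer. The class and helper names are made up for illustration. U+FF0C has a compatibility (<wide>) decomposition to "," but no canonical decomposition, so only NFKD and NFKC fold it into an ASCII comma. NFD and NFC leave it unchanged.

```java
import java.text.Normalizer;

public class CompatibilityFoldingDemo {

    // True if normalizing s with the given form yields exactly target.
    static boolean normalizesTo(String s, Normalizer.Form form, String target) {
        return Normalizer.normalize(s, form).equals(target);
    }

    public static void main(String[] args) {
        String ascii = "Hello, world!";
        String fullwidth = "Hello\uFF0C world!"; // U+FF0C FULLWIDTH COMMA
        Normalizer.Form[] forms = {
            Normalizer.Form.NFD, Normalizer.Form.NFC,
            Normalizer.Form.NFKD, Normalizer.Form.NFKC
        };
        for (Normalizer.Form form : forms) {
            System.out.println(form + ": " + normalizesTo(fullwidth, form, ascii));
        }
        // Expected output:
        // NFD: false
        // NFC: false
        // NFKD: true
        // NFKC: true
    }
}
```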

> 4. Is it possible to perform normalization before escaping special characters?
>
Yes, I thought about this case.  The current fix comes from the fact
that the UTF-8 strings "Hello, world!" and "Hello， world!" should be
different.  Treating them as the same may cause unexpected and serious issues.
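The ordering question in point 4 can be illustrated with a simplified escaper. The escapeRfc2253 helper below is hypothetical. It only backslash-escapes the RFC 2253 special characters , + " \ < > ; and ignores the leading/trailing space and leading '#' rules, so it is not the JDK's AVA code. It shows that escaping before NFKD can leave a bare "," that a parser would read as an RDN separator. Normalizing first lets the escaper see the ASCII comma.

```java
import java.text.Normalizer;

public class EscapeOrderDemo {

    // Hypothetical, simplified RFC 2253 escaper: escapes only , + " \ < > ;
    // and skips the leading '#'/space and trailing-space rules.
    static String escapeRfc2253(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (",+\"\\<>;".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String escapeThenNormalize(String value) {
        return Normalizer.normalize(escapeRfc2253(value), Normalizer.Form.NFKD);
    }

    static String normalizeThenEscape(String value) {
        return escapeRfc2253(Normalizer.normalize(value, Normalizer.Form.NFKD));
    }

    public static void main(String[] args) {
        String fullwidth = "Hello\uFF0C world!"; // U+FF0C FULLWIDTH COMMA
        // U+FF0C is not special, so it is not escaped; NFKD then turns it
        // into a bare ",", which a parser would treat as a separator.
        System.out.println(escapeThenNormalize(fullwidth)); // Hello, world!
        // Normalizing first exposes the ASCII comma to the escaper.
        System.out.println(normalizeThenEscape(fullwidth)); // Hello\, world!
    }
}
```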

> 5. Why is normalization necessary? At least in RFC 5280 4.1.2.6, it says
>
>            When the subject of the certificate is a CA, the subject
>            field MUST be encoded in the same way as it is encoded in the
>            issuer field (Section 4.1.2.4) in all certificates issued by
>            the subject CA.
>
> which implies that comparison should be done on the encoding rather than on toString.
>
I have to say I agree with this point.  I don't see the point of using
normalization either.  But I'm not sure I have enough information to
remove the normalization, and I don't want to fix it unless it is broken.
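One way to compare names on their encoding, as the RFC 5280 text suggests, is to compare the DER bytes returned by javax.security.auth.x500.X500Principal.getEncoded(). This is a sketch of that idea, not what the webrev does. The sameEncoding helper is illustrative. The canonical string forms are printed rather than asserted, because whether they match depends on the normalization form the running JDK applies internally.

```java
import java.util.Arrays;
import javax.security.auth.x500.X500Principal;

public class EncodingComparisonDemo {

    // Compares two distinguished names by their DER encodings.
    static boolean sameEncoding(X500Principal a, X500Principal b) {
        return Arrays.equals(a.getEncoded(), b.getEncoded());
    }

    public static void main(String[] args) {
        X500Principal ascii = new X500Principal("CN=Hello\\, world!");
        X500Principal fullwidth = new X500Principal("CN=Hello\uFF0C world!");

        System.out.println(sameEncoding(ascii, fullwidth)); // false
        // String-based canonical forms: the result depends on the JDK's
        // internal normalization form, so it is shown, not assumed.
        System.out.println(ascii.getName(X500Principal.CANONICAL));
        System.out.println(fullwidth.getName(X500Principal.CANONICAL));
    }
}
```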

Xuelei

> Thanks
> Max
>
>> On Sep 15, 2016, at 8:09 AM, Xuelei Fan <xuelei.fan at oracle.com> wrote:
>>
>> Hi,
>>
>> Please review this fix:
>>    http://cr.openjdk.java.net/~xuelei/8146600/webrev.00/
>>
>> Normalizer.Form.NFKD is used to normalize attribute-value assertions (AVAs) in X.509 certificate processing.  The normalizer may convert some non-ASCII characters into ASCII.  For example, "，" (three bytes in UTF-8) is converted to "," (one byte), so "Hello， world!" is normalized to "Hello, world!".  However, "Hello, world!" and "Hello， world!" should be different because their commas are different characters.  This conversion may result in unexpected behavior when names are compared or converted.
>>
>> This fix switches to Normalizer.Form.NFD.
>>
>> Thanks,
>> Xuelei
>
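Switching from NFKD to NFD still keeps canonical equivalence. A precomposed e-acute (U+00E9) and "e" followed by U+0301 COMBINING ACUTE ACCENT compare equal under both forms. The equalUnder helper below is hypothetical. It skips the case folding and whitespace handling that a real AVA comparison may also perform.

```java
import java.text.Normalizer;

public class CanonicalEquivalenceDemo {

    // Hypothetical helper: compares two values after normalizing both with
    // the given form (no case folding or whitespace handling).
    static boolean equalUnder(Normalizer.Form form, String a, String b) {
        return Normalizer.normalize(a, form).equals(Normalizer.normalize(b, form));
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00E9"; // precomposed e-acute, U+00E9
        String decomposed = "cafe\u0301"; // "e" + U+0301 COMBINING ACUTE ACCENT

        // Canonically equivalent strings are equal under both forms.
        System.out.println(equalUnder(Normalizer.Form.NFKD, precomposed, decomposed)); // true
        System.out.println(equalUnder(Normalizer.Form.NFD, precomposed, decomposed));  // true
    }
}
```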