Code Review Request, JDK-8146600 AVA Normalizer.Form issue

Tue Sep 20 01:09:10 UTC 2016

> On Sep 20, 2016, at 8:58 AM, Xuelei Fan <xuelei.fan at oracle.com> wrote:
>> 
>> In this case, a comma appears but then it is escaped. You might say it is
>> unexpected, but at least after escaping, it becomes a legal string.
>> 
> I did not get the point.  A comma (",") should be escaped, it does get escaped, and the string is legal.  Do you mean "，" (double-byte comma) should be converted to ","?  Can you give more details?

I'll write the double-byte comma (the full-width comma, U+FF0C) as ,, below.

With the current code, "Hello,,world" is not modified by escaping, and becomes "Hello,world" after normalization. This is illegal, because the resulting comma is unescaped.

With my fix, "Hello,,world" becomes "Hello,world" after normalization, and then "Hello\,world" after escaping. This is legal.

With your fix, "Hello,,world" is not modified by either step, and it's legal.

So both your fix and mine make it legal, and the test will succeed.
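
The three cases above can be sketched roughly as follows. This is only an illustration, not the actual AVA code: the escape helper is a hypothetical stand-in that handles just the ASCII comma, and NFKD is assumed to be the form the current code uses.

```java
import java.text.Normalizer;

class CommaOrderDemo {
    // Hypothetical stand-in for the escaping step: handles only the ASCII comma.
    static String escape(String s) {
        return s.replace(",", "\\,");
    }

    // Order described for the current code: escape first, then normalize.
    static String escapeThenNormalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(escape(s), form);
    }

    // Order proposed in this mail: normalize first, then escape.
    static String normalizeThenEscape(String s, Normalizer.Form form) {
        return escape(Normalizer.normalize(s, form));
    }

    public static void main(String[] args) {
        String input = "Hello\uFF0Cworld"; // U+FF0C is the full-width comma

        // Current code (NFKD assumed): prints Hello,world (unescaped comma)
        System.out.println(escapeThenNormalize(input, Normalizer.Form.NFKD));

        // Normalize first, then escape: prints Hello\,world
        System.out.println(normalizeThenEscape(input, Normalizer.Form.NFKD));

        // Switch to NFD, keep the order: the full-width comma is left untouched
        System.out.println(escapeThenNormalize(input, Normalizer.Form.NFD));
    }
}
```

NFD leaves U+FF0C alone because its mapping to "," is a compatibility decomposition, which only the NFKD and NFKC forms apply.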

> 
>>> It is something I want to avoid, so the fix is to use NFD
>>> instead.  I think if we are moving to NFD, it does not matter
>>> whether escaping or normalization comes first, if I understand UTF-8
>>> correctly.
>> 
>> Maybe, but IMO this is not the correct fix. The root cause of the
>> bug is not the form chosen, but the order.
>> 
> I'm not with you on this bug. The bug complains about the escaping issue, but actually the character should not be escaped.  So it is not an escaping issue, and this fix is not trying to fix escaping but normalization.

Yes, it is complaining about escaping, but there are two ways to fix it: 1) escape the character, or 2) make escaping unnecessary.

I just prefer my fix, because I think that's where the bug is. Even if we switch to NFD, I would still like to put normalization before escaping, even if practically it makes no difference.

Thanks
Max

> 
> Thanks,
> Xuelei
> 
>> --Max
>> 
>>> 
>>> Thanks,
>>> Xuelei
>>> 
>>>> Thanks
>>>> Max
>>>> 
>>>>> On Sep 19, 2016, at 10:32 AM, Xuelei Fan <xuelei.fan at oracle.com> wrote:
>>>>> 
>>>>>> 4. Is it possible to perform normalization before escaping special
>>>>>> characters?
>>>>>> 
>>>>> Yes.  I thought about this case.  The current fix comes from the fact
>>>>> that UTF-8 "Hello, world!" and "Hello， world!" should be different.
>>>>> Parsing them as the same thing may result in unexpected serious issues.
>>>>
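
To illustrate the quoted concern that "Hello, world!" and "Hello， world!" should stay different, here is a small sketch comparing the two strings after normalization. The class and method names are hypothetical, and NFKD is only assumed to be the form in use before the proposed switch to NFD.

```java
import java.text.Normalizer;

class FormCompareDemo {
    // Returns true if the two strings become identical under the given form.
    static boolean sameAfter(String a, String b, Normalizer.Form form) {
        return Normalizer.normalize(a, form).equals(Normalizer.normalize(b, form));
    }

    public static void main(String[] args) {
        String ascii = "Hello, world!";
        String wide = "Hello\uFF0C world!"; // full-width comma, U+FF0C

        // NFKD folds the full-width comma into ",", so the strings collapse: true
        System.out.println(sameAfter(ascii, wide, Normalizer.Form.NFKD));

        // NFD keeps the full-width comma, so the strings stay distinct: false
        System.out.println(sameAfter(ascii, wide, Normalizer.Form.NFD));
    }
}
```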