<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div></div><div style="direction: inherit;">Sorry. Whenever I wrote NFC, I meant NFD. Typo. </div><div><br>On Sep 19, 2016, at 23:16, Xuelei Fan &lt;<a href="mailto:xuelei.fan@oracle.com">xuelei.fan@oracle.com</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><span>On 9/19/2016 11:03 PM, Wang Weijun wrote:</span><br><blockquote type="cite"><span>After some thinking, my current opinion is:</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><span>1. Maybe NFC is better than NFKD, but I am not a Unicode expert.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><span>It is updated from NFKD to NFD. I did not get the point. Do you mean NFC is better than NFD?</span><br><span></span><br><blockquote type="cite"><span>2. I think the real bug is the order of escaping and normalization. The normalization (if it is a must) should be performed earlier, right after valStr is created, and only on valStr. Otherwise the NFKD normalization would generate new chars that need to be escaped. Again, I am not a Unicode expert and I don't know if NFC will also do the same.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><span>I don't get the point. The update is moving from NFKD to NFD. No NFKD normalization any more.</span><br><span></span><br><blockquote type="cite"><span>If 2) is fixed, whatever is correct in 1) does not matter much.</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><span>If we continue to use NFKD, normalization before escaping would result in an unexpected string, as we discussed for the hello-world example. </span></div></blockquote><div style="direction: inherit;"><br></div><div style="direction: inherit;">In this case, a comma appears but then it is escaped. You might say it is unexpected, but at least after escaping, it becomes a legal string. 
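<br><br>To illustrate the ordering point, here is a minimal sketch, not the actual JDK code. It assumes the example value contains a fullwidth comma (U+FF0C), which NFKD maps to an ASCII comma while NFD leaves it alone. The class NormalizationOrder and its helpers are hypothetical, and escapeComma is only a stand-in for the real escaping step.<br>
<pre>
```java
import java.text.Normalizer;

// Hypothetical demo class; names are illustrative, not taken from the JDK sources.
final class NormalizationOrder {

    // Stand-in for the real escaping step: only the ASCII comma is escaped here.
    static String escapeComma(String s) {
        return s.replace(",", "\\,");
    }

    // Order argued for in this thread: normalize the raw value, then escape.
    static String normalizeThenEscape(String s, Normalizer.Form form) {
        return escapeComma(Normalizer.normalize(s, form));
    }

    // Reverse order: escape first, then normalize the escaped string.
    static String escapeThenNormalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(escapeComma(s), form);
    }

    public static void main(String[] args) {
        // Assumed example value: "Hello" followed by a fullwidth comma (U+FF0C).
        String value = "Hello\uFF0C world!";

        // NFKD creates an ASCII comma; escaping afterwards catches it.
        System.out.println(normalizeThenEscape(value, Normalizer.Form.NFKD)); // Hello\, world!

        // Escaping first sees no ASCII comma, so the one NFKD creates stays unescaped.
        System.out.println(escapeThenNormalize(value, Normalizer.Form.NFKD)); // Hello, world!

        // NFD does not touch the fullwidth comma, so the order makes no difference here.
        System.out.println(normalizeThenEscape(value, Normalizer.Form.NFD)
                .equals(escapeThenNormalize(value, Normalizer.Form.NFD))); // true
    }
}
```
</pre>
With NFKD, only the normalize-then-escape order yields a string whose comma is escaped. With NFD the two orders agree for this input, because NFD applies canonical decompositions only.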
</div><div style="direction: inherit;"><br></div><blockquote type="cite"><div><span>That is something I want to avoid, so the fix is to use NFD instead. I think if we are moving to NFD, it does not matter whether escaping or normalization comes first, if I understand UTF-8 correctly.</span><br></div></blockquote><div style="direction: inherit;"><br></div><div style="direction: inherit;">Maybe, but IMO this is not the correct fix. The root cause of the bug is not the form chosen, but the order. </div><div style="direction: inherit;"><br></div><div style="direction: inherit;">--Max</div><div style="direction: inherit;"><br></div><blockquote type="cite"><div><span></span><br><span>Thanks,</span><br><span>Xuelei</span><br><span></span><br><blockquote type="cite"><span>Thanks</span><br></blockquote><blockquote type="cite"><span>Max</span><br></blockquote><blockquote type="cite"><span></span><br></blockquote><blockquote type="cite"><blockquote type="cite"><span>On Sep 19, 2016, at 10:32 AM, Xuelei Fan &lt;<a href="mailto:xuelei.fan@oracle.com">xuelei.fan@oracle.com</a>&gt; wrote:</span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span>4. Is it possible to perform normalization before escaping special characters?</span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><span></span><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><span>Yes. I thought about this case. The current fix comes from the fact that UTF-8 "Hello, world!" and "Hello, world!" should be different. Parsing them as the same thing may result in unexpected, serious issues.</span><br></blockquote></blockquote><blockquote type="cite"><span></span><br></blockquote></div></blockquote></body></html>