RFR: 8197594 - String and character repeat

Martin Buchholz martinrb at google.com
Mon Feb 19 15:55:36 UTC 2018





On Sun, Feb 18, 2018 at 11:19 AM, Martin Buchholz <martinrb at google.com>
wrote:

>
>  - how many digits to consume after the escape?  How much do we trust
> Unicode to never ever grow beyond 5 hex digits?
>

Oops, I already got it wrong: it's already at 6 hex digits, because there
are 17 planes, not 16.  MAX_CODE_POINT is U+10FFFF.
So yes, we need a variable-width syntax like regex \x{h...h}.
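
For illustration, here is a minimal sketch of both points using only
java.lang.Character and java.util.regex.Pattern.  The class name and the
hexDigits helper are made up for this example, not part of any proposal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper are illustrative only.
class CodePointWidthDemo {
    // Number of hex digits needed to write a code point without padding.
    static int hexDigits(int codePoint) {
        return Integer.toHexString(codePoint).length();
    }

    public static void main(String[] args) {
        // Character.MAX_CODE_POINT is 0x10FFFF, which needs 6 hex digits.
        System.out.println(hexDigits(Character.MAX_CODE_POINT));

        // The regex escape \x{h...h} takes a variable number of hex digits,
        // delimited by braces, so it covers supplementary code points too.
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println(Pattern.matches("\\x{1F600}", supplementary));
        System.out.println(Pattern.matches("\\x{41}", "A"));
    }
}
```

The braces are what make the width unambiguous: the escape ends at the
closing brace rather than after a fixed digit count.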

Java regex also supports
  \N{name} The character with Unicode character name 'name'
so we could do the same for the Java language, although it would be a
little weird to have every Unicode update make some previously invalid
source files valid.
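
A small sketch of the regex behaviour, assuming JDK 9 or later (where
\N{name} and Character.codePointOf are available).  The class and helper
names are invented for the example.  Whether a name is accepted depends on
the Unicode data shipped with the running JDK, which is the
version-dependence noted above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only. Assumes JDK 9+.
class NamedEscapeDemo {
    // True if the input is exactly the character with the given Unicode name.
    static boolean matchesNamed(String unicodeName, String input) {
        return Pattern.matches("\\N{" + unicodeName + "}", input);
    }

    // True if this JDK's Unicode data knows the given character name.
    static boolean isKnownName(String unicodeName) {
        try {
            Character.codePointOf(unicodeName);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the name is not a recognized character name.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesNamed("LATIN SMALL LETTER A", "a"));
        System.out.println(isKnownName("LATIN SMALL LETTER A"));
        System.out.println(isKnownName("NOT A REAL CHARACTER NAME"));
    }
}
```

A name added in a later Unicode version would flip isKnownName from false
to true on a newer JDK, without any change to the calling code.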

We could also say "It's 2018 and UTF-8 has won" and simply use UTF-8 in
source files directly.  No Unicode escapes needed.


More information about the core-libs-dev mailing list