RFR: 8197594 - String and character repeat
Martin Buchholz
martinrb at google.com
Mon Feb 19 15:55:36 UTC 2018
On Sun, Feb 18, 2018 at 11:19 AM, Martin Buchholz <martinrb at google.com>
wrote:
>
> - how many digits to consume after the escape? How much do we trust
> Unicode to never ever grow beyond 5 hex digits?
>
Oops, I already got it wrong - it's already at 6 hex digits because there
are 17 planes, not 16. MAX_CODE_POINT is U+10FFFF.
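
The arithmetic can be checked with a small sketch that relies only on constants from java.lang.Character. The class name CodePointWidth and the helper hexDigits are made up for illustration.

```java
public class CodePointWidth {
    // Number of hex digits needed to write a non-negative value.
    static int hexDigits(int value) {
        return Integer.toHexString(value).length();
    }

    public static void main(String[] args) {
        int max = Character.MAX_CODE_POINT;   // 0x10FFFF
        int planes = (max >>> 16) + 1;        // each plane holds 0x10000 code points

        System.out.println(planes);                              // 17
        System.out.println(hexDigits(max));                      // 6
        System.out.println(hexDigits(Character.MAX_VALUE));      // 4, the width \uXXXX assumes
    }
}
```

Five digits cover only planes 0 through 15 (up to U+FFFFF); plane 16 is what pushes the width to six.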
Yes, we need a variable-width syntax like the regex \x{h...h}.
Java regex also supports
\N{name} The character with Unicode character name 'name'
so we could do the same for the Java language.
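
For reference, here is a sketch of how the two regex escapes behave today in java.util.regex. It assumes a JDK where \N{name} is available (Java 9 or later, as far as I know); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexCodePointEscapes {
    // Variable-width hex escape: the braces delimit however many digits are needed.
    static boolean matchesByHex(String s) {
        return Pattern.matches("\\x{1F600}", s);
    }

    // Named escape: resolved against the Unicode character names known to the JDK.
    static boolean matchesByName(String s) {
        return Pattern.matches("\\N{GRINNING FACE}", s);
    }

    public static void main(String[] args) {
        // U+1F600 is outside the BMP, so it occupies two chars in a String.
        String grinning = new String(Character.toChars(0x1F600));

        System.out.println(grinning.length());
        System.out.println(matchesByHex(grinning));
        System.out.println(matchesByName(grinning));
    }
}
```

Both escapes are regex syntax only; the doubled backslashes are there because the Java language itself has no equivalent escape in string literals, which is the gap under discussion.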
Although it would be a little weird to have every Unicode update make some
previously invalid source files valid.
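
A rough illustration of that dependency, using Character.codePointOf (Java 9 or later): whether a name resolves is decided by the Unicode tables shipped with the JDK, so the same lookup can fail on one release and succeed on a later one. The class NameLookup and its -1 convention are made up for this sketch.

```java
public class NameLookup {
    // Returns the code point for a Unicode character name,
    // or -1 if this JDK's Unicode tables do not know the name.
    static int lookup(String name) {
        try {
            return Character.codePointOf(name);
        } catch (IllegalArgumentException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(lookup("LATIN SMALL LETTER A")));
        System.out.println(lookup("NOT A REAL CHARACTER NAME"));
    }
}
```

A \N{name} escape in source code would turn the -1 case into a compile error that a later Unicode update could silently fix.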
We could also say "It's 2018 and UTF-8 has won" and simply use UTF-8 in
source files directly. No Unicode escapes needed.
More information about the core-libs-dev mailing list