Java's strings, UTF-8 etc
Andrew Myers
andru at cs.cornell.edu
Fri Apr 3 14:55:29 UTC 2020
A fundamental problem with UTF-8 strings is the String API itself. String
relies on using integer indices as iterators, with charAt() selecting the
character at each index. In a variable-width encoding such as UTF-8,
finding the character at a given index means scanning from the start of
the string. A UTF-8 string should be a different abstraction, one that
does not encourage programmers to do inefficient random access into
strings and that can return characters larger than 16 bits in size --
probably as int/Integer rather than char/Character.
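A minimal sketch of what such an abstraction might look like follows. The
class name Utf8Text and its methods are hypothetical, invented here for
illustration, and are not part of the JDK or of any existing proposal. The
sketch assumes the backing bytes are well-formed UTF-8 (they come from
String.getBytes) and does no validation. It offers only forward iteration,
yielding each code point as an int, and deliberately has no charAt-style
index lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/** Hypothetical sketch: byte-array-backed UTF-8 text with sequential access only. */
final class Utf8Text implements Iterable<Integer> {
    // Assumed to hold well-formed UTF-8; this sketch never validates it.
    private final byte[] bytes;

    private Utf8Text(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Encodes an ordinary (UTF-16) String into the UTF-8 representation. */
    static Utf8Text of(String s) {
        return new Utf8Text(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Walks the bytes front to back, decoding one code point per step. */
    @Override
    public PrimitiveIterator.OfInt iterator() {
        return new PrimitiveIterator.OfInt() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < bytes.length;
            }

            @Override
            public int nextInt() {
                if (pos >= bytes.length) {
                    throw new NoSuchElementException();
                }
                int b0 = bytes[pos++] & 0xFF;
                if (b0 < 0x80) {
                    return b0; // single-byte (ASCII) sequence
                }
                // The lead byte tells us how many continuation bytes follow.
                int extra = b0 >= 0xF0 ? 3 : b0 >= 0xE0 ? 2 : 1;
                int cp = b0 & (0x3F >> extra); // payload bits of the lead byte
                for (int i = 0; i < extra; i++) {
                    cp = (cp << 6) | (bytes[pos++] & 0x3F);
                }
                return cp;
            }
        };
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) needs two chars in a java.lang.String.
        String s = "a\uD834\uDD1Eb";
        System.out.println("String.length() = " + s.length());
        System.out.println("charAt(1) = 0x" + Integer.toHexString(s.charAt(1)));
        for (int cp : Utf8Text.of(s)) {
            System.out.println("code point 0x" + Integer.toHexString(cp));
        }
    }
}
```

In this example charAt(1) returns only the high surrogate 0xD834, while
the iterator returns the whole code point 0x1D11E as a single int. The
existing String.codePoints() method already gives int-valued sequential
access on the current String class; the point of the sketch is that a
UTF-8 type could make that the primary interface rather than an extra.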
-- Andrew
Zenaan Harkness wrote:
>
> Hi Brian, in case it is of interest, I did an exploration over a week
> or two, and wrote up that journey here:
>
> https://zenaan.github.io/zen/javadoc/zen/lang/string.html
>
> See also of course:
>
> https://github.com/zenaan/zen
>
> And for reference, see also:
>
> https://mail.openjdk.java.net/pipermail/discuss/2016-November/004065.html
>
> https://mail.openjdk.java.net/pipermail/discuss/2016-November/004070.html
>
> https://mail.openjdk.java.net/pipermail/discuss/2016-November/004072.html
>
>
> (I was hoping to one day do a proof of concept for byte array backed
> UTF-8 strings in Java, but have not returned to this little project -
> in any case, I do believe this would be a fundamental improvement to
> the core of Java.)
>
> Best regards,
> Zenaan