Java's strings, UTF-8 etc

Andrew Myers andru at cs.cornell.edu
Fri Apr 3 14:55:29 UTC 2020


A fundamental problem with UTF-8 strings is the String API itself. String 
relies on being able to use integer indices as iterators and then using 
charAt() to select characters. A UTF-8 string should be a different 
abstraction that does not encourage programmers to do inefficient random 
access into strings, and that can also return characters larger than 
16 bits in size -- probably int/Integer rather than char/Character.
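
To make that concrete, here is a rough sketch of what such an abstraction 
might look like. Everything in it is hypothetical: the class name Utf8Text 
and the methods of(), byteLength() and codePoints() are invented for 
illustration and are not part of the JDK or of any proposal in this thread. 
The decoder assumes well-formed UTF-8, which holds here because the bytes 
come from String.getBytes(StandardCharsets.UTF_8).

```java
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/** Hypothetical sketch: a UTF-8 backed string offering sequential access only. */
final class Utf8Text {
    private final byte[] bytes; // assumed well-formed UTF-8 (see of())

    private Utf8Text(byte[] bytes) { this.bytes = bytes; }

    /** Encodes a java.lang.String; getBytes(UTF_8) yields well-formed UTF-8. */
    static Utf8Text of(String s) {
        return new Utf8Text(s.getBytes(StandardCharsets.UTF_8));
    }

    int byteLength() { return bytes.length; }

    /** Yields code points as ints; there is deliberately no charAt(int). */
    PrimitiveIterator.OfInt codePoints() {
        return new PrimitiveIterator.OfInt() {
            private int pos = 0;

            @Override public boolean hasNext() { return pos < bytes.length; }

            @Override public int nextInt() {
                if (!hasNext()) throw new NoSuchElementException();
                int b0 = bytes[pos++] & 0xFF;
                if (b0 < 0x80) return b0;                 // 1 byte: ASCII
                int extra;                                // continuation bytes to read
                int cp;                                   // payload bits of the lead byte
                if (b0 < 0xE0)      { extra = 1; cp = b0 & 0x1F; } // 2-byte sequence
                else if (b0 < 0xF0) { extra = 2; cp = b0 & 0x0F; } // 3-byte sequence
                else                { extra = 3; cp = b0 & 0x07; } // 4-byte sequence
                for (int i = 0; i < extra; i++) {
                    cp = (cp << 6) | (bytes[pos++] & 0x3F); // append 6 payload bits
                }
                return cp;
            }
        };
    }

    public static void main(String[] args) {
        String s = "a\u00E9\uD83D\uDE00";   // 'a', e-acute, U+1F600
        Utf8Text t = Utf8Text.of(s);
        System.out.println(s.length());     // 4 UTF-16 code units
        System.out.println(t.byteLength()); // 7 UTF-8 bytes
        PrimitiveIterator.OfInt it = t.codePoints();
        while (it.hasNext()) {
            System.out.printf("U+%04X%n", it.nextInt()); // 3 code points
        }
    }
}
```

The only way to walk the text is the iterator, so the caller never sees 
an index, and U+1F600 comes back as a single int rather than as two chars. 
The existing String class has codePoints() and codePointAt() for similar 
sequential access, but charAt() remains the path of least resistance.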

-- Andrew

Zenaan Harkness wrote:
>
> Hi Brian, in case it is of interest, I did an exploration over a week 
> or two, and wrote up that journey here:
>
> https://zenaan.github.io/zen/javadoc/zen/lang/string.html
>
> See also of course:
>
> https://github.com/zenaan/zen
>
> And for reference, see also:
>
> https://mail.openjdk.java.net/pipermail/discuss/2016-November/004065.html
>
> https://mail.openjdk.java.net/pipermail/discuss/2016-November/004070.html
>
> https://mail.openjdk.java.net/pipermail/discuss/2016-November/004072.html
>
>
> (I was hoping to one day do a proof of concept for byte array backed 
> UTF-8 strings in Java, but have not returned to this little project - 
> in any case, I do believe this would be a fundamental improvement to 
> the core of Java.)
>
> Best regards,
> Zenaan

