string indexing (was: Java needs an immutable byte array wrapper)
Zenaan Harkness
zen at freedbms.net
Wed Jan 25 23:29:34 UTC 2017
On Sun, Nov 13, 2016 at 11:21:48PM +1100, Zenaan Harkness wrote:
> On Sat, Nov 12, 2016 at 08:06:55PM -0800, Per Bothner wrote:
> > On 11/12/2016 09:53 AM, Peter Lawrey wrote:
> > >Java 9 String has a byte [] at its core. I suspect it's not
> > >appropriate but worth thinking about.
>
> Time to read up on that, thanks.
>
> > Interesting. I would be even more interested if they could make
> > codePointAt and codePointCount be constant-time: A number of
> > programming languages define a string as a sequence of code-points,
> > and the indexing operator that their standard library provide is
> > basically codePointAt. Example languages include Python3, Scheme, and
> > the XQuery/XPath/XSLT family.
>
> Ack.
>
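To make the point above concrete, here is a minimal sketch of what
code-point indexing looks like on top of today's java.lang.String. The
class name CodePointIndexing and the helper codePointAtIndex are made up
for illustration; only length, codePointCount, offsetByCodePoints and
codePointAt are real String methods. The helper has to walk the string
from the start, which is why this is linear rather than constant time.

```java
class CodePointIndexing {
    // Hypothetical helper: returns the code point at a code-point index
    // (not a char index). offsetByCodePoints scans from offset 0, so the
    // lookup is O(n) in the index rather than O(1).
    static int codePointAtIndex(String s, int cpIndex) {
        int charOffset = s.offsetByCodePoints(0, cpIndex);
        return s.codePointAt(charOffset);
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 (a surrogate pair in UTF-16), then 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                      // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.println(Integer.toHexString(codePointAtIndex(s, 1))); // 1f600
        System.out.println((char) codePointAtIndex(s, 2));   // b
    }
}
```

Note that s.charAt(1) on the same string would hand back only the high
surrogate, which is the kind of surprise that code-point indexing avoids.
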
> Although grapheme indexing is probably more generally useful for
> multi-lingual UI. Swift basically gets "String" right as far as my
> reading of Swift's docs goes - not only code-points, but graphemes, the
> next layer of indexing above code-points.
>
> I cannot speak to Swift's implementation as to storage / time
> tradeoffs made.
>
> Trying to create a simple string formatter (left, right, centered) that
> was also "multi lingual" led me into the deep dark past of Java's (pre
> v1.0) decision to go with UTF-16 (sensible at the time), which for 20
> years has been known to be deficient (Unicode had established, before
> Java 1.1 shipped, that it needed more than 16 bits) and yet
> java.lang.String never got updated, at least until recently with Java 9,
> which now lays the foundation for a sane string class.
>
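A rough sketch of the kind of formatter described above, assuming that
java.text.BreakIterator's character instance is an acceptable
approximation of grapheme boundaries (how well it handles complex
clusters depends on the JDK version). The names GraphemePad,
graphemeCount, spaces and center are hypothetical, and this is not the
code from the write-up linked below. It also treats one grapheme as one
column, which is not true for wide East Asian characters.

```java
import java.text.BreakIterator;

class GraphemePad {
    // Counts user-perceived characters by walking BreakIterator boundaries.
    static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // Builds a run of n spaces (avoids String.repeat, which needs Java 11).
    static String spaces(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Centers s in a field of `width` graphemes; when the padding is odd,
    // the extra space goes on the right. Strings longer than the field
    // are returned unchanged.
    static String center(String s, int width) {
        int pad = Math.max(0, width - graphemeCount(s));
        int left = pad / 2;
        return spaces(left) + s + spaces(pad - left);
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two chars, one grapheme
        String e = "e\u0301";
        System.out.println(e.length());
        System.out.println(graphemeCount(e));
        System.out.println("[" + center(e, 5) + "]");
    }
}
```

Padding by s.length() instead would give the accented "e" one space too
few, which is exactly the misalignment that started this whole
exploration.
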
> Took me two full working weeks to sort out the mess in my head, so I
That should be 'volunteering weeks' or "working weeks as in 10 days"
or something. (I donate my time to a human rights cause, and getting
stuck into Java's String was ultimately a pleasant sidetrack from that.)
BTW pre-Java-1.0 was my first foray into the language back in my
university days, and it became my primary choice from that point.
C++ finally "caught up" (on what I consider important) with namespaces
and most recently modules. Java String's inability to handle graphemes
with any real proficiency has been the proverbial never-ending
teeth-grinding story for me over the past couple of decades ...
> wrote up the details of that exploration here:
> https://zenaan.github.io/zen/javadoc/zen/lang/string.html
> (Note, this was pre-Java 9)
>
> Hopefully by Java 10, 11 or 12, we might see full grapheme support in
> Java (as is the case in Swift), now that String is implemented with byte
> array storage.
...