review request for 6798511/6860431: Include functionality of Surrogate in Character
Martin Buchholz
martinrb at google.com
Thu Aug 27 01:24:27 UTC 2009
On Wed, Aug 26, 2009 at 02:59, Ulf Zibis<Ulf.Zibis at gmx.de> wrote:
> Hi all,
>
> I have put all important things of sun.nio.cs.Surrogate to java.* packages,
> I guess. We likely could get rid of it.
I am in principle of adding the method isBMP to Character.java
(I did write it!) but I was hoping we can find a better name for it...
Hmmmmm..... The Unicode Glossary uses BMP consistently,
so maybe isBMP is the best we can come up with?
How about isBMPCodePoint?
http://unicode.org/glossary/#BMP_code_point
Yeah, I think that's not too bad.
Martin
> See: https://bugs.openjdk.java.net/show_bug.cgi?id=100104
>
>
> Some comments:
> ================
> public static int toChars(int codePoint, char[] dst, int dstIndex) {
> - if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
> - throw new IllegalArgumentException();
> - }
> - if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
> - dst[dstIndex] = (char) codePoint;
> - return 1;
> - }
> - toSurrogates(codePoint, dst, dstIndex);
> + if (codePoint >= MIN_CODE_POINT) {
> + if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
> + dst[dstIndex] = (char)codePoint;
> + return 1;
> + }
> + if (codePoint <= MAX_CODE_POINT)
> + return toSurrogates(codePoint, dst, dstIndex);
> + }
> + throw new IllegalArgumentException();
> + }
> ----------------
> saves 1 compare in case of code point is in BMP
Yes. A good change.
> ================
> + public static char[] toChars(int codePoint) {
> + if (codePoint >= MIN_CODE_POINT) {
> + char[] result;
> + if (codePoint < MIN_SUPPLEMENTARY_CODE_POINT) {
> + result = C1.clone();
> ----------------
> I guess cloning is faster than new instantiation, as fresh one has to be
> initialized by 0s, but not sure
Cloning has to initialize as well, with the original contents.
Often there is special support for zeroing newly allocated objects.
So I think this is a bad idea.
> ================
> + static int toSurrogates(int codePoint, char[] dst, int index) {
> + // We write elements "backwards" to guarantee all-or-nothing //
> TODO: isn't forward faster ?
> + dst[index+1] = lowSurrogate(codePoint);
> + dst[index] = highSurrogate(codePoint);
> ----------------
> IMO this guarantee is nowhere needed, so why potentially waste performance ?
All-or-nothing is a fundamental software reliability principle.
Why have your data structure in a corrupted state
if it is not too onerous to avoid it?
The difference in behavior is certainly user-visible.
> ================
> - // Pass 2: Allocate and fill in char[]
> - char[] v = new char[n];
> - for (int i = offset, j = 0; i < offset + count; i++) {
> - int c = codePoints[i];
> - if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
> - v[j++] = (char) c;
> - } else {
> - Character.toSurrogates(c, v, j);
> - j += 2;
> - }
> - }
> -
> - this.value = v;
> + // Pass 2: Allocate and fill in char[]
> + for (int i = offset, j = 0; i < offset + count; i++) {
> + int c = codePoints[i];
> + if (c < Character.MIN_SUPPLEMENTARY_CODE_POINT)
> + value[j++] = (char)c;
> + else
> + j += Character.toSurrogates(c, value, j);
> }
> ----------------
> I guess HotSpot is aware of immutability of String, so no need to locally
> buffer v expletive to save indirection and against concurrent modifications.
I like the simplification of this method a lot, but it actually helps
hotspot a little
to hold the new array in a local and assign it to a field at the end
of the constructor.
This sort of thing (copying a field into a local) is done in
java.util.concurrent a lot.
---
It would help to separate cosmetic and "real" changes to get them
reviewed and accepted.
Martin
More information about the core-libs-dev
mailing list