hg: jdk7/tl/jdk: 6860431: Character.isSurrogate(char ch)
Martin Buchholz
martinrb at google.com
Wed Sep 2 09:27:39 PDT 2009
On Wed, Sep 2, 2009 at 01:07, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
> On 02.09.2009 05:21, Martin Buchholz wrote:
>
>
>
> On Tue, Sep 1, 2009 at 01:29, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>
>
> {@code is now the preferred way. I tried to modify the methods I changed,
> but didn't try to change the whole file.
>
>
> You have also added old-style markup, so I asked why you mixed the two:
>
> /**
> - * The minimum value of a Unicode surrogate code unit in the UTF-16 encoding.
> + * The minimum value of a Unicode surrogate code unit in the
> + * UTF-16 encoding, constant <code>'\uD800'</code>.
> *
> * @since 1.5
> */
> public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;
>
>
I would have used {@code here if I could figure out
how to make it work
("&#92;" shows up literally in the generated output).
>
> A brave person such as yourself could try to
> become "code janitor" for the whole jdk.
>
>
> In this case it should be simple to replace <code>...</code> with {@code
> ...} across the whole JDK. My problem is that I don't have the CPU power to
> build the JDK and check whether any of the javadoc has been broken.
>
Building the javadoc requires a lot of memory -
a javadoc bug - someone could try to fix that...
>> - you have mixed U+1234 and \u1234 style. Why?
>
> They are different things. U+1234 describes a Unicode character or
> code point,
> while '\u1234' is a char (code unit, not code point).
> See Unicode glossary.
>
> Yes, after a closer look I can see the point, so I corrected their usage
> where I thought it was wrong.
> But what about using {@code U+10000}, as found in the
> MIN_SUPPLEMENTARY_CODE_POINT javadoc?
> "U+10000" is not valid Java code, but I must admit that it looks better
> than "0x010000".
> Maybe we must use <tt>U+10000</tt> here.
>
>
>
>> - often you use '&#92;' for '\', but not always (e.g. '\t'). I think we can
>> always use '\'. There cannot be that many developers in the world who
>> can't decode ISO-8859-1 or UTF-xx.
>
>
> We try hard to keep source code ASCII. Sorry, the world is adopting UTF-8,
> but the transition is rather slow. Maybe in 10 years we can go UTF-8
> everywhere.
>
>
> I have fallen into a trap: '\' *is* ASCII, it's '\u005C'. So is
> there any reason left to keep using '&#92;'?
>
\uXXXX has special meaning in java source files, even in comments.
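
To illustrate the point above, here is a minimal sketch (the class and method names are hypothetical, not taken from the JDK change under discussion). Unicode escapes are translated before the compiler recognizes comments, so an escape for a line feed placed inside a `//` comment ends that comment, and whatever follows it is compiled as ordinary code.

```java
class UnicodeEscapeDemo {
    // Returns 1, not 0: the escape inside the comment in the method body is
    // turned into a line terminator before comments are recognized, so the
    // increment that follows it is compiled as a real statement.
    static int countAfterComment() {
        int count = 0;
        // this comment ends at the escape: \u000a count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAfterComment());
    }
}
```

This is why a backslash directly followed by "u" is risky inside javadoc text, and why spelling the backslash some other way can be safer there.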
>> - I would like to see back-references like:
>> public static final int MIN_CODE_POINT = MIN_VALUE;
>> public static final int MIN_SUPPLEMENTARY_CODE_POINT = MAX_VALUE + 1;
>
> Those would work, but would add to the confusion
> between code points and UTF-16 code units.
> Notice how "MAX_VALUE + 1" looks like an oxymoron.
> ;-)
>
> But I don't have any problem with that, just as I have none with using
> Byte.MAX_VALUE + 1.
> The real source of the confusion is elsewhere; e.g. imagine we had a
> class Integer managing both 16-bit and 32-bit values.
>
> Maybe it would become clearer by adding MAX_SUPPLEMENTARY_CODE_POINT for
> *consistency* and having the following order:
>
MAX_SUPPLEMENTARY_CODE_POINT would not be a bad thing to have,
but it is not compelling enough for the effort involved with any change to
Java SE.
People who use these constants have to understand the model anyway:
code points are divided into BMP and supplementary.
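
As a rough sketch of that model (the class name and the helper isBmp are hypothetical; the Character constants and methods are the existing java.lang ones, with isSurrogate being the method this changeset adds):

```java
class CodePointDemo {
    // Hypothetical helper: true when the code point is valid and lies below
    // the first supplementary code point, i.e. in the BMP.
    static boolean isBmp(int codePoint) {
        return Character.isValidCodePoint(codePoint)
            && codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT;
    }

    public static void main(String[] args) {
        int euro = 0x20AC;   // U+20AC, a BMP code point
        int gClef = 0x1D11E; // U+1D11E, a supplementary code point

        // A BMP code point fits in one char (UTF-16 code unit);
        // a supplementary one needs a surrogate pair.
        System.out.println(isBmp(euro) + " " + Character.charCount(euro));
        System.out.println(isBmp(gClef) + " " + Character.charCount(gClef));

        // Each code unit of the pair is a surrogate char, not a code point.
        char[] units = Character.toChars(gClef);
        System.out.println(Character.isSurrogate(units[0])
            && Character.isSurrogate(units[1]));

        // The char-typed MAX_VALUE is promoted to int here, which is why the
        // "MAX_VALUE + 1" spelling works but mixes code units and code points.
        System.out.println(
            Character.MAX_VALUE + 1 == Character.MIN_SUPPLEMENTARY_CODE_POINT);
    }
}
```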
Martin