<i18n dev> DOC PATCH: java.lang.Character fixes (doc only, not code)
Tom Christiansen
tchrist at perl.com
Thu Apr 14 19:42:17 PDT 2011
Sherman,
In the spirit of open source development and the whole OpenJDK project, I offer
all you hardworking folks this patch to j.l.Character's embedded javadoc.
(I also have some comments on the code, but those I'll send under
separate cover.)
I set out to fix nothing more than the "errors of commission" (meaning
the factual misstatements) contained in the class's documentation. But
while I was in there, I couldn't help but also address a few things that
one might, in contrast, call "errors of omission". Between the two
sorts of fixes, I think it makes your document a lot more accurate,
and therefore a lot more useful.
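As a rough illustration of the distinction the patch documents, the sketch below prints the general category of a few lowercase-looking code points next to the result of Character.isLowerCase. The class name CasePredicateSketch and its category helper are made up for this example and are not part of the JDK or of the patch. Only Character.getType is relied on for the category; the isLowerCase column is printed without an expected value because its result for non-Ll code points may vary between JDK releases.

```java
class CasePredicateSketch {
    // Hypothetical helper: maps Character.getType to a short GC label.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER: return "Ll";
            case Character.UPPERCASE_LETTER: return "Lu";
            case Character.TITLECASE_LETTER: return "Lt";
            case Character.MODIFIER_LETTER:  return "Lm";
            case Character.LETTER_NUMBER:    return "Nl";
            case Character.OTHER_SYMBOL:     return "So";
            case Character.NON_SPACING_MARK: return "Mn";
            default:                         return "other";
        }
    }

    public static void main(String[] args) {
        // a, MODIFIER LETTER SMALL H, SMALL ROMAN NUMERAL ONE,
        // CIRCLED LATIN SMALL LETTER A, COMBINING GREEK YPOGEGRAMMENI,
        // LATIN CAPITAL LETTER L WITH SMALL LETTER J
        int[] samples = {'a', 0x02B0, 0x2170, 0x24D0, 0x0345, 0x01C8};
        for (int cp : samples) {
            System.out.printf("U+%04X GC=%s isLowerCase=%b%n",
                    cp, category(cp), Character.isLowerCase(cp));
        }
    }
}
```

Only the first sample is GC=Ll; the others are lowercase or titlecase characters in other categories, which is the gap the revised javadoc spells out.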
In all this, I kept to the same style and tone found in the existing text.
I also fixed a very few typos, but those I wouldn't have bothered you with.
This is a very brief patch. Next I'll fix j.l.Pattern's documentation,
but that, I am afraid, is going to take a bit more work than this did,
which was really fast and easy to fix, all things considered.
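The patch also separates simple case mappings (Character) from full case mappings (String). A minimal sketch of that difference follows; the class name CaseMappingSketch and its two helpers are invented for illustration, and Locale.ROOT is chosen here only to keep locale-specific rules out of the picture.

```java
import java.util.Locale;

class CaseMappingSketch {
    // Simple mapping: one char in, one char out (Character.toUpperCase).
    static String simpleUpper(char ch) {
        return String.valueOf(Character.toUpperCase(ch));
    }

    // Full mapping: the result may expand to several chars (String.toUpperCase).
    static String fullUpper(char ch) {
        return String.valueOf(ch).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // LATIN SMALL LETTER SHARP S and LATIN SMALL LIGATURE FI
        char[] samples = {'\u00DF', '\uFB01'};
        for (char ch : samples) {
            System.out.printf("U+%04X simple=%s full=%s%n",
                    (int) ch, simpleUpper(ch), fullUpper(ch));
        }
    }
}
```

For both samples the simple mapping returns the character unchanged, while the full mapping expands it to two characters ("SS" and "FI").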
Hope this helps!
--tom
-------------- next part --------------
--- java_lang_Character.java 2011-04-14 17:15:17.000000000 -0600
+++ java_lang_Character.java-EDIT 2011-04-14 19:41:19.000000000 -0600
@@ -59,14 +59,14 @@
* <p>The <code>char</code> data type (and therefore the value that a
* <code>Character</code> object encapsulates) are based on the
* original Unicode specification, which defined characters as
- * fixed-width 16-bit entities. The Unicode standard has since been
+ * fixed-width 16-bit entities. The Unicode Standard has since been
* changed to allow for characters whose representation requires more
* than 16 bits. The range of legal <em>code point</em>s is now
* U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
* (Refer to the <a
* href="http://www.unicode.org/reports/tr27/#notation"><i>
* definition</i></a> of the U+<i>n</i> notation in the Unicode
- * standard.)
+ * Standard.)
*
* <p><a name="BMP">The set of characters from U+0000 to U+FFFF is
* sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
@@ -5198,13 +5198,14 @@
}
/**
- * Determines if the specified character is a lowercase character.
+ * Determines if the specified character (Java <code>char</code>)
+ * is a lowercase letter.
* <p>
- * A character is lowercase if its general category type, provided
- * by <code>Character.getType(ch)</code>, is
+ * A character is a lowercase letter (GC=Ll) if its general category
+ * type, provided by <code>Character.getType(ch)</code>, is
* <code>LOWERCASE_LETTER</code>.
* <p>
- * The following are examples of lowercase characters:
+ * The following are examples of lowercase letters:
* <p><blockquote><pre>
* a b c d e f g h i j k l m n o p q r s t u v w x y z
* '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
@@ -5212,7 +5213,14 @@
* '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
* '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
* </pre></blockquote>
- * <p> Many other Unicode characters are lowercase too.
+ * <p> Many other Unicode characters are lowercase, too, including many
+ * modifier letters and subscripts (which are GC=Lm, not GC=Ll), some
+ * Roman numerals (GC=Nl), some circled letters (GC=So in the
+ * Block=Enclosed_Alphanumerics), and even one combining character,
+ * U+0345 COMBINING GREEK YPOGEGRAMMENI (GC=Mn). However, because
+ * those lowercase characters are not lowercase <i>letters</i>, this
+ * method will not identify them as lowercase. There are 159 such code
+ * points as of Unicode 6.0.<p>
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -5232,14 +5240,14 @@
}
/**
- * Determines if the specified character (Unicode code point) is a
- * lowercase character.
+ * Determines if the specified Unicode code point is a
+ * lowercase letter.
* <p>
- * A character is lowercase if its general category type, provided
- * by {@link Character#getType getType(codePoint)}, is
+ * A character is a lowercase letter (GC=Ll) if its general category type,
+ * provided by {@link Character#getType getType(codePoint)}, is
* <code>LOWERCASE_LETTER</code>.
* <p>
- * The following are examples of lowercase characters:
+ * The following are examples of lowercase letters:
* <p><blockquote><pre>
* a b c d e f g h i j k l m n o p q r s t u v w x y z
* '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6'
@@ -5247,7 +5255,14 @@
* '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6'
* '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF'
* </pre></blockquote>
- * <p> Many other Unicode characters are lowercase too.
+ * <p> Many other Unicode characters are lowercase, too, including many
+ * modifier letters and subscripts (which are GC=Lm, not GC=Ll), some
+ * Roman numerals (GC=Nl), some circled letters (GC=So in the
+ * Block=Enclosed_Alphanumerics), and even one combining character,
+ * U+0345 COMBINING GREEK YPOGEGRAMMENI (GC=Mn). However, because
+ * those lowercase characters are not lowercase <i>letters</i>, this
+ * method will not identify them as lowercase. There are 159 such code
+ * points as of Unicode 6.0.<p>
*
* @param codePoint the character (Unicode code point) to be tested.
* @return <code>true</code> if the character is lowercase;
@@ -5263,12 +5278,12 @@
}
/**
- * Determines if the specified character is an uppercase character.
+ * Determines if the specified character (Java <code>char</code>) is an uppercase letter.
* <p>
- * A character is uppercase if its general category type, provided by
+ * A character is an uppercase letter (GC=Lu) if its general category type, provided by
* <code>Character.getType(ch)</code>, is <code>UPPERCASE_LETTER</code>.
* <p>
- * The following are examples of uppercase characters:
+ * The following are examples of uppercase letters:
* <p><blockquote><pre>
* A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
* '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
@@ -5276,7 +5291,12 @@
* '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
* '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
* </pre></blockquote>
- * <p> Many other Unicode characters are uppercase too.<p>
+ * <p>Many other Unicode characters are uppercase, too, including some
+ * Roman numerals (which are GC=Nl, not GC=Lu) and some circled
+ * letters (GC=So in the Block=Enclosed_Alphanumerics). However,
+ * because those uppercase characters are not uppercase
+ * <i>letters</i>, this method will not identify them as being
+ * uppercase. There are 42 such characters as of Unicode 6.0.<p>
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -5297,12 +5317,12 @@
}
/**
- * Determines if the specified character (Unicode code point) is an uppercase character.
+ * Determines if the specified Unicode code point is an uppercase letter.
* <p>
- * A character is uppercase if its general category type, provided by
+ * A character is an uppercase letter (GC=Lu) if its general category type, provided by
* {@link Character#getType(int) getType(codePoint)}, is <code>UPPERCASE_LETTER</code>.
* <p>
- * The following are examples of uppercase characters:
+ * The following are examples of uppercase letters:
* <p><blockquote><pre>
* A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
* '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7'
@@ -5310,7 +5330,12 @@
* '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8'
* '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE'
* </pre></blockquote>
- * <p> Many other Unicode characters are uppercase too.<p>
+ * <p>Many other Unicode characters are uppercase, too, including some
+ * Roman numerals (which are GC=Nl, not GC=Lu) and some circled
+ * letters (GC=So in the Block=Enclosed_Alphanumerics). However,
+ * because those uppercase characters are not uppercase
+ * <i>letters</i>, this method will not identify them as being
+ * uppercase. There are 42 such characters as of Unicode 6.0.<p>
*
* @param codePoint the character (Unicode code point) to be tested.
* @return <code>true</code> if the character is uppercase;
@@ -5326,14 +5351,14 @@
}
/**
- * Determines if the specified character is a titlecase character.
+ * Determines if the specified character (Java <code>char</code>) is a titlecase letter.
* <p>
- * A character is a titlecase character if its general
+ * A character is a titlecase letter (GC=Lt) if its general
* category type, provided by <code>Character.getType(ch)</code>,
* is <code>TITLECASE_LETTER</code>.
* <p>
- * Some characters look like pairs of Latin letters. For example, there
- * is an uppercase letter that looks like "LJ" and has a corresponding
+ * Some characters look like pairs of letters. For example, there
+ * is an uppercase Latin letter that looks like "LJ" and has a corresponding
* lowercase letter that looks like "lj". A third form, which looks like "Lj",
* is the appropriate form to use when rendering a word in lowercase
* with initial capitals, as for a book title.
@@ -5345,8 +5370,12 @@
* <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code>
* <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code>
* <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code>
+ * <li><code>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</code>
+ * <li><code>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</code>
* </ul>
- * <p> Many other Unicode characters are titlecase too.<p>
+ * <p> Many other Unicode characters are titlecase letters, too.
+ * As of Unicode 6.0, there are 31 titlecase characters, all of
+ * which are titlecase letters.<p>
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -5367,14 +5396,14 @@
}
/**
- * Determines if the specified character (Unicode code point) is a titlecase character.
+ * Determines if the specified Unicode code point is a titlecase letter.
* <p>
- * A character is a titlecase character if its general
- * category type, provided by {@link Character#getType(int) getType(codePoint)},
+ * A character is a titlecase letter (GC=Lt) if its general
+ * category type, provided by {@link Character#getType(int) getType(codePoint)},
* is <code>TITLECASE_LETTER</code>.
* <p>
- * Some characters look like pairs of Latin letters. For example, there
- * is an uppercase letter that looks like "LJ" and has a corresponding
+ * Some characters look like pairs of letters. For example, there
+ * is an uppercase Latin letter that looks like "LJ" and has a corresponding
* lowercase letter that looks like "lj". A third form, which looks like "Lj",
* is the appropriate form to use when rendering a word in lowercase
* with initial capitals, as for a book title.
@@ -5386,8 +5415,12 @@
* <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code>
* <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code>
* <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code>
+ * <li><code>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</code>
+ * <li><code>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</code>
* </ul>
- * <p> Many other Unicode characters are titlecase too.<p>
+ * <p> Many other Unicode characters are titlecase letters, too.
+ * As of Unicode 6.0, there are 31 titlecase characters, all of
+ * which are titlecase letters.<p>
*
* @param codePoint the character (Unicode code point) to be tested.
* @return <code>true</code> if the character is titlecase;
@@ -5405,7 +5438,7 @@
/**
* Determines if the specified character is a digit.
* <p>
- * A character is a digit if its general category type, provided
+ * A character is a digit (GC=Nd) if its general category type, provided
* by <code>Character.getType(ch)</code>, is
* <code>DECIMAL_DIGIT_NUMBER</code>.
* <p>
@@ -5444,7 +5477,7 @@
/**
* Determines if the specified character (Unicode code point) is a digit.
* <p>
- * A character is a digit if its general category type, provided
+ * A character is a digit (GC=Nd) if its general category type, provided
* by {@link Character#getType(int) getType(codePoint)}, is
* <code>DECIMAL_DIGIT_NUMBER</code>.
* <p>
@@ -5529,7 +5562,7 @@
}
/**
- * Determines if the specified character is a letter.
+ * Determines if the specified character (Java <code>char</code>) is a letter.
* <p>
* A character is considered to be a letter if its general
* category type, provided by <code>Character.getType(ch)</code>,
@@ -5542,13 +5575,20 @@
* <li> <code>OTHER_LETTER</code>
* </ul>
*
- * Not all letters have case. Many characters are
- * letters but are neither uppercase nor lowercase nor titlecase.
+ * Not all letters have case, and not all cased characters are
+ * letters. Many characters are letters but are neither uppercase
+ * (GC=Lu) nor lowercase (GC=Ll) nor titlecase (GC=Lt). Letters
+ * without case are either Modifier_Letters (GC=Lm) or Other_Letters
+ * (GC=Lo), but some modifier letters <i>do</i> have case. Similarly, not all
+ * characters with case are letters, such as the Roman numerals, which
+ * are Letter_Numbers (GC=Nl) and the circled letters, which are
+ * Other_Symbols (GC=So). There are 201 cased characters as of
+ * Unicode 6.0 that are neither uppercase, lowercase, nor titlecase letters.
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
* all Unicode characters, including supplementary characters, use
- * the {@link #isLetter(int)} method.
+ * the {@link #isLetter(int)} method.<p>
*
* @param ch the character to be tested.
* @return <code>true</code> if the character is a letter;
@@ -5581,8 +5621,15 @@
* <li> <code>OTHER_LETTER</code>
* </ul>
*
- * Not all letters have case. Many characters are
- * letters but are neither uppercase nor lowercase nor titlecase.
+ * Not all letters have case, and not all cased characters are
+ * letters. Many characters are letters but are neither uppercase
+ * (GC=Lu) nor lowercase (GC=Ll) nor titlecase (GC=Lt). Letters
+ * without case are either Modifier_Letters (GC=Lm) or Other_Letters
+ * (GC=Lo), but some modifier letters <i>do</i> have case. Similarly, not all
+ * characters with case are letters, such as the Roman numerals, which
+ * are Letter_Numbers (GC=Nl) and the circled letters, which are
+ * Other_Symbols (GC=So). There are 201 cased characters as of
+ * Unicode 6.0 that are neither uppercase, lowercase, nor titlecase letters.<p>
*
* @param codePoint the character (Unicode code point) to be tested.
* @return <code>true</code> if the character is a letter;
@@ -5606,7 +5653,7 @@
}
/**
- * Determines if the specified character is a letter or digit.
+ * Determines if the specified character (Java <code>char</code>) is a letter or digit.
* <p>
* A character is considered to be a letter or digit if either
* <code>Character.isLetter(char ch)</code> or
@@ -5661,14 +5708,14 @@
}
/**
- * Determines if the specified character is permissible as the first
+ * Determines if the specified character (Java <code>char</code>) is permissible as the first
* character in a Java identifier.
* <p>
* A character may start a Java identifier if and only if
* one of the following is true:
* <ul>
* <li> {@link #isLetter(char) isLetter(ch)} returns <code>true</code>
- * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code>
+ * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code> (GC=Nl)
* <li> ch is a currency symbol (such as "$")
* <li> ch is a connecting punctuation character (such as "_").
* </ul>
@@ -5691,7 +5738,7 @@
}
/**
- * Determines if the specified character may be part of a Java
+ * Determines if the specified character (Java <code>char</code>) may be part of a Java
* identifier as other than the first character.
* <p>
* A character may be part of a Java identifier if and only if any
@@ -5727,7 +5774,7 @@
}
/**
- * Determines if the specified character is
+ * Determines if the specified character (Java <code>char</code>) is
* permissible as the first character in a Java identifier.
* <p>
* A character may start a Java identifier if and only if
@@ -5787,7 +5834,7 @@
}
/**
- * Determines if the specified character may be part of a Java
+ * Determines if the specified character (Java <code>char</code>) may be part of a Java
* identifier as other than the first character.
* <p>
* A character may be part of a Java identifier if any of the following
@@ -5857,7 +5904,7 @@
}
/**
- * Determines if the specified character is permissible as the
+ * Determines if the specified character (Java <code>char</code>) is permissible as the
* first character in a Unicode identifier.
* <p>
* A character may start a Unicode identifier if and only if
@@ -5910,7 +5957,7 @@
}
/**
- * Determines if the specified character may be part of a Unicode
+ * Determines if the specified character (Java <code>char</code>) may be part of a Unicode
* identifier as other than the first character.
* <p>
* A character may be part of a Unicode identifier if and only if
@@ -5974,7 +6021,7 @@
}
/**
- * Determines if the specified character should be regarded as
+ * Determines if the specified character (Java <code>char</code>) should be regarded as
* an ignorable character in a Java identifier or a Unicode identifier.
* <p>
* The following Unicode characters are ignorable in a Java identifier
@@ -6039,20 +6086,34 @@
}
/**
- * Converts the character argument to lowercase using case
- * mapping information from the UnicodeData file.
+ * Converts the character (Java <code>char</code>) argument to lowercase
+ * using case mapping information from the UnicodeData file.
* <p>
* Note that
* <code>Character.isLowerCase(Character.toLowerCase(ch))</code>
- * does not always return <code>true</code> for some ranges of
- * characters, particularly those that are symbols or ideographs.
+ * does not return <code>true</code> for some ranges of
+ * lowercase characters, particularly those that are symbols or ideographs,
+ * or lowercase modifier letters.
+ *
+ * <p><b>Note:</b> This method cannot handle characters whose lowercase mapping
+ * according to the SpecialCasing file in the Unicode specification
+ * returns more than one character. As of Unicode 6.0, there is only
+ * one such code point (if locales are not considered):
+ *
+ * <ul>
+ * <li><code> LATIN CAPITAL LETTER I WITH DOT ABOVE</code></li>
+ * </ul>
*
* <p>In general, {@link String#toLowerCase()} should be used to map
* characters to lowercase. <code>String</code> case mapping methods
* have several benefits over <code>Character</code> case mapping methods.
* <code>String</code> case mapping methods can perform locale-sensitive
* mappings, context-sensitive mappings, and 1:M character mappings, whereas
- * the <code>Character</code> case mapping methods cannot.
+ * the <code>Character</code> case mapping methods cannot. In Unicode terminology,
+ * <code>Character</code> case mappings are <i>simple case mappings</i> (because they
+ * can only map to a single character), while <code>String</code> case mappings
+ * are <i>full case mappings</i>, because they can map to multiple characters,
+ * as defined by the SpecialCasing file in the Unicode specification.
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -6071,20 +6132,33 @@
/**
* Converts the character (Unicode code point) argument to
- * lowercase using case mapping information from the UnicodeData
+ * lowercase using simple case mapping information from the UnicodeData
* file.
*
* <p> Note that
* <code>Character.isLowerCase(Character.toLowerCase(codePoint))</code>
- * does not always return <code>true</code> for some ranges of
- * characters, particularly those that are symbols or ideographs.
+ * does not return <code>true</code> for some ranges of
+ * lowercase characters, particularly those that are symbols or ideographs,
+ * or lowercase modifier letters.
+ *
+ * <p><b>Note:</b> This method cannot handle characters whose lowercase mapping
+ * according to the SpecialCasing file returns more than one character.
+ * As of Unicode 6.0, there is only one such code point,
+ * if locales are not considered:
+ * <ul>
+ * <li><code> LATIN CAPITAL LETTER I WITH DOT ABOVE</code></li>
+ * </ul>
*
* <p>In general, {@link String#toLowerCase()} should be used to map
* characters to lowercase. <code>String</code> case mapping methods
* have several benefits over <code>Character</code> case mapping methods.
* <code>String</code> case mapping methods can perform locale-sensitive
* mappings, context-sensitive mappings, and 1:M character mappings, whereas
- * the <code>Character</code> case mapping methods cannot.
+ * the <code>Character</code> case mapping methods cannot. In Unicode terminology,
+ * <code>Character</code> case mappings are <i>simple case mappings</i> (because they
+ * can only map to a single character), while <code>String</code> case mappings
+ * are <i>full case mappings</i>, because they can map to multiple characters,
+ * as defined by the SpecialCasing file in the Unicode specification.
*
* @param codePoint the character (Unicode code point) to be converted.
* @return the lowercase equivalent of the character (Unicode code
@@ -6099,20 +6173,39 @@
}
/**
- * Converts the character argument to uppercase using case mapping
- * information from the UnicodeData file.
+ * Converts the character (Java <code>char</code>) argument to
+ * uppercase using simple case mapping information from the
+ * UnicodeData file.
+ *
* <p>
* Note that
* <code>Character.isUpperCase(Character.toUpperCase(ch))</code>
- * does not always return <code>true</code> for some ranges of
- * characters, particularly those that are symbols or ideographs.
+ * does not return <code>true</code> for some ranges of
+ * uppercase characters, particularly those that are symbols or ideographs.
+ *
+ * <p><b>Note:</b> This method cannot handle characters whose uppercase mapping
+ * according to the SpecialCasing file in the Unicode specification
+ * returns more than one character. Examples of such code points include:
+ * <ul>
+ * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+ * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+ * <li><code>LATIN SMALL LETTER A WITH RIGHT HALF RING</code></li>
+ * <li><code>GREEK SMALL LETTER UPSILON WITH PSILI</code></li>
+ * <li><code>LATIN SMALL LIGATURE FI</code></li>
+ * <li><code>LATIN SMALL LIGATURE ST</code></li>
+ * </ul>
+ * <p>As of Unicode 6.0, there are 102 such code points.
*
* <p>In general, {@link String#toUpperCase()} should be used to map
* characters to uppercase. <code>String</code> case mapping methods
* have several benefits over <code>Character</code> case mapping methods.
* <code>String</code> case mapping methods can perform locale-sensitive
* mappings, context-sensitive mappings, and 1:M character mappings, whereas
- * the <code>Character</code> case mapping methods cannot.
+ * the <code>Character</code> case mapping methods cannot. In Unicode terminology,
+ * <code>Character</code> case mappings are <i>simple case mappings</i> because they
+ * can only map to a single character, while <code>String</code> case mappings
+ * are <i>full case mappings</i> because they can map to multiple characters,
+ * as defined by the SpecialCasing file in the Unicode specification.
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -6131,20 +6224,37 @@
/**
* Converts the character (Unicode code point) argument to
- * uppercase using case mapping information from the UnicodeData
+ * uppercase using simple case mapping information from the UnicodeData
* file.
*
* <p>Note that
* <code>Character.isUpperCase(Character.toUpperCase(codePoint))</code>
- * does not always return <code>true</code> for some ranges of
- * characters, particularly those that are symbols or ideographs.
+ * does not return <code>true</code> for some ranges of
+ * uppercase characters, particularly those that are symbols or ideographs.
+ *
+ * <p><b>Note:</b> This method cannot handle characters whose uppercase mapping
+ * according to the SpecialCasing file returns more than one character.
+ * Examples of such code points include:
+ * <ul>
+ * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+ * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+ * <li><code>LATIN SMALL LETTER A WITH RIGHT HALF RING</code></li>
+ * <li><code>GREEK SMALL LETTER UPSILON WITH PSILI</code></li>
+ * <li><code>LATIN SMALL LIGATURE FI</code></li>
+ * <li><code>LATIN SMALL LIGATURE ST</code></li>
+ * </ul>
+ * <p>As of Unicode 6.0, there are 102 such code points.
*
* <p>In general, {@link String#toUpperCase()} should be used to map
* characters to uppercase. <code>String</code> case mapping methods
* have several benefits over <code>Character</code> case mapping methods.
* <code>String</code> case mapping methods can perform locale-sensitive
* mappings, context-sensitive mappings, and 1:M character mappings, whereas
- * the <code>Character</code> case mapping methods cannot.
+ * the <code>Character</code> case mapping methods cannot. In Unicode terminology,
+ * <code>Character</code> case mappings are <i>simple case mappings</i> because they
+ * can only map to a single character, while <code>String</code> case mappings
+ * are <i>full case mappings</i> because they can map to multiple characters
+ * as defined by the SpecialCasing file in the Unicode specification.
*
* @param codePoint the character (Unicode code point) to be converted.
* @return the uppercase equivalent of the character, if any;
@@ -6159,25 +6269,39 @@
}
/**
- * Converts the character argument to titlecase using case mapping
- * information from the UnicodeData file. If a character has no
- * explicit titlecase mapping and is not itself a titlecase char
- * according to UnicodeData, then the uppercase mapping is
- * returned as an equivalent titlecase mapping. If the
- * <code>char</code> argument is already a titlecase
- * <code>char</code>, the same <code>char</code> value will be
- * returned.
+ * Converts the character (Java <code>char</code>) argument to
+ * titlecase using simple case mapping information from the
+ * UnicodeData file. If a character has no explicit titlecase
+ * mapping and is not itself a titlecase char according to
+ * UnicodeData, then the simple uppercase mapping is returned as an
+ * equivalent titlecase mapping. Simple mapping means that only single
+ * character returns are possible, and any full case mapping from
+ * the SpecialCasing file in the Unicode specification is
+ * disregarded. If the <code>char</code> argument is already a
+ * titlecase character, that same value will be returned.
+ *
* <p>
* Note that
* <code>Character.isTitleCase(Character.toTitleCase(ch))</code>
- * does not always return <code>true</code> for some ranges of
- * characters.
+ * may not always return <code>true</code> for some ranges of
+ * titlecase characters.
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
* all Unicode characters, including supplementary characters, use
* the {@link #toTitleCase(int)} method.
*
+ * <p><b>Note:</b> This method cannot handle characters whose titlecase
+ * mapping according to the SpecialCasing file returns more than one character.
+ * Examples of such code points include:
+ * <ul>
+ * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+ * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+ * <li><code>LATIN SMALL LIGATURE FI</code></li>
+ * <li><code>LATIN SMALL LIGATURE ST</code></li>
+ * </ul>
+ * <p>As of Unicode 6.0, there are 48 such code points.
+ *
* @param ch the character to be converted.
* @return the titlecase equivalent of the character, if any;
* otherwise, the character itself.
@@ -6191,20 +6315,34 @@
}
/**
- * Converts the character (Unicode code point) argument to titlecase using case mapping
- * information from the UnicodeData file. If a character has no
- * explicit titlecase mapping and is not itself a titlecase char
- * according to UnicodeData, then the uppercase mapping is
- * returned as an equivalent titlecase mapping. If the
- * character argument is already a titlecase
- * character, the same character value will be
- * returned.
+ * Converts the character (Unicode code point) argument to titlecase
+ * using simple case mapping information from the UnicodeData file. If a
+ * character has no explicit titlecase mapping and is not itself a
+ * titlecase char according to UnicodeData, then the simple uppercase
+ * mapping is returned as an equivalent titlecase mapping. Simple
+ * mapping means that only single-character returns are possible,
+ * and any full case mapping from the SpecialCasing file in the
+ * Unicode specification is disregarded. If the <code>int</code>
+ * argument is already a titlecase character, that same value will
+ * be returned.
+ *
*
* <p>Note that
* <code>Character.isTitleCase(Character.toTitleCase(codePoint))</code>
* does not always return <code>true</code> for some ranges of
* characters.
*
+ * <p><b>Note:</b> This method cannot handle characters whose titlecase
+ * mapping according to the SpecialCasing file returns more than one character.
+ * Examples of such code points include:
+ * <ul>
+ * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+ * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+ * <li><code>LATIN SMALL LIGATURE FI</code></li>
+ * <li><code>LATIN SMALL LIGATURE ST</code></li>
+ * </ul>
+ * <p>As of Unicode 6.0, there are 48 such code points.
+ *
* @param codePoint the character (Unicode code point) to be converted.
* @return the titlecase equivalent of the character, if any;
* otherwise, the character itself.
@@ -6306,7 +6444,7 @@
* The letters A-Z in their uppercase (<code>'\u0041'</code> through
* <code>'\u005A'</code>), lowercase
* (<code>'\u0061'</code> through <code>'\u007A'</code>), and
- * full width variant (<code>'\uFF21'</code> through
+ * fullwidth variant (<code>'\uFF21'</code> through
* <code>'\uFF3A'</code> and <code>'\uFF41'</code> through
* <code>'\uFF5A'</code>) forms have numeric values from 10
* through 35. This is independent of the Unicode specification,
@@ -6344,7 +6482,7 @@
* The letters A-Z in their uppercase (<code>'\u0041'</code> through
* <code>'\u005A'</code>), lowercase
* (<code>'\u0061'</code> through <code>'\u007A'</code>), and
- * full width variant (<code>'\uFF21'</code> through
+ * fullwidth variant (<code>'\uFF21'</code> through
* <code>'\uFF3A'</code> and <code>'\uFF41'</code> through
* <code>'\uFF5A'</code>) forms have numeric values from 10
* through 35. This is independent of the Unicode specification,
@@ -6404,11 +6542,15 @@
/**
- * Determines if the specified character is a Unicode space character.
- * A character is considered to be a space character if and only if
- * it is specified to be a space character by the Unicode standard. This
- * method returns true if the character's general category type is any of
- * the following:
+ * Determines if the specified character (Java <code>char</code>) is
+ * a Unicode space separator (GC=Zs), line separator (GC=Zl), or
+ * paragraph separator (GC=Zp). This is <i>not</i> equivalent to the
+ * Unicode White_Space property, which also includes six control
+ * characters. A character is considered to be a space character if
+ * and only if it is specified to be a space, line, or paragraph
+ * separator by the Unicode Standard. This method returns true if
+ * the character's general category type is any of the following:
+ *
* <ul>
* <li> <code>SPACE_SEPARATOR</code>
* <li> <code>LINE_SEPARATOR</code>
@@ -6431,10 +6573,13 @@
}
/**
- * Determines if the specified character (Unicode code point) is a
- * Unicode space character. A character is considered to be a
- * space character if and only if it is specified to be a space
- * character by the Unicode standard. This method returns true if
+ * Determines if the specified character (Unicode code point) is
+ * a Unicode space separator (GC=Zs), line separator (GC=Zl), or
+ * paragraph separator (GC=Zp). This is <em>not</em> equivalent to the
+ * Unicode White_Space property, which also includes six control
+ * characters. A character is considered to be a space character if
+ * and only if it is specified to be a space, line, or paragraph
+ * separator by the Unicode Standard. This method returns true if
* the character's general category type is any of the following:
*
* <ul>
@@ -6475,6 +6620,8 @@
* <li> It is <code>'\u001E'</code>, U+001E RECORD SEPARATOR.
* <li> It is <code>'\u001F'</code>, U+001F UNIT SEPARATOR.
* </ul>
+ * <p><b>Note:</b> The Unicode White_Space property is not
+ * the same as Java whitespace.
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -6493,11 +6640,11 @@
/**
* Determines if the specified character (Unicode code point) is
- * white space according to Java. A character is a Java
- * whitespace character if and only if it satisfies one of the
- * following criteria:
+ * white space according to Java, not according to Unicode. A
+ * character is a Java whitespace character if and only if it
+ * satisfies one of the following criteria:
* <ul>
- * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},
+ * <li> It is a Unicode space, line, or paragraph separator ({@link #SPACE_SEPARATOR},
* {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})
* but is not also a non-breaking space (<code>'\u00A0'</code>,
* <code>'\u2007'</code>, <code>'\u202F'</code>).
@@ -6511,6 +6658,8 @@
* <li> It is <code>'\u001E'</code>, U+001E RECORD SEPARATOR.
* <li> It is <code>'\u001F'</code>, U+001F UNIT SEPARATOR.
* </ul>
+ * <p><b>Note:</b> The Unicode White_Space property is not
+ * the same as Java whitespace.
* <p>
*
* @param codePoint the character (Unicode code point) to be tested.
@@ -6524,12 +6673,14 @@
}
/**
- * Determines if the specified character is an ISO control
+ * Determines if the specified character (Java <code>char</code>) is an ISO control
* character. A character is considered to be an ISO control
* character if its code is in the range <code>'\u0000'</code>
* through <code>'\u001F'</code> or in the range
* <code>'\u007F'</code> through <code>'\u009F'</code>.
*
+ * <p><b>Note:</b> This is identical to the Unicode Control general category (GC=Cc).
+ *
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
* all Unicode characters, including supplementary characters, use
@@ -6548,12 +6699,14 @@
}
/**
- * Determines if the referenced character (Unicode code point) is an ISO control
+ * Determines if the specified character (Unicode code point) is an ISO control
* character. A character is considered to be an ISO control
* character if its code is in the range <code>'\u0000'</code>
* through <code>'\u001F'</code> or in the range
* <code>'\u007F'</code> through <code>'\u009F'</code>.
*
+ * <p><b>Note:</b> This is identical to the Unicode Control general category (GC=Cc).
+ *
* @param codePoint the character (Unicode code point) to be tested.
* @return <code>true</code> if the character is an ISO control character;
* <code>false</code> otherwise.
@@ -6570,7 +6723,8 @@
}
/**
- * Returns a value indicating a character's general category.
+ * Returns a value indicating the general category of the
+ * specified character (Java <code>char</code>).
*
* <p><b>Note:</b> This method cannot handle <a
* href="#supplementary"> supplementary characters</a>. To support
@@ -6617,7 +6771,8 @@
}
/**
- * Returns a value indicating a character's general category.
+ * Returns a value indicating the general category of the
+ * specified character (Unicode code point).
*
* @param codePoint the character (Unicode code point) to be tested.
* @return a value of type <code>int</code> representing the
@@ -6697,7 +6852,7 @@
/**
* Returns the Unicode directionality property for the given
- * character. Character directionality is used to calculate the
+ * character (Java <code>char</code>). Character directionality is used to calculate the
* visual ordering of text. The directionality value of undefined
* <code>char</code> values is <code>DIRECTIONALITY_UNDEFINED</code>.
*
@@ -6740,7 +6895,7 @@
* Returns the Unicode directionality property for the given
* character (Unicode code point). Character directionality is
* used to calculate the visual ordering of text. The
- * directionality value of undefined character is {@link
+ * directionality value of an undefined character is {@link
* #DIRECTIONALITY_UNDEFINED}.
*
* @param codePoint the character (Unicode code point) for which
@@ -6774,7 +6929,7 @@
}
/**
- * Determines whether the character is mirrored according to the
+ * Determines whether the character (Java <code>char</code>) is mirrored according to the
* Unicode specification. Mirrored characters should have their
* glyphs horizontally mirrored when displayed in text that is
* right-to-left. For example, <code>'\u0028'</code> LEFT
@@ -6884,7 +7039,7 @@
* @since 1.4
*/
static char[] toUpperCaseCharArray(int codePoint) {
- // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
+ // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
assert isBmpCodePoint(codePoint);
return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint);
}
@@ -6917,7 +7072,7 @@
* Note: if the specified character is not assigned a name by
* the <i>UnicodeData</i> file (part of the Unicode Character
* Database maintained by the Unicode Consortium), the returned
- * name is the same as the result of expression
+ * name is the same as the result of the expression:
*
* <blockquote><code>
* Character.UnicodeBlock.of(codePoint)
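
For anyone who wants to see the distinctions drawn in the patched javadoc
in action, here is a minimal sketch, not part of the patch itself. It
assumes only the existing public methods Character.isSpaceChar(int),
Character.isWhitespace(int), and Character.isISOControl(int); the class
name SpaceCheckSketch and the helper classify are made up for
illustration.

```java
public class SpaceCheckSketch {
    /** Summarizes how the three predicates classify one code point (hypothetical helper). */
    static String classify(int codePoint) {
        return String.format("U+%04X spaceChar=%b whitespace=%b isoControl=%b",
                codePoint,
                Character.isSpaceChar(codePoint),   // GC=Zs, Zl, or Zp only
                Character.isWhitespace(codePoint),  // Java's own whitespace notion
                Character.isISOControl(codePoint)); // U+0000..U+001F, U+007F..U+009F
    }

    public static void main(String[] args) {
        // SPACE, TAB, NO-BREAK SPACE, NEXT LINE, LINE SEPARATOR, UNIT SEPARATOR
        int[] samples = {0x0020, 0x0009, 0x00A0, 0x0085, 0x2028, 0x001F};
        for (int cp : samples) {
            System.out.println(classify(cp));
        }
    }
}
```

The cases worth looking at are U+00A0, which is a space separator but
not Java whitespace, and U+0085, which has the Unicode White_Space
property yet is neither a space character nor whitespace to Java.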