CharsetEncoder.maxBytesPerChar()
Martin Buchholz
martinrb at google.com
Fri Sep 27 13:04:47 UTC 2019
Like Ulf, I am sometimes annoyed by the "character" misnomer used
throughout the API docs, and would support an effort to use "character"
the way that unicode.org uses it.
"char" no longer represents a Unicode character, but at least it provides
a short, clear name, in the Java language, for "UTF-16 code unit" - if we
use it consistently!
https://unicode.org/faq/utf_bom.html#utf16-1
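To make the distinction concrete, here is a minimal, self-contained sketch
(the class name is mine; everything else is standard Java): a supplementary
character such as U+1F600 is one Unicode character, i.e. one code point,
but two Java chars, i.e. two UTF-16 code units.

    import java.nio.charset.StandardCharsets;

    public class CodeUnitDemo {
        public static void main(String[] args) {
            // U+1F600 (GRINNING FACE) lies outside the BMP, so UTF-16
            // represents it as a surrogate pair: two chars, one character.
            String s = new String(Character.toChars(0x1F600));

            System.out.println(s.length());                      // 2 (chars, i.e. UTF-16 code units)
            System.out.println(s.codePointCount(0, s.length())); // 1 (Unicode character)
            System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4 (UTF-8 bytes)
        }
    }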
On Thu, Sep 26, 2019 at 2:24 PM <mark.reinhold at oracle.com> wrote:
> 2019/9/24 13:00:21 -0700, ulf.zibis at cosoco.de:
> > Am 21.09.19 um 00:03 schrieb mark.reinhold at oracle.com:
> >> To avoid this confusion, a more verbose specification might read:
> >>  * Returns the maximum number of $otype$s that will be produced for each
> >>  * $itype$ of input. This value may be used to compute the worst-case size
> >>  * of the output buffer required for a given input sequence. This value
> >>  * accounts for any necessary content-independent prefix or suffix
> >> #if[encoder]
> >>  * $otype$s, such as byte-order marks.
> >> #end[encoder]
> >> #if[decoder]
> >>  * $otype$s.
> >> #end[decoder]
> >
> > Wouldn't it be clearer to use "char", or even "{@code char}", instead
> > of "character" as the replacement for the $xtype$ parameters?
>
> The specifications of the Charset{De,En}coder classes make it clear
> up front that “character” means “sixteen-bit Unicode character,” so
> I don’t think changing “character” everywhere to “{@code char}” is
> necessary.
>
> This usage of “character” is common throughout the API specification.
> With the introduction of 32-bit Unicode characters we started calling
> those “code points,” but kept on calling sixteen-bit characters just
> “characters.” (I don’t think the official term “Unicode code unit”
> ever caught on, and it’s a bit of a mouthful anyway.)
>
> - Mark
>
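Coming back to the proposed wording: here is a minimal sketch of the
worst-case buffer computation that maxBytesPerChar() is specified to
support (the helper name encodeWholeInput is mine; the API calls are
standard java.nio.charset):

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;
    import java.nio.charset.StandardCharsets;

    public class WorstCaseBuffer {
        // Allocates an output buffer big enough for any input of this
        // length, per the maxBytesPerChar() contract discussed above.
        static ByteBuffer encodeWholeInput(CharsetEncoder encoder, CharBuffer input) {
            int capacity = (int) Math.ceil(input.remaining() * encoder.maxBytesPerChar());
            ByteBuffer out = ByteBuffer.allocate(capacity);
            CoderResult result = encoder.encode(input, out, true);
            if (result.isOverflow())
                throw new AssertionError("worst-case bound was not worst-case");
            encoder.flush(out);
            out.flip();
            return out;
        }

        public static void main(String[] args) {
            CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
            // UTF-8 reports maxBytesPerChar() == 3.0: one char (one UTF-16
            // code unit) never needs more than three bytes; four-byte UTF-8
            // sequences always consume two chars (a surrogate pair).
            System.out.println(enc.maxBytesPerChar());  // 3.0
            ByteBuffer bytes = encodeWholeInput(enc, CharBuffer.wrap("na\u00EFve"));
            System.out.println(bytes.remaining());      // 6
        }
    }

Note how the byte-order-mark case falls out of the same contract: the JDK's
UTF-16 encoder reports maxBytesPerChar() == 4.0, enough for the two-byte
BOM plus two bytes of payload even on a one-char input.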