RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

Ulf Zibis Ulf.Zibis at CoSoCo.de
Tue Sep 23 21:50:43 UTC 2014


On 23.09.2014 at 16:58, Salter, Thomas A wrote:
> This response confuses me. Are you saying that the UTF-8 encoder is not really producing UTF-8? RFC 2279 and RFC 3629 both clearly state that surrogates must be combined to form a 32-bit value, which is then encoded as a 4-byte sequence. In fact, the RFCs refer to the definition of the alternate encoding CESU-8, which encodes each half of the surrogate pair as a 3-byte UTF-8 sequence.

See also: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6798514
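
To make the byte counts under discussion concrete, here is a minimal sketch, assuming a standard JDK 7 or later. The class name Utf8SurrogateDemo and the helper utf8Length are hypothetical and exist only for this illustration. It encodes one BMP character and one surrogate pair with the JDK's UTF-8 charset and prints whatever maxBytesPerChar() reports on the running JDK:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the JDK or from this thread.
public class Utf8SurrogateDemo {

    // Number of bytes the JDK's UTF-8 charset produces for the given string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();

        String euro = "\u20AC";        // U+20AC, a single Java char
        String pair = "\uD83D\uDE00";  // U+1F600, a surrogate pair (two Java chars)

        // The reported bound is per Java char, not per code point.
        System.out.println("maxBytesPerChar = " + enc.maxBytesPerChar());

        System.out.println("U+20AC:  chars=" + euro.length()
                + " bytes=" + utf8Length(euro));
        System.out.println("U+1F600: chars=" + pair.length()
                + " bytes=" + utf8Length(pair));
    }
}
```

The supplementary character yields a single 4-byte sequence, as the RFCs require, but it occupies two Java chars, so the count per char is 2 bytes. The 3-byte BMP character is the worst case per char.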

-Ulf


