RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8
Ulf Zibis
Ulf.Zibis at CoSoCo.de
Tue Sep 23 21:50:43 UTC 2014
On 23.09.2014 at 16:58, Salter, Thomas A wrote:
> This response confuses me. Are you saying that the UTF-8 encoder is not really producing UTF-8? RFC 2279 and 3629 both clearly state that surrogates must be combined to form a 32-bit value which is then encoded as a 4-byte sequence. In fact, the RFCs refer to the alternate encoding definition, CESU-8, which encodes each half of the surrogate pair as a 3-byte UTF-8 sequence.
See also: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6798514
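
To make the byte counts in the quoted question concrete, here is a minimal Java sketch. It encodes one supplementary character (U+1F600, chosen arbitrarily) with the JDK's UTF-8 charset and with a hand-written CESU-8 style routine. The class name SurrogateEncodingSketch and the helper cesu8Style are hypothetical and written only for illustration; they are not the JDK's own CESU-8 implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SurrogateEncodingSketch {

    // Standard UTF-8 via the JDK: a valid surrogate pair is combined into
    // one code point and written as a single 4-byte sequence.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical CESU-8 style encoder (illustration only): every UTF-16
    // code unit is encoded on its own, so each surrogate half becomes a
    // separate 3-byte sequence.
    static byte[] cesu8Style(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it occupies two Java chars.
        String s = new String(Character.toChars(0x1F600));
        System.out.println("chars:            " + s.length());
        System.out.println("UTF-8 bytes:      " + utf8(s).length);
        System.out.println("CESU-8 style:     " + cesu8Style(s).length);
        // The value under discussion in this thread, as reported by the running JDK.
        System.out.println("maxBytesPerChar:  "
                + StandardCharsets.UTF_8.newEncoder().maxBytesPerChar());
    }
}
```

In this sketch the pair is two chars, four bytes in UTF-8 and six bytes in the CESU-8 style form, so the standard encoding of a surrogate pair works out to two bytes per char.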
-Ulf