RFR: 6928542: Chinese characters in RTF are not decoded [v6]

Ichiroh Takiguchi itakiguchi at openjdk.org
Thu Sep 21 16:07:27 UTC 2023


> The "character set of font" (font charset) table was created from the "Rich Text Format Specification 1.9.1":
> https://interoperability.blob.core.windows.net/files/Archive_References/[MSFT-RTF].pdf
> The specification refers to wingdi.h:
> https://learn.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-textmetrica
> 
> Test files and testcase are in bugid [JDK-6928542](https://bugs.openjdk.org/browse/JDK-6928542)
> 
> Additional change:
> The special character `\line` should be converted to `\n`
> 
> Additional information:
> 
> Add 2 hash tables:
> - fcharsetToCP: Predefined conversion table from the `fcharset` control word (with its number) to a Java charset name. For example, `fcharset0` maps to the Java charset name `windows-1252`.
> - fcharsetTable: Per-RTF-file conversion table from the integer font number of the `f` control word to the font's Java Charset. For example, given `{\f0\fnil\fcharset0 Segoe UI;}`, font number `0` maps to the Java Charset `windows-1252`.
> 
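The two tables described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the pull request: the class `FontCharsetTables`, the methods `registerFont` and `resolve`, and the small subset of `fcharset` numbers are hypothetical, and the ISO-8859-1 fallback follows the PR description.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative and do not come from the PR.
class FontCharsetTables {
    // Predefined table: number of the \fcharsetN control word -> Java charset name.
    // Only a small subset is listed here (assumption for brevity).
    static final Map<Integer, String> FCHARSET_TO_CP = Map.of(
        0, "windows-1252",   // ANSI_CHARSET
        128, "windows-31j",  // SHIFTJIS_CHARSET
        134, "GBK",          // GB2312_CHARSET
        136, "Big5");        // CHINESEBIG5_CHARSET

    // Per-document table: font number of the \fN control word -> resolved Charset.
    final Map<Integer, Charset> fcharsetTable = new HashMap<>();

    // Would be called for a font table entry such as {\f0\fnil\fcharset0 Segoe UI;}.
    void registerFont(int fontNumber, int fcharset) {
        fcharsetTable.put(fontNumber, resolve(fcharset));
    }

    // Look up the charset name; fall back to ISO-8859-1 when the number is
    // unknown or the charset is not available in this runtime.
    static Charset resolve(int fcharset) {
        String name = FCHARSET_TO_CP.get(fcharset);
        if (name != null && Charset.isSupported(name)) {
            return Charset.forName(name);
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        FontCharsetTables tables = new FontCharsetTables();
        tables.registerFont(0, 0);    // {\f0\fnil\fcharset0 Segoe UI;}
        tables.registerFont(1, 134);  // a font declared with \fcharset134
        System.out.println(tables.fcharsetTable);
    }
}
```

Keeping the predefined table keyed by charset name rather than by `Charset` object lets the lookup degrade to the fallback on runtimes where an extended charset is not installed.
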
> When an RTF character set control word (like `\mac`) is used, an unmappable character is returned as \u0000 and is not written into the RTF text.
> When the `fcharset` control word is used, an unmappable character is returned as \uFFFD (the same as the decoder's replacement character); \u0000 is used for DBCS lead byte detection.
> If an `f` or `par` control word is encountered while a lead byte remains in the decoder's byte buffer, that byte is treated as an invalid character and \uFFFD is written into the RTF text.
> 
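The handling of a dangling lead byte can be illustrated with a small sketch built on `java.nio.charset.CharsetDecoder`. It is not the PR's implementation: the class `PendingLeadByteDecoder` and its methods are hypothetical, and it detects a pending lead byte by checking for bytes left in the input buffer rather than by the \u0000 marker the description mentions. The usage in `main` decodes UTF-8 only because that charset is guaranteed on every Java runtime; a DBCS charset such as GBK would be passed in the same way where it is available.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names are illustrative and do not come from the PR.
class PendingLeadByteDecoder {
    private final CharsetDecoder decoder;
    private final ByteBuffer in = ByteBuffer.allocate(8);
    private final StringBuilder text = new StringBuilder();

    PendingLeadByteDecoder(Charset cs) {
        // REPLACE makes the decoder emit its replacement string ("\uFFFD" by default)
        // for malformed or unmappable input instead of reporting an error.
        decoder = cs.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Feed one byte of document text. An incomplete multi-byte sequence
    // stays in the input buffer until the following bytes arrive.
    void write(int b) {
        in.put((byte) b);
        in.flip();
        CharBuffer out = CharBuffer.allocate(4);
        decoder.decode(in, out, false);
        out.flip();
        text.append(out);
        in.compact(); // keep any undecoded bytes for the next call
    }

    // Called at a boundary such as an \f or \par control word: bytes still
    // pending are treated as one invalid character.
    void boundary() {
        if (in.position() > 0) {
            text.append('\uFFFD');
            in.clear();
            decoder.reset();
        }
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        PendingLeadByteDecoder d = new PendingLeadByteDecoder(StandardCharsets.UTF_8);
        d.write(0xE4); d.write(0xB8); d.write(0xAD); // complete sequence for U+4E2D
        d.write(0xE4);                               // lead byte with nothing after it
        d.boundary();                                // e.g. \par seen here
        d.write('A');
        System.out.println(d.text());
    }
}
```
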
> If the `f` control word is used without `fcharset`, the `translationTable` char array is used.
> If the `f` control word is used with `fcharset`, the predefined Java Charset name is used (if it is missing, ISO8859_1 is used as a fallback).
> 
> **Note:** The following GitHub Actions job failed:
> linux-cross-compile / build (riscv64). I opened the following JBS issue for it.
>> [JDK-8314624](https://bugs.openjdk.org/browse/JDK-8314624) GHA: RISC-V cross-build was failed

Ichiroh Takiguchi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge branch 'master' of https://github.com/openjdk/jdk into 6928542
 - 6928542: Chinese characters in RTF are not decoded
 - 6928542: Chinese characters in RTF are not decoded
 - Merge branch 'master' of https://github.com/openjdk/jdk into HEAD
 - 6928542: Chinese characters in RTF are not decoded

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13553/files
  - new: https://git.openjdk.org/jdk/pull/13553/files/cffed595..2f9c45cb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13553&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13553&range=04-05

  Stats: 110454 lines in 3097 files changed: 48068 ins; 23273 del; 39113 mod
  Patch: https://git.openjdk.org/jdk/pull/13553.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13553/head:pull/13553

PR: https://git.openjdk.org/jdk/pull/13553

