RFR: 8338257: UTF8 lengths should be size_t not int [v7]

David Holmes dholmes at openjdk.org
Thu Aug 29 02:45:53 UTC 2024


> This work has been split out from JDK-8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths
> 
> The modified UTF-8 format used by the VM can require up to six bytes to represent one Unicode character, but six-byte characters are stored as UTF-16 surrogate pairs. Hence each stored character needs at most 3 bytes, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings, though, this reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should therefore define UTF8 lengths as `size_t` to accommodate all possible representations. Higher-level APIs can still use `int` if they know the strings (e.g. symbols) are sufficiently constrained in length. See the comments in utf8.hpp that explain Strings, compact strings and the encoding.
> 
> As the existing JNI `GetStringUTFLength` still requires the current truncating behaviour of `UNICODE::utf8_length`, we add back `UNICODE::utf8_length_as_int` for it to use.
> 
> Note that some APIs, like `UNICODE::as_utf8(const T* base, size_t& length)`, use `length` as an IN/OUT parameter: it is the incoming (int) length of the jbyte/jchar array, and the outgoing (size_t) length of the UTF8 sequence. This makes some of the call sites a little messy with casts.
> 
> Testing:
>  - tiers 1-4
>  - GHA
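
To make the length arithmetic in the description concrete, here is a minimal standalone sketch, not the HotSpot code. The names `mutf8_size`, `mutf8_length` and `mutf8_length_clamped` are hypothetical, and the clamping rule (stop at the last complete encoding that still fits under the limit) is an assumption about what a "truncating" int-returning variant could look like.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to encode one UTF-16 code unit
// in modified UTF-8.
static size_t mutf8_size(uint16_t c) {
  if (c >= 0x0001 && c <= 0x007F) return 1;  // ASCII except NUL
  if (c <= 0x07FF) return 2;                 // includes NUL (0xC0 0x80)
  return 3;                                  // the rest, incl. each surrogate half
}

// Full length as size_t: at most 3 bytes per code unit, so the sum can
// exceed INT_MAX even though the incoming length is an int.
static size_t mutf8_length(const uint16_t* base, int length) {
  assert(length >= 0);
  size_t result = 0;
  for (int i = 0; i < length; i++) {
    result += mutf8_size(base[i]);
  }
  return result;
}

// Assumed truncating variant: never reports more than `limit` bytes and
// never counts a partially fitting encoding. The default limit leaves
// room for a caller to add 1 for a terminating NUL without int overflow.
static int mutf8_length_clamped(const uint16_t* base, int length,
                                size_t limit = static_cast<size_t>(INT_MAX) - 1) {
  assert(length >= 0);
  assert(limit <= static_cast<size_t>(INT_MAX));
  size_t result = 0;
  for (int i = 0; i < length; i++) {
    size_t sz = mutf8_size(base[i]);
    if (result + sz > limit) break;  // stop at a completed encoding
    result += sz;
  }
  return static_cast<int>(result);
}
```

The `limit` parameter exists only so the truncation can be exercised on small inputs. A surrogate pair counts as 3 + 3 bytes here, which is where the "six bytes per Unicode character" figure comes from.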

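The IN/OUT `length` convention mentioned in the description can be illustrated with a small standalone sketch. `encode_mutf8` and `encode_from_int_length` are hypothetical stand-ins, not the `UNICODE::as_utf8` implementation, and returning a `std::string` is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an API with an IN/OUT length parameter.
// IN:  number of UTF-16 code units in `base`.
// OUT: number of bytes in the returned modified UTF-8 sequence.
static std::string encode_mutf8(const uint16_t* base, size_t& length) {
  std::string out;
  for (size_t i = 0; i < length; i++) {
    uint16_t c = base[i];
    if (c >= 0x0001 && c <= 0x007F) {
      out.push_back(static_cast<char>(c));
    } else if (c <= 0x07FF) {  // two bytes, including NUL as 0xC0 0x80
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else {                   // three bytes, including surrogate halves
      out.push_back(static_cast<char>(0xE0 | (c >> 12)));
      out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  length = out.size();  // overwrite the input count with the byte length
  return out;
}

// A call site that starts from an int count, as with a Java array length:
// the signed-to-unsigned cast on the way in is the "messy" part.
static size_t encode_from_int_length(const uint16_t* chars, int char_count,
                                     std::string& out) {
  assert(char_count >= 0);
  size_t length = static_cast<size_t>(char_count);  // IN: code units
  out = encode_mutf8(chars, length);                // OUT: bytes
  return length;
}
```

Reusing one variable for both directions saves a parameter, but the caller has to remember that `length` means something different after the call.
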
David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:

 - Merge branch 'master' into 8338257-utf8-length
 - Extra assertion requested by tstuefe
 - more missing casts
 - fix cast
 - missing cast
 - Fix incorrect comments and size_t use per Dean's review
 - Add missing cast for signed-to-unsigned conversion.
 - unnecessary cast
 - Fix comments
 - Fix off-by-one error
 - ... and 3 more: https://git.openjdk.org/jdk/compare/6e817a2b...9dce4ffb

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20560/files
  - new: https://git.openjdk.org/jdk/pull/20560/files/3d36ba52..9dce4ffb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20560&range=05-06

  Stats: 38818 lines in 1231 files changed: 22068 ins; 10873 del; 5877 mod
  Patch: https://git.openjdk.org/jdk/pull/20560.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20560/head:pull/20560

PR: https://git.openjdk.org/jdk/pull/20560

