RFR: 8345292: Improve javadocs for MemorySegment::getStrings defining word boundary cases [v4]

Wed Jun 11 21:29:29 UTC 2025

On Wed, 11 Jun 2025 17:24:11 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> This PR proposes to improve the `MemorySegment.getString(long offset, Charset charset)` method documentation with respect to multi-octet concerns.
>
> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Improve wording

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1307:

> 1305:      *     return new String(bytes, charset);
> 1306:      * }
> 1307:      * @implNote If the segment size is not evenly dividable by the number of octets used

I think the relevant concepts here are:
* a valid charset has a fixed code-unit size, so every code unit is N bytes long (1 for UTF-8, 2 for UTF-16, 4 for UTF-32)
* the terminator is also N bytes long
* the number of N-byte units that can be examined is given by the integer division `S / N`, where `S` is the size of the segment; the trailing `R = S % N` bytes are ignored, because a remainder `R < N` can never hold a valid terminator
* I'm not sure what you mean by that last sentence. Maybe you mean that if you have `N = 4` and the units `AA00` and `00BB`, those four zeros are not considered a terminator? I think speaking of alignment here is misleading, because we're not really suggesting that a terminator in an `N = 4` charset should start at an address that is 4-byte aligned; we're just saying that the _offset_ at which that terminator starts (relative to the start of the segment) is a multiple of 4.
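To make the points above concrete, here is a minimal sketch of the scanning rule as I read it. It works on a plain `byte[]` rather than a `MemorySegment` and is not the JDK's actual implementation; `TerminatorScan` and `strlen` are hypothetical names, and index 0 of the array stands for the position where reading starts. It checks only whole N-byte units at offsets that are multiples of N, and never looks at the trailing `S % N` bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the JDK.
class TerminatorScan {

    // Returns the length in bytes of the string starting at index 0 of `data`,
    // assuming a terminator made of `n` zero bytes (n = code-unit size).
    // Only offsets that are multiples of `n` are checked, and the trailing
    // data.length % n bytes are never examined. Returns -1 if no terminator is found.
    static int strlen(byte[] data, int n) {
        int units = data.length / n;               // integer division S / N
        for (int u = 0; u < units; u++) {
            int start = u * n;                     // always a multiple of n
            boolean allZero = true;
            for (int k = 0; k < n; k++) {
                if (data[start + k] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return start;
            }
        }
        return -1;                                 // no terminator in the first units * n bytes
    }

    public static void main(String[] args) {
        // N = 4: four zero bytes straddle the unit boundary ("AA00" | "00BB"), so no terminator
        byte[] straddling = {'A', 'A', 0, 0, 0, 0, 'B', 'B'};
        System.out.println(strlen(straddling, 4)); // -1

        // N = 4: the all-zero unit starts at offset 4, a multiple of 4
        byte[] utf32 = {'A', 0, 0, 0, 0, 0, 0, 0};
        int len32 = strlen(utf32, 4);
        System.out.println(new String(utf32, 0, len32, Charset.forName("UTF-32LE"))); // A

        // N = 2, S = 5: the single trailing zero byte (R = 1 < N) cannot be a terminator
        byte[] remainder = {'H', 0, 'i', 0, 0};
        System.out.println(strlen(remainder, 2)); // -1

        // N = 2: a two-byte terminator at offset 4, a multiple of 2
        byte[] utf16 = {'H', 0, 'i', 0, 0, 0};
        int len16 = strlen(utf16, 2);
        System.out.println(new String(utf16, 0, len16, StandardCharsets.UTF_16LE)); // Hi
    }
}
```

The key design point is that the loop compares whole units instead of searching for any run of N zero bytes, which is exactly why the straddling `AA00`/`00BB` case is not treated as a terminator, and why no claim about address alignment is needed.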

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25715#discussion_r2141145457