RFR: 8330532: Improve line-oriented text parsing in HotSpot
John R Rose
jrose at openjdk.org
Mon Apr 22 22:06:28 UTC 2024
On Mon, 22 Apr 2024 20:48:58 GMT, Matias Saavedra Silva <matsaave at openjdk.org> wrote:
>> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>>
>> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>>
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>>
>> Common problems:
>> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
>> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>>
>> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>>
>> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>>
>> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>>
>> Credit:
>> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>>
>> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
>
> src/hotspot/share/utilities/istream.hpp line 108:
>
>> 106: void* _must_free; // unless null, a malloc pointer which we must free
>> 107: size_t _line_count; // increasing non-resettable count of lines read
>> 108: char _small_buffer[SMALL_SIZE]; // buffer for holding lines
>
> maybe this should be called line_buffer instead?
No, it’s the small buffer that is the initial estimate of the line buffer, which in general must grow by heap allocation.
(BTW, the presence of small_buffer is the reason your other suggestion about `set_input` is wrong; will explain there…)
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1575403639
More information about the hotspot-dev
mailing list