RFR: 8330532: Improve line-oriented text parsing in HotSpot

John R Rose jrose at openjdk.org
Mon Apr 22 22:15:28 UTC 2024


On Mon, 22 Apr 2024 20:57:05 GMT, Johan Sjölen <jsjolen at openjdk.org> wrote:

>> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>> 
>> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>> 
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>> 
>> Common problems:
>> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
>> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>> 
>> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>> 
>> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>> 
>> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>> 
>> Credit:
>> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>> 
>> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
>
> src/hotspot/share/utilities/istream.hpp line 249:
> 
>> 247: 
>> 248:   // Discards any previous input and sets the given input source.
>> 249:   void set_input(Input* input);
> 
> Why is being able to change the source input important? I'd think you should just create a new `inputStream`. This would then let us make the `_input` field a reference instead.

There may be a better way to design the way streams and inputs connect with each other, but removing `set_input` is a step in the wrong direction.  The reason is that a stream is designed to be stack-allocated, while an input source is something that can have a non-stack lifetime.

It is supposed to be cheap to create an input stream on stack, and part of the cost model comes from the small buffer, which is also stack allocated, and defers the need for heap allocation.  This means it could be a performance problem to “make a new one” as suggested above.  It’s better to allow the stack allocated one to modify its input source, if the input is either (a) multiplexed from different sources, or (b) determined AFTER the input stream is declared.

Yes, the normal use is to determine the input source first and then wrap the i-stream around it.  But if this fails, it is not the right answer to push the i-stream onto the heap; i-streams are strongly associated with the blocks of code that exercise them.  (In HotSpot, something named “stream” often has such a stack-bound workflow, and there is often something ELSE that is heap-allocated that contains the stream’s data source.)

So the right answer, when the input sources are “hard to manage”, is `set_input`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1575408457


More information about the hotspot-dev mailing list