RFR: 8330532: Improve line-oriented text parsing in HotSpot
Johan Sjölen
jsjolen at openjdk.org
Mon Apr 22 21:45:30 UTC 2024
On Thu, 18 Apr 2024 03:51:06 GMT, Ioi Lam <iklam at openjdk.org> wrote:
> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>
> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>
> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>
> Common problems:
> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>
> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>
> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>
> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>
> Credit:
> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>
> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
Hi,
The general concept of an `inputStream` is very good and I think this is useful, so I'm very happy that this is being added to Hotspot.
There's a design issue that I would like to be fixed, however. Consider the interface that `inputStream::Input` exposes, specifically the `close` operation:
```c++
// Rewind so that the position appears to be the given one.
// Return the new position, or else (size_t)-1 if the request fails.
virtual size_t set_position(size_t position) { return -1; }
// If it is backed by a resource that needs closing, do so.
virtual void close() { }
This function:
```c++
void inputStream::set_input(inputStream::Input* input) {
clear_buffer();
if (_input != nullptr && _input != input) {
_input->close();
}
_input = input;
_input_state = NTR_STATE;
}
and finally this:
```c++
// class inputStream
Input* _input; // where the input comes from or else nullptr
I'd like to see `set_input` be removed, as instead of changing the inputStream a new can be created (and replace the old one, if need be). Second, I'd like to see the `close` method to be removed as `inputStream` no longer has any code calling it. Third, I'd like to see the `_input` field to go from being a pointer to being a reference as there is no point in an `inputStream` not having an input, and that input should not change.
Closing the resource is the responsibility of the owner, and this preferably happens in its destructor. If an explicit close method is required, then the owner will know what that method is, as the input object will be "concrete" (such as `FileInput`). In `MemoryInput` `close()`:ing the resource ought to be the same as `free()`:ing it, but here that is ignored and `close()` seems unimplemented?
Consider the following code, I'd be very surprised at this:
```c++
{
FileInput fi("myfile");
{
inputStream(&fi);
// ... Let's say I figure out that the file has at least 8 bytes
}
char bytearr[8];
fi.set_position(0);
int read_bytes = fi.read(&bytearr, 8);
// read_bytes is 0 !??!
}
More generally, the class `inputStream::Input` should exist to satisfy the needs of `inputStream` and **not** as a general interface for any and all inputs. Any optional method cannot be depended upon anyway, so two paths will have to be written if any function takes an `inputStream::Input&` as input. I'd like to call upon "YAGNI" and "KISS" here, as I believe we're over complicating it without any actual need for the complexity.
src/hotspot/share/utilities/istream.hpp line 150:
> 148: assert(_buffer_size == 0 || _next <= _buffer_size, "");
> 149: return true;
> 150: }
Please add message, even if only "invariant".
src/hotspot/share/utilities/istream.hpp line 207:
> 205: const_cast<inputStream*>(this)->fill_buffer();
> 206: }
> 207: }
Why `const_cast` and assign this method `const` when it clearly is not? This shouldn't be `const`, is my point.
src/hotspot/share/utilities/istream.hpp line 249:
> 247:
> 248: // Discards any previous input and sets the given input source.
> 249: void set_input(Input* input);
Why is being able to change the source input important? I'd think you should just create a new `inputStream`. This would then let us make the `_input` field a reference instead.
-------------
PR Review: https://git.openjdk.org/jdk/pull/18833#pullrequestreview-2015784011
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1575345973
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1575353275
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1575350511
More information about the hotspot-dev
mailing list