RFR: 8330532: Improve line-oriented text parsing in HotSpot [v4]

Johan Sjölen jsjolen at openjdk.org
Thu May 2 10:32:57 UTC 2024


On Wed, 1 May 2024 20:07:07 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>> 
>> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>> 
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>> 
>> Common problems:
>> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
>> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>> 
>> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>> 
>> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>> 
>> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>> 
>> Credit:
>> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>> 
>> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
>
> Ioi Lam has updated the pull request incrementally with three additional commits since the last revision:
> 
>  - BlockInputStream is used by gtest only, so moved it there
>  - removed unused set_position(), etc
>  - removed _must_free

Hi Ioi, thanks for looking at my changes. I've got some more changes :-). Mainly: Shouldn't the `_small_buffer` be assigned to the `_buffer` pointer by default? This simplifies the code a bit.

src/hotspot/share/utilities/istream.cpp line 119:

> 117:   assert(!definitely_done(), "");  // caller responsibility
> 118:   while (need_to_read()) {
> 119:     size_t fill_offset, fill_length;

Nit: It's fine to move these decls out of the loop.

src/hotspot/share/utilities/istream.cpp line 173:

> 171:     assert(_buffer_size > 0, "");
> 172:     // and continue with at least a little buffer
> 173:   }

Get rid of this branch, small buffer now default.

src/hotspot/share/utilities/istream.cpp line 273:

> 271:     COV(EXB_S);
> 272:     new_buf = &_small_buffer[0];
> 273:     new_length = sizeof(_small_buffer);

Remove this branch, cannot ever happen as default is the small buffer.

src/hotspot/share/utilities/istream.cpp line 378:

> 376:   return old_mode;
> 377: }
> 378: #endif //ASSERT

I believe that we have support for `gcov` (grepping for it shows results for a `GCOV_ENABLED` flag), in which case that's what we should use for code coverage instrumentation. So, this should probably be deleted (along with the rest of it).

It would be best if we got some docs for how to use gcov also, however.

src/hotspot/share/utilities/istream.hpp line 111:

> 109: 
> 110:   bool has_c_heap_buffer() {
> 111:     return _buffer != nullptr && _buffer != &_small_buffer[0];

No need to check for nullptr.

src/hotspot/share/utilities/istream.hpp line 196:

> 194:   bool fill_buffer();
> 195: 
> 196:   // Find some room in the buffer so we call read on it.

"so we **can** call read on it" probably

src/hotspot/share/utilities/istream.hpp line 236:

> 234:     _end(0),
> 235:     _next(0),
> 236:     _line_count(0) {}

Explicitly initialize the `_small_buffer` (`0` it out?). Set `_buffer` and `_buffer_size` to the small buffer by default.

src/hotspot/share/utilities/istream.hpp line 247:

> 245:   virtual ~inputStream() {
> 246:     if (has_c_heap_buffer()) free_c_heap_buffer();
> 247:     if (_input != nullptr)  set_input(nullptr);

In case we remove close: remove this call.

-------------

PR Review: https://git.openjdk.org/jdk/pull/18833#pullrequestreview-2035313060
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587380892
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587387927
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587390089
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587367132
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587395999
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587379361
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587385232
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1587397065


More information about the hotspot-dev mailing list