RFR: 8330532: Improve line-oriented text parsing in HotSpot [v3]

Johan Sjölen jsjolen at openjdk.org
Tue Apr 23 10:23:30 UTC 2024


On Tue, 23 Apr 2024 01:08:05 GMT, Ioi Lam <iklam at openjdk.org> wrote:

>> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>> 
>> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>> 
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>> 
>> Common problems:
>> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
>> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>> 
>> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>> 
>> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>> 
>> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>> 
>> Credit:
>> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>> 
>> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
>
> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision:
> 
>  - Merge branch 'master' of https://github.com/openjdk/jdk into 8330532-improve-line-oriented-text-parsing-in-hotspot
>  - Comments fro @coleenp and @matias9927
>  - removed more unused code from istream.hpp
>  - Merged ClassFileParser changes from https://github.com/openjdk/jdk/pull/18669
>  - Removed gtest cases for features removed in the previous commit
>  - Reverted xmlstream.cpp/hpp and removed unused functions from inputStream
>  - fixed builds
>  - Imported @jrose00 changes https://github.com/openjdk/jdk/pull/18773

A couple of more comments.

And I implemented the simplification I suggested like this: https://github.com/jdksjolen/jdk/commit/b5bc0e945cb96c0a73a91981b937968e5cbac33c

Also, I use `override` instead of `virtual`, then we get compiler help if we for some reason (non-matching arg lists for example) don't actually override a super class's virtual function.

src/hotspot/share/utilities/istream.hpp line 98:

> 96: 
> 97:   Input* _input;   // where the input comes from or else nullptr
> 98:   IState _input_state;  // one of {NTR,EOF,ERR}_STATE

This comment not necessary as the type describes its valid state.

src/hotspot/share/utilities/istream.hpp line 106:

> 104:   size_t _end;          // offset to end of known current line (else content_end)
> 105:   size_t _next;         // offset to known start of next line (else =end)
> 106:   void*  _must_free;    // unless null, a malloc pointer which we must free

Reading this code, why do we set `_must_free` instead of simply having a method:

```c++
bool must_free() {
  return _buffer != &_small_buffer;
}


and just delete the `_must_free` field.

-------------

PR Review: https://git.openjdk.org/jdk/pull/18833#pullrequestreview-2016820941
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1576001292
PR Review Comment: https://git.openjdk.org/jdk/pull/18833#discussion_r1576011247


More information about the hotspot-dev mailing list