RFR: 8330532: Improve line-oriented text parsing in HotSpot

John Rose john.r.rose at oracle.com
Mon Apr 22 23:49:23 UTC 2024


On 22 Apr 2024, at 13:50, Johan Sjölen wrote:

> On Mon, 22 Apr 2024 18:11:56 GMT, Coleen Phillimore <coleenp at openjdk.org> wrote:
>
>>> (This PR is an alternative to https://github.com/openjdk/jdk/pull/18669 with a better API for reading lines of text)
>>>
>>> HotSpot has a few cases where information is parsed from a file, or from a memory buffer, one line at a time. Example:
>>>
>>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/cds/classListParser.cpp#L169
>>> - https://github.com/openjdk/jdk/blob/064628471b83616b4463baa78618d1b7a66d0c7c/src/hotspot/share/compiler/compilerOracle.cpp#L1059-L1066
>>>
>>> Common problems:
>>> - They use a fixed buffer for reading a line, so long (but valid) lines will cause errors.
>>> - There's ad-hoc code that deals with `FILE*` differently than from memory.
>>>
>>> This RFE implements a common utility, `inputStream`, for reading lines from different sources of input (see `FileInput` and `MemoryInput`). We fixed only `ClassListParser` and `CompilerOracle` in this RFE, but we can fix other readers in follow-up RFEs.
>>>
>>> The API allows other source of input to be implemented. For example, one could implement a `SocketInput` if there's a use case for it.
>>>
>>> In the future, `inputStream` can be extended (or encapsulated in a higher-level reader class) to read typed input tokens (for example, integers, strings, etc.)
>>>
>>> Credit:
>>> The `inputStream` class and friends are contributed by @rose00 . See https://mail.openjdk.org/pipermail/hotspot-dev/2024-April/087077.html .
>>>
>>> John's original version is in the draft PR https://github.com/openjdk/jdk/pull/18773. In order to minimize the size of this PR, I have kept only the functionalities for reading a line and a time. Other features, such as pushing back contents into the `inputStream`, could be added in follow-up PRs. (These removed features can be found in the commit history of this PR).
>>
>> src/hotspot/share/utilities/istream.hpp line 92:
>>
>>> 90:   // Do we need to read more input (NTR)?  Did we see EOF already?
>>> 91:   // Was there an error getting input or allocating buffer space?
>>> 92:   enum IState { NTR_STATE, EOF_STATE, ERR_STATE };
>>
>> Enum class is preferable.
>
> This is a nit, but if enum class is used can we then also change to PascalCase for the enum cases?

If you grep for ‘enum class’ in our code base, you find a number
of UPPER_CASE member names, plus lower_case and
_pre_hyphen_lower_case and Capitalized.  Not many PascalCase.
Am I missing a reason to prefer PascalCase here?

The style guide says:

> Constant names may be upper-case or mixed-case, according to
> historical necessity.  (Note: There are many examples of constants
> with lowercase names.)

I chose UPPER_CASE because the enum members function like
global constants, and this is low-level code.  I guess my
precedent was JavaThreadStatus, which has members named
like that.


More information about the hotspot-dev mailing list