RFR: JDK-8260030: Improve stringStream buffer handling
Thomas Stuefe
stuefe at openjdk.java.net
Thu Jan 21 17:12:32 UTC 2021
On Thu, 21 Jan 2021 13:47:47 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> stringStream objects are used as temporary string buffers a lot in hotspot. When investigating JDK-8259710, I found that a large majority of the stringStreams created never use the backing buffer fully; about 70% of all streams write less than 32 characters.
>>
>> stringStream creates an backing buffer, C-heap allocated, with a default size of 256 characters. Some things could be improved:
>>
>> - add a small in-object buffer for the first n characters to be written. This would avoid or at least delay allocation of the backing buffer from C-heap, at the expense of slightly increased object size and one memcpy when switching over to C-heap. Note that if the backing buffer is smaller than what now is the default size, the total footprint will go down despite the increased object size.
>>
>> - Delaying allocation of the backing buffer would be good for the many cases where stringStream objects are created as temporary objects without being actually used, see eg. compile.hpp `class PrintInliningBuffer`
>>
>> - Besides all that, the code could benefit from a little grooming (using modern style syntax etc).
>>
>> ----
>>
>> This patch:
>>
>> - renames members of stringStream to conform to common syntax (eg leading underscores) and to be clearer
>> - Uses initialization lists
>> - Factors out buffer growth code from stringStream::write() to a new function stringStream::grow()
>> - Introduces a new object-internal mini buffer, `stringStream::_small_buffer`, sized 32 bytes, which is initially used for all writes
>>
>> This patch drastically reduces the number of malloc calls done from this class. The internal buffer size of 32byte seems a good cut-off. Running some unrelated test program (no tracing active), I see a reduction in the number of malloc calls from stringStream from ~211K malloc calls down to 53K (debug VM). In a release VM, it drops from ~85K down to about 1K. The reason is that `stringStream` is used in a ton of places as a temporary stack allocated object, to write very minimal text snippets to.
>>
>> I also tweaked the associated gtest to test more thoroughly.
>
> src/hotspot/share/utilities/ostream.cpp line 341:
>
>> 339: assert(new_capacity > _capacity, "Sanity");
>> 340: assert(new_capacity > sizeof(_small_buffer), "Sanity");
>> 341: assert(!_is_fixed, "Don't call for caller provided buffers");
>
> I think the !_is_fixed assert should be first.
Will do.
> src/hotspot/share/utilities/ostream.cpp line 357:
>
>> 355: void stringStream::write(const char* s, size_t len) {
>> 356: size_t write_len = len; // number of non-null bytes to write
>> 357: size_t end = _written + len + 1; // position after write and final '\0'
>
> There's a risk of arithmetic overflow here, if len is "bad". Can this be
> rewritten to be more careful? Though maybe an assert that end <= _written
> is sufficient, as line 369 already protects against buffer overrun.
> Also, the comments aren't lined up.
I'll rewrite this coding to be safer.
> src/hotspot/share/utilities/ostream.cpp line 407:
>
>> 405:
>> 406: stringStream::~stringStream() {
>> 407: if (_is_fixed == false && _buffer != NULL && _buffer != _small_buffer) {
>
> `_is_fixed == false` => `!_is_fixed`
> Is `_buffer == NULL` actually possible now?
No, you are right. I'll remove it. free would accept NULL anyway, so there is not even a risk.
> src/hotspot/share/utilities/ostream.hpp line 196:
>
>> 194: class stringStream : public outputStream {
>> 195: protected:
>> 196: char* _buffer;
>
> Consider changing the protected members to private. There are no
> derived classes, and probably shouldn't be without more work.
Very good, I'll do it.
> src/hotspot/share/utilities/ostream.cpp line 389:
>
>> 387:
>> 388: void stringStream::reset() {
>> 389: _written = 0; _precount = 0; _position = 0;
>
> Is it a pre-existing bug that this doesn't also have `_newlines = 0;` ?
Strictly speaking yes. But _newlines is really only used as a very circumvent way to trigger a flush in defaultStream::write() if the last write contained a newline. We could probably throw _newlines completely away at the cost of a `strchr(s, '\n')` in defaultStream::write(). I'll remove the outside accessor `newlines()` at least, since its not used, and will correct stringStream::reset(), so it is consistent for now. I leave the cleanup for another day.
> src/hotspot/share/utilities/ostream.cpp line 396:
>
>> 394: char* copy = c_heap ?
>> 395: NEW_C_HEAP_ARRAY(char, _written + 1, mtInternal) : NEW_RESOURCE_ARRAY(char, _written + 1);
>> 396: strncpy(copy, _buffer, _written);
>
> Pre-existing: Should this really be using strncpy rather than memcpy?
> strncpy will stop at an embedded NUL.
Sure, I'll switch it to memcpy. Its probably more efficient. The content should not contain embedded zeros, but since we do not test on write, they can slip in. Using memcpy would be at least consistent with write().
-------------
PR: https://git.openjdk.java.net/jdk/pull/2160
More information about the hotspot-dev
mailing list