RFR: 8313306: More sensible reservation logging
Thomas Stuefe
stuefe at openjdk.org
Tue Jan 16 07:00:24 UTC 2024
On Fri, 12 Jan 2024 06:30:30 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> This PR implements more specialized logs for virtual memory APIs in the “os” namespace. It uses “os” and “map” as log tags using unified JVM logging as introduced in JEP 158 (https://openjdk.org/jeps/158).
>>
>> As far as testing is concerned, I have added a regression test to verify the logging is working accordingly.
>
> src/hotspot/share/runtime/os.cpp line 1817:
>
>> 1815: log_debug(os,map)("Reserved [" INTPTR_FORMAT " - " INTPTR_FORMAT"] (%zu bytes).", p2i(result), p2i(result + bytes), bytes);
>> 1816: } else {
>> 1817: log_info(os,map)("Reserve failed (%zu bytes), errno %d %s", bytes, get_last_error(), strerror(get_last_error()));
>
> This looks suspicious. First, it isn't clear that `get_last_error()` will actually return the error from `pd_reserve_memory` as you can't tell what might have intervened. Second, the second call may well return a different value to the first, again depending on how this all gets expanded and evaluated. Third `strerror` is only for converting `errno` to a string and on Windows `get_last_error()` may not be returning `errno` but a Windows specific error code.
Hi @dholmes-ora!
I advised Sonia about this, and we were aware of and torn about these issues. They are tricky ones. There are solvable, but all solutions feel a bit invasive for a simple logging change.
> First, it isn't clear that get_last_error() will actually return the error from pd_reserve_memory as you can't tell what might have intervened.
You are right. Thinking this through some more, I think the best would be two forms of logging.
1) a high level log, (e.g. "debug" level), in the generic os:: function. No error details, just "failed" and "succeeded".
2) a low level log (e.g. "trace" level), within the specific os function, close to the failing system call. Here, we could print the error detail too, either errno or GetLastError. For aggregate functions that do multiple system calls, we would log each one. Or, better, only those that we are reasonably interested in and that have a reasonable chance of failing.
That would be more informative for the log reader and also semantically more correct since we would print the errno together with the specific system call that failed.
The counterargument for (2) is that logging at the system call level could be seen as excessive and could be done with tools like strace or SysInternals Tracer. However, we have many examples of system-level logs already.
...
> Third strerror is only for converting errno to a string and on Windows get_last_error() may not be returning errno but a Windows specific error code.
True also. This is a long-standing issue. If we follow my proposal above, this issue does not matter because the OS layer takes care of printing error details. But otherwise, I'd opt for something like an OSError class that encapsulates error information and abstracts away platform details (errno vs GetLastError, strerror vs FormatErrorMessage etc).
What do you think?
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/17356#discussion_r1452995991
More information about the hotspot-runtime-dev
mailing list