RFR: 8203290: [PPC64, s390] Check functionality of JDK-8199712 (Flight Recorder) [v16]

Thomas Stuefe stuefe at openjdk.java.net
Sat Jan 29 07:02:16 UTC 2022


On Fri, 28 Jan 2022 15:07:56 GMT, Tyler Steele <duke at openjdk.java.net> wrote:

>> Just in time for the holidays I have completed an implementation of the JFR functionality for AIX. As a side note, this is my first submission to OpenJDK 👋
>> 
>> ### Implementation notes and alternatives considered
>> 
>> After modifying the build system to allow the --enable-jvm-feature-jfr to work on AIX, my task was to implement the interfaces from os_perf.hpp. The os_perf_aix.cpp implementation that existed was, I believe, a copy of the Linux implementation. A review of the code in that file showed that NetworkInterface, CPUPerformanceInterface, and SystemProcessInterface would require modification to work on AIX. Using the Linux implementation as a guide, I initially expected to use files from the aix procfs like /proc/<pid>/psinfo, and /proc/<pid>/status in a manner similar to the Linux implementation. However, I ended up using libperfstat for all functionality required by the interfaces.
>> 
>> ### Testing
>> 
>> Testing for JFR seems to be quite light in the repo both before and after this change. In addition to performing manual testing, I have added some basic sanity checks that ensure events can be created and committed (using jtreg), and performs some basic checks on the results of the interface member functions (using gtest).
>> 
>> ### More notes
>> 
>> I've sent an email to the JFR group about a TOC overflow warning I encountered while building (for the release server target). I believe the fix is to pass -qpic=large when using the xlc toolchain, but my modifications to flags-cflags.m4 have not been successful in removing this warning.
>
> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision:
> 
>  - Changes macoss -> macosx in problem list
>  - Refactors loadlib_aix: Removes redundant c++ class

Hi Tyler,

looks better. See comments inline.

src/hotspot/os/aix/loadlib_aix.cpp line 116:

> 114: //   entry_t* next;
> 115: //   loaded_module_t info;
> 116: // };

please remove

src/hotspot/os/aix/loadlib_aix.cpp line 326:

> 324: // Helper for copying loaded module info to external callers.
> 325: // To avoid dangling pointers 'next' is set to null
> 326: static bool copy_lm_to_external(const loaded_module_t* const from, loaded_module_t* to) {

So, just to understand, the point of this function is to handle to==NULL (from is never NULL) and to set next to NULL? 

I'd probably keep doing this inline in the two places where this happens, but I leave it up to you. If you think this is cleaner, its fine too.

src/hotspot/os/aix/loadlib_aix.hpp line 37:

> 35: 
> 36: #include "misc_aix.hpp"
> 37: #include "runtime/os.hpp"

Bikeshed: I would like to have avoided including os.hpp (its a monster) - hence my recommendation about a separate callback or closure definition. But it is not super important, this is fine too.

src/hotspot/os/aix/os_perf_aix.cpp line 72:

> 70: 
> 71: static bool initialize_libperfstat() {
> 72:   static bool is_libperfstat_loaded = false;

I don't think this is needed. libperfstat is initialized as part of os::init in os_aix.cpp.

https://github.com/openjdk/jdk/blob/d366d15d67a08833d93a5806edef8145cb7803e5/src/hotspot/os/aix/os_aix.cpp#L2999-L3008
https://github.com/openjdk/jdk/blob/d366d15d67a08833d93a5806edef8145cb7803e5/src/hotspot/os/aix/os_aix.cpp#L2329-L2333

Should be already active at this point. Its functions are stubbed out in case the library cannot be loaded, so you should be able to call them in here, without an additional init check. You just have to handle the errors on the individual API calls, as if the real APIs returned errors. But you do this already I think.

Long term, if you think libperfstat should be always there, we can simplify things and treat libperfstat load errors as hard VM errors in release builds too. We already assert in `os::Aix::initialize_libperfstat()`. (Note: "assert" fires in debug builds (ASSERT macro), and "guarantee" fires in all builds).

src/hotspot/os/aix/os_perf_aix.cpp line 495:

> 493:     char* name     = NEW_C_HEAP_ARRAY(char, IDENTIFIER_LENGTH, mtInternal);
> 494:     char* exe_name = NEW_C_HEAP_ARRAY(char, PRFNSZ, mtInternal);
> 495:     char* cmd_line = NEW_C_HEAP_ARRAY(char, PRARGSZ, mtInternal);

So the contract with SystemProcess is that its ctor arguments need to be C-heap allocated, and releases them itself? ... okay. May be worthwhile to clean up at some point, or at least comment.

src/hotspot/os/aix/os_perf_aix.cpp line 611:

> 609: 
> 610:   // calling perfstat_<subsystem>(NULL, NULL, _, 0) returns number of available records
> 611:   if ((n_records = libperfstat::perfstat_netinterface(NULL, NULL, sizeof(perfstat_netinterface_t), 0)) <= 0) {

is n=0 an error? Could there conceivably be zero interfaces?

src/hotspot/os/aix/os_perf_aix.cpp line 631:

> 629:                                                            net_stats[i].ibytes,
> 630:                                                            net_stats[i].obytes,
> 631:                                                            *network_interfaces);

When looking at NetworkInterface, I saw that it copies the name into its own resource-area allocated string in the constructor. This is all over the place, SystemProcess assumes C-heap storage, NetworkInterface uses RA... :(

The problem now is that your ResourceMark above may also destroy the internal string in NetworkInterface when the Stack is unwound beyond this function.

Side note: this is what I don't like about how hotspot uses RA. Its often not clear which data are RA allocated without looking at the code. In Objective-C (at least how its used on Mac) they had a clear naming convention for methods that returned storage from the AutoReleasePool.

The problem is also that with ResourceMarks cutting off your data, you may not see the problem until much later. Usually you only see immediate problems if the underlying arena (a chained list of malloced slabs) added a new slab under your ResourceMark, which then gets free'd when the ResourceMark goes out of scope.

How to fix this:
1) revert to NEW_C_HEAP_ARRAY for your records array here, with appropriate cleanup, and remove ResourceMark
3) Change NetworkInterface class to use C-heap instead.

I'd opt for (1) since it avoids discussions about changing shared code. I opened https://bugs.openjdk.java.net/browse/JDK-8280912 to change NetworkInterface, but don't wait for me.

-------------

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6885



More information about the build-dev mailing list