RFR: JDK-8320890: [AIX] Find a better way to mimic dl handle equality [v3]

Thomas Stuefe stuefe at openjdk.org
Tue Dec 5 13:55:51 UTC 2023


On Tue, 5 Dec 2023 12:11:46 GMT, Joachim Kern <jkern at openjdk.org> wrote:

>> On AIX, repeated calls to dlopen referring to the same shared library may result in different, unique dl handles to be returned from libc. In that it differs from typical libc implementations that cache dl handles.
>> 
>> This causes problems in the JVM with code that assumes equality of handles. One such problem is in the JVMTI agent handler. That problem was fixed with a local fix to said handler ([JDK-8315706](https://bugs.openjdk.org/browse/JDK-8315706)). However, this fix causes follow-up problems since it assumes that the file name passed to `os::dll_load()` is the file that has been opened. It prevents efficient, os_aix.cpp-local workarounds for other AIX issues like the *.so/*.a duality. See [JDK-8320005](https://bugs.openjdk.org/browse/JDK-8320005). As such, it is a hack that causes other, more uglier hacks to follow (see discussion of https://github.com/openjdk/jdk/pull/16604).
>> 
>> We propose a different, cleaner way of handling this:
>> 
>> - Handle this entirely inside the AIX versions of os::dll_load and os::dll_unload.
>> - Cache dl handles; repeated opening of a library should return the cached handle.
>> - Increase handle-local ref counter on open, Decrease it on close
>> - Make sure calls to os::dll_load are matched to os::dll_unload (See [JDK-8320830](https://bugs.openjdk.org/browse/JDK-8320830)).
>> 
>> This way we mimic dl handle equality as it is implemented on other platforms, and this works for all callers of os::dll_load.
>
> Joachim Kern has updated the pull request incrementally with one additional commit since the last revision:
> 
>   encapsulate everything in os::Aix::dlopen

Excellent, this is how I have pictured a good solution. Very nice.

A number of remarks, but nothing fundamental.

src/hotspot/os/aix/os_aix.cpp line 1137:

> 1135:     if (ebuf != nullptr && ebuflen > 0) {
> 1136:       ::strncpy(ebuf, "dll_load: empty filename specified", ebuflen - 1);
> 1137:     }

Are there any cases where we don't hand in the error buffer? If so, I would just assert ebuf and ebuflen. No need for this kind of flexibility.

src/hotspot/os/aix/os_aix.cpp line 3051:

> 3049: 
> 3050: // Simulate the library search algorithm of dlopen() (in os::dll_load)
> 3051: int os::Aix::stat64x_via_LIBPATH(const char* path, struct stat64x* stat) {

- no need to export this, make it filescope static
- please return bool, with false = error
- please rename it to something like "search_file_in_LIBPATH"

src/hotspot/os/aix/os_aix.cpp line 3055:

> 3053:     return -1;
> 3054: 
> 3055:   char *path2 = strdup (path);

Please use os::strdup and os::free. If you really intent to use the plain libc versions, use `::strdup` and `::free` to make sure - and indicate to code readers - you use the global libc variants.

src/hotspot/os/aix/os_aix.cpp line 3059:

> 3057:   int idx = strlen(path2) - 1;
> 3058:   if (path2[idx] == ')') {
> 3059:     while (path2[idx] != '(' && idx > 0) idx--;

? Why not `strrchr()`?

src/hotspot/os/aix/os_aix.cpp line 3067:

> 3065:   if (path2[0] == '/' ||
> 3066:       (path2[0] == '.' && (path2[1] == '/' ||
> 3067:                           (path2[1] == '.' && path2[2] == '/')))) {

This complexity is not needed, nor is it sufficient, since it does not handle relative paths ("mydirectory/hallo.so")

https://www.ibm.com/docs/en/aix/7.1?topic=d-dlopen-subroutine

"If FilePath contains a slash character, FilePath is used directly, and no directories are searched. "

So, just scan for a '/' - if you find one, its a path to be opened directly:


const bool use_as_filepath = strchr(path2, '/');

src/hotspot/os/aix/os_aix.cpp line 3085:

> 3083:   strcpy(libpath, env);
> 3084:   for (token = strtok_r(libpath, ":", &saveptr); token != nullptr; token = strtok_r(nullptr, ":", &saveptr)) {
> 3085:     sprintf(combined, "%s/%s", token, path2);

You can save a lot of pain and manual labor by using `stringStream` here. 


stringStream combined;
combined.print("%s/%s", token, path2);
const char* combined_path_string = combined.base();

no need for manual allocation and byte counting.

src/hotspot/os/aix/os_aix.cpp line 3099:

> 3097: // filled by os::dll_load(). This way we mimic dl handle equality for a library
> 3098: // opened a second time, as it is implemented on other platforms.
> 3099: void* os::Aix::dlopen(const char* filename, int Flags) {

Does not need to be exported, nor does os::AIX::dlclose. Make file scope static. See my remarks in os_posix.cpp.

src/hotspot/os/aix/os_aix.cpp line 3103:

> 3101:   struct stat64x libstat;
> 3102: 
> 3103:   if (os::Aix::stat64x_via_LIBPATH(filename, &libstat)) {

Please return bool, not unix int -1, this hurts my brain :-)

src/hotspot/os/aix/os_aix.cpp line 3108:

> 3106:     if (result != nullptr) {
> 3107:       assert(false, "dll_load: Could not stat() file %s, but dlopen() worked; Have to improve stat()", filename);
> 3108:     }

Since this is just assert code, I'd wrap all this stuff in #ifdef ASSERT. No need for needless dlopens otherwise.

src/hotspot/os/aix/os_aix.cpp line 3125:

> 3123:     }
> 3124:     if (i == g_handletable_used) {
> 3125:       // library not still loaded. Check if there is space left in array

s/still/yet

src/hotspot/os/aix/os_aix.cpp line 3131:

> 3129:         pthread_mutex_unlock(&g_handletable_mutex);
> 3130:         assert(false, "max_handletable reached");
> 3131:         return nullptr;

Please, for the sake of release code, hand in an error buffer and fill it with something that makes sense, eg. "too many libraries loaded".
The assert is still okay, I guess, since we don't expect it to fire during tests; if it does fire, it may indicate a problem in our handle table logic or a wrong assumption about handle équality.

src/hotspot/os/aix/os_aix.cpp line 3133:

> 3131:         return nullptr;
> 3132:       }
> 3133:       // library not still loaded and still place in array, so load library

s/still/yet

src/hotspot/os/aix/os_aix.cpp line 3143:

> 3141:         g_handletable[i].devid = libstat.st_dev;
> 3142:         g_handletable[i].refcount = 1;
> 3143:       }

Error handling: on error, call dlerror and return error string inside the error buffer you should hand in. All other platforms do this too.

src/hotspot/os/aix/os_aix.cpp line 3150:

> 3148: }
> 3149: 
> 3150: int os::Aix::dlclose(void* lib) {

can we call lib something better, maybe "handle"?

src/hotspot/os/aix/os_aix.cpp line 3165:

> 3163:       // refcount == 0, so we have to ::dlclose() the lib
> 3164:       // and delete the entry from the array.
> 3165:       res = ::dlclose(lib);

Handle dlclose error. We expect it to work; if it doesn't, it indicates that something is wrong with the handle logic, e.g. an invalid or already closed handle had been handed in. So, assert.

src/hotspot/os/aix/os_aix.hpp line 185:

> 183:   // opened a second time, as it is implemented on other platforms.
> 184:   static void* dlopen(const char* filename, int Flags);
> 185:   static int dlclose(void* lib);

Remove; should not be exported.

src/hotspot/os/posix/os_posix.cpp line 735:

> 733:     l_path = "<not available>";
> 734:   }
> 735:   int res = AIX_ONLY(os::Aix)::dlclose(lib);

Lets do this cleaner, and in the style of hotspot coding elsewhere:

- introduce a new function "os::pd_dll_unload(handle, errorbuf, errbuflen)". Add it to os.hpp, but somewhere non-public. The implementations will live in os_aix.cpp, os_bsd.cpp and os_linux.cpp.
- make os::Aix::dlclose -> os::pd_dll_unload; the only difference is that you should fill in error buffer with either ::dlerror or, if you have errors in handle table, a text describing that error
- on all other posix platforms (os_bsd.cpp + os_linux.cpp), implement a minimal version of os::pd_dll_unload() that calls ::dlunload, and on error calls ::dlerror and copies the string into errbuf
- Here, call os::pd_dll_unload instead of ::dlclose/os::aix::dlclose
- change the JFR code below to not use ::dlerror but the string returned from the buffer

-------------

Changes requested by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16920#pullrequestreview-1762354122
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415568694
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415577023
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415588986
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415577568
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415585089
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415594396
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415625152
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415595301
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415596399
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415597594
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415599081
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415601350
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415607920
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415612828
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415619700
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415625511
PR Review Comment: https://git.openjdk.org/jdk/pull/16920#discussion_r1415638462


More information about the serviceability-dev mailing list