RFR: 8261710: SA DSO objects have sizes that are too large
Chris Plummer
cjplummer at openjdk.java.net
Tue Feb 16 02:48:38 UTC 2021
On Tue, 16 Feb 2021 01:37:15 GMT, Yasumasa Suenaga <ysuenaga at openjdk.org> wrote:
>> These changes look good, but it would be nice to fix OSX too. Fortunately it should be easier. As part of the fix for [JDK-8247515](https://bugs.openjdk.java.net/browse/JDK-8247515), I added a `lib_info->memsz` field. I think it is correct and is what you need. I needed it in order to properly scan .dylibs for symbols without scanning too far. It seems to be working. Unfortunately we don't really have a good tests for this, but you could look at the before and after for ClhsdbPmap test to get an idea of if it is working. If you don't have an OSX machine to try, I can try out your changes for you.
>>
>> BTW, I have no idea if Windows is getting the size right, but I guess we'll just have to assume it is:
>>
>> env->CallVoidMethod(obj, addLoadObject_ID, strName, (jlong) params[u].Size,
>> (jlong) params[u].Base);
>
> @plummercj Thanks for the review!
>
>> These changes look good, but it would be nice to fix OSX too. Fortunately it should be easier. As part of the fix for [JDK-8247515](https://bugs.openjdk.java.net/browse/JDK-8247515), I added a `lib_info->memsz` field. I think it is correct and is what you need.
>
> AFAICS `lib_info->memsz` is not set to loaded size, it seems to relates to the offset of the symbol in the binary.
>
> https://github.com/openjdk/jdk/blob/34ae7aeb64db90dcb4d2f3d4acb16c714a32824f/src/jdk.hotspot.agent/macosx/native/libsaproc/libproc_impl.c#L261
> https://github.com/openjdk/jdk/blob/34ae7aeb64db90dcb4d2f3d4acb16c714a32824f/src/jdk.hotspot.agent/macosx/native/libsaproc/symtab.c#L185
>
> Can I believe `lib_info->memsz` for this purpose?
> I'm not familiar of Mach-O, and I don't have Mac, so I can't evaluate it.
>
>> I needed it in order to properly scan .dylibs for symbols without scanning too far. It seems to be working. Unfortunately we don't really have a good tests for this, but you could look at the before and after for ClhsdbPmap test to get an idea of if it is working. If you don't have an OSX machine to try, I can try out your changes for you.
>
> In process attach on Linux, we can test the change with /proc/<PID>/maps, but in other case, we might not test it.
> In coredump on Linux, it is difficult because we cannot get memory map from other source (excepts the core).
>
>> BTW, I have no idea if Windows is getting the size right, but I guess we'll just have to assume it is:
>>
>> ```
>> env->CallVoidMethod(obj, addLoadObject_ID, strName, (jlong) params[u].Size,
>> (jlong) params[u].Base);
>> ```
>
> According to [Microsoft Docs](https://docs.microsoft.com/ja-jp/windows-hardware/drivers/ddi/dbgeng/ns-dbgeng-_debug_module_parameters), it sounds good.
https://bugs.openjdk.java.net/browse/JDK-8250750 has some interesting details that I forgot about. Note this is referencing the state of the code before fixing JDK-8250750, which changed how `lib->memsz` is calculated.
> `lib->memsz` comes from the size of the `LC_SEGMENT_64` that the library was discovered in. However, the library can actually span multiple segments. In this case of the vtable address, the address was in the segment that follows the initial `LC_SEGMENT_64`. Because of this `lib->memsz` is too small, resulting in symbol lookups being restricted to addresses that are in the initial segment.
> The simplest approach to fixing this seems to be locating the largest offset found in the symbol table, round that up to a page boundary, and use it as `lib->memsz`. I've implemented this and it seems to be working.
So it's not perfect, but I think it's good enough for our needs and better than using the file size and ending up with a size that is too big. I think the main way that this could fail is if you use `findpc` on an address that is beyond the last symbol (if it is even possible for there to be memory mapped at an address after the last symbol). In this case `findpc` will just say it doesn't know where the address is rather than saying it is in the dso + some very large offset. For symbols in the dso, they will always be found since lib`->memsz` now covers all symbols.
Also, since I never did figure out how to determine the size of a mach-o symbol, it's possible that the last symbol extends beyond the page boundary. If that is the case and you use `findpc` on and address that is part of the symbol but beyond the page boundary, `findpc` won't find it and say it doesn't know what the address is. So again, not perfect, but this issue can only happen with the last symbol in the dso, and only if the symbol crosses a page boundary, and only if the searched for address is after the page boundary. If you search for the start of the symbol, you'll still find it.
Having said all that, I think maybe we can get the correct size by taking the address of the highest symbol and then looking up the `map_info` that it is in. Then it's just a matter of adding `map->vaddr + map->memsz`. But this assumes that there are no more maps for the dso after the one that has the last symbol, so maybe still not perfect. I'm not sure any of this extra work is worth it. When I dove into the mach-o support last year to fix some issues, it ended up being a huge rat hole that took way more of my time then I had expected. It was very badly broken, far worse then I first thought, and many things about mach-o core files were a mystery, and still remain so. For example, see https://bugs.openjdk.java.net/browse/JDK-8249779. We never did figure out how to create a link map.
-------------
PR: https://git.openjdk.java.net/jdk/pull/2563
More information about the serviceability-dev
mailing list