RFR: 8250826: jhsdb does not work with coredump which comes from Substrate VM
Yasumasa Suenaga
suenaga at oss.nttdata.com
Wed Aug 5 00:06:54 UTC 2020
On 2020/08/05 4:15, serguei.spitsyn at oracle.com wrote:
> On 8/4/20 11:04, Chris Plummer wrote:
>> On 8/4/20 2:11 AM, serguei.spitsyn at oracle.com wrote:
>>> Hi Yasumasa,
>>>
>>> The fix looks good to me.
>>> Thanks to Chris for discussing the details in previous emails.
>>>
>>> Just one suggestion:
>>>
>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.02/src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c.frames.html
>>> 44 #ifdef PF_R
>>> 45 #define MAP_R_FLAG PF_R
>>> 46 #else
>>> 47 #define MAP_R_FLAG 0
>>> 48 #endif
>>> Could you, please add a small comment before?
>>> Something like this would be enough, I think:
>>> // Define a segment permission flag allowing read.
>>>
>> That comment is a little bit misleading, since on OSX the flag really has no meaning. Maybe the following would be better:
>>
>> // Define a segment permission flag allowing read if there is a read flag. Otherwise use 0.
>
> Thanks, this is more precise.
> In fact, I was more concerned to explain what the macro PF_R is about.
Ok, I will push the change with this comment.
Thanks,
Yasumasa
> Thanks,
> Serguei
>
>>
>> thanks,
>>
>> Chris
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 8/3/20 17:54, Yasumasa Suenaga wrote:
>>>> Thanks Chris!
>>>> I will push it when I got second reviewer.
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>> On 2020/08/04 9:54, Chris Plummer wrote:
>>>>> Hi Yasumasa,
>>>>>
>>>>> Your changes look good now.
>>>>>
>>>>> thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>> On 8/3/20 5:47 PM, Yasumasa Suenaga wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>> Thank you for the comment!
>>>>>> I updated webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.02/
>>>>>> Diff from webrev.01: http://hg.openjdk.java.net/jdk/submit/rev/e98dc25b69c2
>>>>>>
>>>>>> On 2020/08/04 6:41, Chris Plummer wrote:
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> Your updated fix resulted in using the core file map whereas the original fix used the library map. In both cases the assert is avoided, which I think is the main goal. Does it matter which map is used?
>>>>>>
>>>>>> In GraalVM, read only segment is conflicted, thus it does not matter which map is used.
>>>>>> However this webrev is more generalize, so segments in coredump should be used.
>>>>>>
>>>>>>> 42 #ifndef PF_R
>>>>>>> 43 #define PF_R 0x4
>>>>>>> 44 #endif
>>>>>>>
>>>>>>> 156 if ((map = allocate_init_map(ph->core->classes_jsa_fd,
>>>>>>> 157 offset, vaddr, memsz, PF_R)) == NULL) {
>>>>>>>
>>>>>>> I'm not so sure this is appropriate for OSX. It uses mach-o files, not elf files. The segment_command flags field comes from loader.h [1]. I don't see anything in there that looks like the equivalent of ELF access flags.
>>>>>>>
>>>>>>> /* Constants for the flags field of the segment_command */
>>>>>>> #define SG_HIGHVM 0x1 /* the file contents for this segment is for
>>>>>>> the high part of the VM space, the low part
>>>>>>> is zero filled (for stacks in core files) */
>>>>>>> #define SG_FVMLIB 0x2 /* this segment is the VM that is allocated by
>>>>>>> a fixed VM library, for overlap checking in
>>>>>>> the link editor */
>>>>>>> #define SG_NORELOC 0x4 /* this segment has nothing that was relocated
>>>>>>> in it and nothing relocated to it, that is
>>>>>>> it maybe safely replaced without relocation*/
>>>>>>> #define SG_PROTECTED_VERSION_1 0x8 /* This segment is protected. If the
>>>>>>> segment starts at file offset 0, the
>>>>>>> first page of the segment is not
>>>>>>> protected. All other pages of the
>>>>>>> segment are protected. */
>>>>>>>
>>>>>>> Since the flags don't matter for OSX, maybe you should just pass 0. You can do something like:
>>>>>>>
>>>>>>> #ifndef PF_R
>>>>>>> #define MAP_R_FLAG PF_R
>>>>>>> #else
>>>>>>> #define MAP_R_FLAG 0
>>>>>>> #endif
>>>>>>
>>>>>> Thanks!
>>>>>> I thought PF_R can be used PF_R from elf.h on macOS:
>>>>>> https://opensource.apple.com/source/dtrace/dtrace-90/sys/elf.h
>>>>>>
>>>>>> I merged your code in this webrev.
>>>>>>
>>>>>>> Some minor comment fixes are needed:
>>>>>>>
>>>>>>> 397 // Access flags fot this memory region is different between the library
>>>>>>>
>>>>>>> "fot" -> "for"
>>>>>>> "is" -> "are"
>>>>>>>
>>>>>>> 399 // We should respect to coredump.
>>>>>>>
>>>>>>> "to" -> "the"
>>>>>>>
>>>>>>> 404 // And head of ELF header might be included in coredump (See JDK-7133122).
>>>>>>> 405 // Thus we need to replace PT_LOAD segments the library version.
>>>>>>>
>>>>>>> How about:
>>>>>>>
>>>>>>> 404 // Also the first page of the ELF header might be included in the coredump (See JDK-7133122).
>>>>>>> 405 // Thus we need to replace the PT_LOAD segment with the library version.
>>>>>>
>>>>>> Fixed them.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> [1] https://opensource.apple.com/source/xnu/xnu-1456.1.26/EXTERNAL_HEADERS/mach-o/loader.h.auto.html
>>>>>>>
>>>>>>> On 8/2/20 12:18 AM, Yasumasa Suenaga wrote:
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> (Remove "trivial" from subject)
>>>>>>>>
>>>>>>>> Thanks for the information! I fixed errors in new webrev. It passed tests on submit repo (mach5-one-ysuenaga-JDK-8250826-1-20200802-0151-13109525)
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.01/
>>>>>>>>
>>>>>>>>
>>>>>>>> I tried to use elf.h instead of #define for PF_R, however it failed (mach5-one-ysuenaga-JDK-8250826-1-20200802-0542-13111335).
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk/submit/rev/67baee1a1a1d
>>>>>>>>
>>>>>>>> Thus I added #define for it in this webrev.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2020/08/02 10:22, Chris Plummer wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> [2020-08-01T14:15:42,514Z] Creating support/native/jdk.hotspot.agent/libsaproc/static/libsaproc.a from 8 file(s)
>>>>>>>>> [2020-08-01T14:15:43,961Z] ./open/src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c:128:8: error: no member named 'flags' in 'struct map_info'
>>>>>>>>> [2020-08-01T14:15:43,961Z] map->flags = flags;
>>>>>>>>> [2020-08-01T14:15:43,961Z] ~~~ ^
>>>>>>>>> [2020-08-01T14:15:43,963Z] ./open/src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c:153:54: error: use of undeclared identifier 'PF_R'
>>>>>>>>> [2020-08-01T14:15:43,963Z] offset, vaddr, memsz, PF_R)) == NULL) {
>>>>>>>>>
>>>>>>>>> I'll look at the code changes later. No time at the moment.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> 2020-08-01-1405571.suenaga.source2020-08-01-1405571.suenaga.source 2020-08-01-1405571.suenaga.source On 8/1/20 5:20 PM, Yasumasa Suenaga wrote:
>>>>>>>>>> Hi Chris,
>>>>>>>>>>
>>>>>>>>>> Thanks for your comment!
>>>>>>>>>> I pushed new change to submit repo, but the build failed on macOS. Could you share details?
>>>>>>>>>> (I do not have Mac)
>>>>>>>>>>
>>>>>>>>>> commit: http://hg.openjdk.java.net/jdk/submit/rev/0eb1c497f297
>>>>>>>>>> job: mach5-one-ysuenaga-JDK-8250826-1-20200801-1407-13098989
>>>>>>>>>>
>>>>>>>>>> On 2020/08/01 13:06, Chris Plummer wrote:
>>>>>>>>>>> On 7/30/20 6:18 PM, Yasumasa Suenaga wrote:
>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020/07/31 7:29, Chris Plummer wrote:
>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>
>>>>>>>>>>>>> If I understand correctly we first call add_map_info() for all the PT_LOAD segments in the core file. We then process all the library segments, calling add_map_info() for them if the target_vaddr has not already been addded. If has already been added, which I assume is the case for any library segment that is already in the core file, then the core file version is replaced the the library version. I'm a little unclear of the purpose of this replacing of the core PT_LOAD segments with those found in the libraries. If you could explain this that would help me understand your change.
>>>>>>>>>>>>
>>>>>>>>>>>> Read only segments in ELF should not be any different from PT_LOAD segments in the core.
>>>>>>>>>>>> And head of ELF header might be included in coredump (See JDK-7133122). Thus we need to replace PT_LOAD segments the library version.
>>>>>>>>>>> Ok. The code in the area really should have been commented better when first written. The purpose is not understandable simply by reading the code.
>>>>>>>>>>
>>>>>>>>>> I added some comments to existing code. Please tell me if it is insufficient.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> I'm also unsure why existing_map->fd would ever be something other than the core file. Why would another library map the same target_vaddr.
>>>>>>>>>>>>
>>>>>>>>>>>> When mmap() is called to read-only ELF segments / sections, Linux kernel seems to allocate other memory segments which has same top virtual memory address. I've not yet found out from the code of Linux kernel, but I confirmed this behavior on GDB.
>>>>>>>>>>> Ok. Same comment as above. This should have been explained with comments in the code.
>>>>>>>>>>
>>>>>>>>>> Added some comments.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> As for your fix, if I understand correctly the issue is that a single segment in the library is being split into two segments in the process (and therefore in the core file) due to an mprotect being done on part of the segment. Because of this the segment size in the library does match the segment size in the core file. So with your fix the library segment is used, but what about the other half of the segment that is in the core file? Don't we now have overlapping segments; the full original segment from the library, and then a second segment that overlaps the tail end of the library segment? Will that cause any confusion later on?
>>>>>>>>>>
>>>>>>>>>> As long as vaddr is valid, it doesn't matter even if it overlaps because SA would sort the map with vaddr, and would lookup with it.
>>>>>>>>>> In Substrate VM, there are RO and RW sections in that order, so it is ok with webrev.00 . However it might not be appropriate because RW section might be top of PT_LOAD.
>>>>>>>>>>
>>>>>>>>>> To make it more generalized, I changed it to the commit on submit repo.
>>>>>>>>>> It would check access flags between in coredump and in binary. If they are different, we respect current (loaded from coredump) map because it might be changed at runtime.
>>>>>>>>>>
>>>>>>>>>> The change for LabsJDK 11 is more simple because JDK 11 does not have ps_core_common.c .
>>>>>>>>>> So I share you it. It may help you:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/JDK-8250826-labsjdk11-0.patch
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 7/30/20 1:18 PM, Chris Plummer wrote:
>>>>>>>>>>>>>> Hi Yasumasa,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm reviewing this RFR, and I'd like to ask that it not be pushed as trivial. Although it is just a one line change, it takes an extensive knowledge to understand the impact. I'll read up on the filed graal issue and try to understand the ELF code a bit better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 7/30/20 6:45 AM, Yasumasa Suenaga wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please review this trivial change:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8250826
>>>>>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.00/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I played Truffle NFI on GraalVM, but I cannot get Java stacks from coredump via jhsdb.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've reported this issue to GraalVM community [1], and I 've found out the cause of this issue is .svm_heap would be separated to RO and RW areas by mprotect() calls in run time in spite of .svm_heap is RO section in ELF (please see [1] for details).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is corner case, but we will see same problem on jhsdb when we attempt to analyze coredump which comes from some applications / libraries which would separate RO sections in ELF like Substrate VM.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I sent PR to fix libsaproc.so in LabsJDK 11 for this issue [2], then community members suggested me to discuss in serviceability-dev.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yasumasa
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://github.com/oracle/graal/issues/2579
>>>>>>>>>>>>>>> [2] https://github.com/graalvm/labs-openjdk-11/pull/9
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list