RFR: 8250826: jhsdb does not work with coredump which comes from Substrate VM

Chris Plummer chris.plummer at oracle.com
Tue Aug 4 00:54:25 UTC 2020


Hi Yasumasa,

Your changes look good now.

thanks,

Chris

On 8/3/20 5:47 PM, Yasumasa Suenaga wrote:
> Hi Chris,
>
> Thank you for the comment!
> I updated webrev:
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.02/
>   Diff from webrev.01: 
> http://hg.openjdk.java.net/jdk/submit/rev/e98dc25b69c2
>
> On 2020/08/04 6:41, Chris Plummer wrote:
>> Hi Yasumasa,
>>
>> Your updated fix resulted in using the core file map whereas the 
>> original fix used the library map. In both cases the assert is 
>> avoided, which I think is the main goal. Does it matter which map is 
>> used?
>
> In GraalVM, read only segment is conflicted, thus it does not matter 
> which map is used.
> However this webrev is more generalize, so segments in coredump should 
> be used.
>
>>    42 #ifndef PF_R
>>    43 #define PF_R 0x4
>>    44 #endif
>>
>>   156   if ((map = allocate_init_map(ph->core->classes_jsa_fd,
>>   157                                offset, vaddr, memsz, PF_R)) == 
>> NULL) {
>>
>> I'm not so sure this is appropriate for OSX. It uses mach-o files, 
>> not elf files. The segment_command flags field comes from loader.h 
>> [1]. I don't see anything in there that looks like the equivalent of 
>> ELF access flags.
>>
>> /* Constants for the flags field of the segment_command */
>> #define    SG_HIGHVM    0x1    /* the file contents for this segment 
>> is for
>>                     the high part of the VM space, the low part
>>                     is zero filled (for stacks in core files) */
>> #define    SG_FVMLIB    0x2    /* this segment is the VM that is 
>> allocated by
>>                     a fixed VM library, for overlap checking in
>>                     the link editor */
>> #define    SG_NORELOC    0x4    /* this segment has nothing that was 
>> relocated
>>                     in it and nothing relocated to it, that is
>>                     it maybe safely replaced without relocation*/
>> #define SG_PROTECTED_VERSION_1    0x8 /* This segment is protected.  
>> If the
>>                         segment starts at file offset 0, the
>>                         first page of the segment is not
>>                         protected.  All other pages of the
>>                         segment are protected. */
>>
>> Since the flags don't matter for OSX, maybe you should just pass 0. 
>> You can do something like:
>>
>> #ifndef PF_R
>> #define MAP_R_FLAG PF_R
>> #else
>> #define MAP_R_FLAG 0
>> #endif
>
> Thanks!
> I thought PF_R can be used PF_R from elf.h on macOS:
>   https://opensource.apple.com/source/dtrace/dtrace-90/sys/elf.h
>
> I merged your code in this webrev.
>
>> Some minor comment fixes are needed:
>>
>>   397         // Access flags fot this memory region is different 
>> between the library
>>
>> "fot" -> "for"
>> "is" -> "are"
>>
>>   399         // We should respect to coredump.
>>
>> "to" -> "the"
>>
>>   404         // And head of ELF header might be included in coredump 
>> (See JDK-7133122).
>>   405         // Thus we need to replace PT_LOAD segments the library 
>> version.
>>
>> How about:
>>
>>   404         // Also the first page of the ELF header might be 
>> included in the coredump (See JDK-7133122).
>>   405         // Thus we need to replace the PT_LOAD segment with the 
>> library version.
>
> Fixed them.
>
>
> Thanks,
>
> Yasumasa
>
>
>> thanks,
>>
>> Chris
>>
>> [1] 
>> https://opensource.apple.com/source/xnu/xnu-1456.1.26/EXTERNAL_HEADERS/mach-o/loader.h.auto.html
>>
>> On 8/2/20 12:18 AM, Yasumasa Suenaga wrote:
>>> Hi Chris,
>>>
>>> (Remove "trivial" from subject)
>>>
>>> Thanks for the information! I fixed errors in new webrev. It passed 
>>> tests on submit repo 
>>> (mach5-one-ysuenaga-JDK-8250826-1-20200802-0151-13109525)
>>>
>>>   http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.01/
>>>
>>>
>>> I tried to use elf.h instead of #define for PF_R, however it failed 
>>> (mach5-one-ysuenaga-JDK-8250826-1-20200802-0542-13111335).
>>>
>>>   http://hg.openjdk.java.net/jdk/submit/rev/67baee1a1a1d
>>>
>>> Thus I added #define for it in this webrev.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> On 2020/08/02 10:22, Chris Plummer wrote:
>>>> Hi Yasumasa,
>>>>
>>>> [2020-08-01T14:15:42,514Z] Creating 
>>>> support/native/jdk.hotspot.agent/libsaproc/static/libsaproc.a from 
>>>> 8 file(s)
>>>> [2020-08-01T14:15:43,961Z] 
>>>> ./open/src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c:128:8: 
>>>> error: no member named 'flags' in 'struct map_info'
>>>> [2020-08-01T14:15:43,961Z]   map->flags  = flags;
>>>> [2020-08-01T14:15:43,961Z]   ~~~  ^
>>>> [2020-08-01T14:15:43,963Z] 
>>>> ./open/src/jdk.hotspot.agent/share/native/libsaproc/ps_core_common.c:153:54: 
>>>> error: use of undeclared identifier 'PF_R'
>>>> [2020-08-01T14:15:43,963Z] offset, vaddr, memsz, PF_R)) == NULL) {
>>>>
>>>> I'll look at the code changes later. No time at the moment.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>>
>>>> 2020-08-01-1405571.suenaga.source2020-08-01-1405571.suenaga.source 
>>>> 2020-08-01-1405571.suenaga.source On 8/1/20 5:20 PM, Yasumasa 
>>>> Suenaga wrote:
>>>>> Hi Chris,
>>>>>
>>>>> Thanks for your comment!
>>>>> I pushed new change to submit repo, but the build failed on macOS. 
>>>>> Could you share details?
>>>>> (I do not have Mac)
>>>>>
>>>>>   commit: http://hg.openjdk.java.net/jdk/submit/rev/0eb1c497f297
>>>>>   job: mach5-one-ysuenaga-JDK-8250826-1-20200801-1407-13098989
>>>>>
>>>>> On 2020/08/01 13:06, Chris Plummer wrote:
>>>>>> On 7/30/20 6:18 PM, Yasumasa Suenaga wrote:
>>>>>>> Hi Chris,
>>>>>>>
>>>>>>> On 2020/07/31 7:29, Chris Plummer wrote:
>>>>>>>> Hi Yasumasa,
>>>>>>>>
>>>>>>>> If I understand correctly we first call add_map_info() for all 
>>>>>>>> the PT_LOAD segments in the core file. We then process all the 
>>>>>>>> library segments, calling add_map_info() for them if the 
>>>>>>>> target_vaddr has not already been addded. If has already been 
>>>>>>>> added, which I assume is the case for any library segment that 
>>>>>>>> is already in the core file, then the core file version is 
>>>>>>>> replaced the the library version.  I'm a little unclear of the 
>>>>>>>> purpose of this replacing of the core PT_LOAD segments with 
>>>>>>>> those found in the libraries. If you could explain this that 
>>>>>>>> would help me understand your change.
>>>>>>>
>>>>>>> Read only segments in ELF should not be any different from 
>>>>>>> PT_LOAD segments in the core.
>>>>>>> And head of ELF header might be included in coredump (See 
>>>>>>> JDK-7133122). Thus we need to replace PT_LOAD segments the 
>>>>>>> library version.
>>>>>> Ok. The code in the area really should have been commented better 
>>>>>> when first written. The purpose is not understandable simply by 
>>>>>> reading the code.
>>>>>
>>>>> I added some comments to existing code. Please tell me if it is 
>>>>> insufficient.
>>>>>
>>>>>
>>>>>>>> I'm also unsure why existing_map->fd would ever be something 
>>>>>>>> other than the core file. Why would another library map the 
>>>>>>>> same target_vaddr.
>>>>>>>
>>>>>>> When mmap() is called to read-only ELF segments / sections, 
>>>>>>> Linux kernel seems to allocate other memory segments which has 
>>>>>>> same top virtual memory address. I've not yet found out from the 
>>>>>>> code of Linux kernel, but I confirmed this behavior on GDB.
>>>>>> Ok. Same comment as above. This should have been explained with 
>>>>>> comments in the code.
>>>>>
>>>>> Added some comments.
>>>>>
>>>>>
>>>>>> As for your fix, if I understand correctly the issue is that a 
>>>>>> single segment in the library is being split into two segments in 
>>>>>> the process (and therefore in the core file) due to an mprotect 
>>>>>> being done on part of the segment. Because of this the segment 
>>>>>> size in the library does match the segment size in the core file. 
>>>>>> So with your fix the library segment is used, but what about the 
>>>>>> other half of the segment that is in the core file? Don't we now 
>>>>>> have overlapping segments; the full original segment from the 
>>>>>> library, and then a second segment that overlaps the tail end of 
>>>>>> the library segment? Will that cause any confusion later on?
>>>>>
>>>>> As long as vaddr is valid, it doesn't matter even if it overlaps 
>>>>> because SA would sort the map with vaddr, and would lookup with it.
>>>>> In Substrate VM, there are RO and RW sections in that order, so it 
>>>>> is ok with webrev.00 . However it might not be appropriate because 
>>>>> RW section might be top of PT_LOAD.
>>>>>
>>>>> To make it more generalized, I changed it to the commit on submit 
>>>>> repo.
>>>>> It would check access flags between in coredump and in binary. If 
>>>>> they are different, we respect current (loaded from coredump) map 
>>>>> because it might be changed at runtime.
>>>>>
>>>>> The change for LabsJDK 11 is more simple because JDK 11 does not 
>>>>> have ps_core_common.c .
>>>>> So I share you it. It may help you:
>>>>>
>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/JDK-8250826-labsjdk11-0.patch 
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Yasumasa
>>>>>>>
>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 7/30/20 1:18 PM, Chris Plummer wrote:
>>>>>>>>> Hi Yasumasa,
>>>>>>>>>
>>>>>>>>> I'm reviewing this RFR, and I'd like to ask that it not be 
>>>>>>>>> pushed as trivial. Although it is just a one line change, it 
>>>>>>>>> takes an extensive knowledge to understand the impact. I'll 
>>>>>>>>> read up on the filed graal issue and try to understand the ELF 
>>>>>>>>> code a bit better.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On 7/30/20 6:45 AM, Yasumasa Suenaga wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Please review this trivial change:
>>>>>>>>>>
>>>>>>>>>>   JBS: https://bugs.openjdk.java.net/browse/JDK-8250826
>>>>>>>>>>   webrev: 
>>>>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8250826/webrev.00/
>>>>>>>>>>
>>>>>>>>>> I played Truffle NFI on GraalVM, but I cannot get Java stacks 
>>>>>>>>>> from coredump via jhsdb.
>>>>>>>>>>
>>>>>>>>>> I've reported this issue to GraalVM community [1], and I 've 
>>>>>>>>>> found out the cause of this issue is .svm_heap would be 
>>>>>>>>>> separated to RO and RW areas by mprotect() calls in run time 
>>>>>>>>>> in spite of .svm_heap is RO section in ELF (please see [1] 
>>>>>>>>>> for details).
>>>>>>>>>>
>>>>>>>>>> It is corner case, but we will see same problem on jhsdb when 
>>>>>>>>>> we attempt to analyze coredump which comes from some 
>>>>>>>>>> applications / libraries which would separate RO sections in 
>>>>>>>>>> ELF like Substrate VM.
>>>>>>>>>>
>>>>>>>>>> I sent PR to fix libsaproc.so in LabsJDK 11 for this issue 
>>>>>>>>>> [2], then community members suggested me to discuss in 
>>>>>>>>>> serviceability-dev.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Yasumasa
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/oracle/graal/issues/2579
>>>>>>>>>> [2] https://github.com/graalvm/labs-openjdk-11/pull/9
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>




More information about the serviceability-dev mailing list