RFR (trivial): 8214217: [TESTBUG] runtime/appcds/LotsOfClasses.java failed on solaris sparcv9
Jiangli Zhou
jiangli.zhou at oracle.com
Tue Nov 27 21:48:21 UTC 2018
Thanks, Calvin. I'll push after Ioi also confirms, so we can clear it in
tier3 testing.
Thanks,
Jiangli
On 11/27/18 1:30 PM, Calvin Cheung wrote:
> Looks okay to me.
>
> thanks,
> Calvin
>
> On 11/27/18, 11:34 AM, Jiangli Zhou wrote:
>> Ioi and I had further discussions. Here is the updated webrev with
>> the error message also including the current InitialHeapSize setting:
>>
>> http://cr.openjdk.java.net/~jiangli/8214217/webrev.02/
>>
>> I filed an RFE, https://bugs.openjdk.java.net/browse/JDK-8214388, for
>> improving the fragmentation handling.
>>
>> Thanks,
>> Jiangli
>>
>> On 11/26/18 6:58 PM, Jiangli Zhou wrote:
>>> Hi Ioi,
>>>
>>> Here is the updated webrev with an improved object archiving error
>>> message and a modified test fix. Please let me know if you have other
>>> suggestions.
>>>
>>> http://cr.openjdk.java.net/~jiangli/8214217/webrev.01/
>>>
>>>
>>> On 11/26/18 5:49 PM, Ioi Lam wrote:
>>>> I still don’t understand why it’s necessary to have a 3gb heap to
>>>> archive 32mb of objects. I don’t even know if this is guaranteed to
>>>> work.
>>>
>>> I probably was not clear in my earlier reply. I think using a
>>> lower heap size instead of the 3G setting in the test is okay. 256M
>>> probably is not large enough (in case the allocation size changes in
>>> the future), so I changed it to 500M. Please let me know if you also
>>> think that's reasonable.
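>>>
>>> To be concrete, what I mean is just making the heap size for the dump
>>> step explicit instead of leaving it to ergonomics. A minimal sketch of
>>> the idea (plain ProcessBuilder, for illustration only; the real test
>>> goes through the jtreg CDS test utilities, and the class list and
>>> archive names below are made up):
>>>
>>>     import java.nio.file.Paths;
>>>
>>>     public class DumpWithExplicitHeap {
>>>         public static void main(String[] args) throws Exception {
>>>             String java = Paths.get(System.getProperty("java.home"),
>>>                                     "bin", "java").toString();
>>>             ProcessBuilder pb = new ProcessBuilder(
>>>                 java,
>>>                 "-Xmx500m",                                // explicit heap for the dump step
>>>                 "-XX:SharedClassListFile=large.classlist", // made-up class list name
>>>                 "-XX:SharedArchiveFile=large.jsa",         // made-up archive name
>>>                 "-Xshare:dump");
>>>             pb.inheritIO();
>>>             int exit = pb.start().waitFor();
>>>             System.out.println("dump exited with exit code " + exit);
>>>         }
>>>     }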
>>>
>>>>
>>>> You said having a “large enough” heap will guarantee free space.
>>>> How large is large enough?
>>>
>>> Please see above.
>>>>
>>>> We are dumping the default archive with 128mb heap. Is that large
>>>> enough? What’s the criteria to decide that it’s large enough?
>>>
>>> The default archive is created using the default class list, which
>>> loads about 1000 classes. When generating the default archive, we
>>> explicitly set the java heap size to 128M instead of relying on
>>> ergonomics. With the 128M java heap for generating the default
>>> archive, we have never run into the fragmentation issue. A different
>>> java heap size should be used to meet different usage requirements.
>>>
>>>>
>>>> How should users set their heap size to guarantee success in
>>>> dumping their own archives? This test case shows that you can get
>>>> random failures when dumping a large number of classes, so we need to
>>>> prevent that from happening for our users.
>>>
>>> The behavior is not random. If users run into the fragmentation
>>> error, they can try using a larger java heap.
>>>
>>>>
>>>> Printing a more elaborate error message is not enough. If the
>>>> error is random, it may not happen during regular testing by the
>>>> users, and only happens in deployment.
>>>
>>> Could you please explain why you think it is random?
>>>>
>>>> Silently ignoring the error and continuing to dump without an
>>>> archived heap is also suboptimal. The user may randomly lose the
>>>> benefit of a feature without even knowing it.
>>>
>>> Please let me know your suggestion.
>>>>
>>>> And you didn’t answer my question about whether the problem is
>>>> worse on Solaris than on Linux.
>>>
>>> On Solaris, I can also force it to fail with the fragmentation error
>>> with a 200M java heap.
>>>
>>> Without seeing the actual gc region logging from the failed run that
>>> didn't set the java heap size explicitly, my best guess is that the
>>> workload is different and causes Solaris to appear worse. That's why
>>> I think it is a test bug not to set the heap size explicitly in this
>>> case.
>>>
>>> Thanks,
>>> Jiangli
>>>>
>>>> Thanks
>>>> Ioi
>>>>
>>>> On Nov 26, 2018, at 5:28 PM, Jiangli Zhou
>>>> <jiangli.zhou at oracle.com> wrote:
>>>>
>>>>> Hi Ioi,
>>>>>
>>>>>
>>>>> On 11/26/18 4:42 PM, Ioi Lam wrote:
>>>>>> The purpose of the stress test is not to tweak the parameters so
>>>>>> that the test will pass. It’s to understand what the limitations
>>>>>> of our system are and why they exist.
>>>>>
>>>>> Totally agree with the above.
>>>>>> As I mentioned in the bug report, why would we run into
>>>>>> fragmentation when we have 96mb free space and we need only 32mb?
>>>>>> That’s the question that we need to answer, not “let’s just give a
>>>>>> huge amount of heap”.
>>>>>
>>>>> During object archiving, we allocate from the highest free
>>>>> regions. The allocated regions must be *consecutive* regions.
>>>>> Those were the design decisions made in the early days when I
>>>>> worked with Thomas and others in the GC team on object archiving
>>>>> support.
>>>>>
>>>>> The determining factor is not the total free space in the heap; it
>>>>> is the number of consecutive free regions available (starting from
>>>>> the highest free one) for archiving. GC activities might cause
>>>>> some regions at higher addresses to be used. As we start from the
>>>>> highest free region, if we run into an already-used region during
>>>>> allocation for archiving, we need to bail out.
>>>>>
>>>>> r(n):   Free Region
>>>>> r(n-1): Used Region
>>>>> r(n-2): Free Region
>>>>> ...
>>>>>         Free Region
>>>>>         Used Region
>>>>> ...
>>>>> r(0):   Used Region
>>>>>
>>>>> For example, if we want 3 regions during archiving, we allocate
>>>>> starting from r(n). Since r(n-1) is already used, we can't use it
>>>>> for archiving. Certainly, the design could be improved. One
>>>>> approach that I've already discussed with Thomas is to use a
>>>>> temporary buffer instead of allocating from the heap directly.
>>>>> References need to be adjusted during copying. With that, we can
>>>>> lift the consecutive region requirement. Since object archiving is
>>>>> only supported for static archiving, and with a large enough java
>>>>> heap it is guaranteed that the top free regions can be allocated,
>>>>> changing the current design is not a high-priority task.
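>>>>>
>>>>> To make the constraint concrete, here is a tiny standalone sketch
>>>>> of the "consecutive regions from the top" rule (plain Java with
>>>>> made-up names, not the actual G1 heap region code):
>>>>>
>>>>>     public class ConsecutiveRegionSketch {
>>>>>         // Regions are indexed 0..n-1 from lowest to highest address.
>>>>>         // Try to claim 'needed' consecutive free regions, starting at
>>>>>         // the highest free region and walking downward. Returns the
>>>>>         // lowest index of the claimed range, or -1 if a used region
>>>>>         // blocks the range (the fragmentation bail-out described above).
>>>>>         static int claimTopRegions(boolean[] used, int needed) {
>>>>>             int top = used.length - 1;
>>>>>             while (top >= 0 && used[top]) top--;   // highest free region
>>>>>             for (int i = top; i > top - needed; i--) {
>>>>>                 if (i < 0 || used[i]) return -1;   // hit a used region: bail out
>>>>>             }
>>>>>             return top - needed + 1;
>>>>>         }
>>>>>
>>>>>         public static void main(String[] args) {
>>>>>             // Layout like the example above: r(n) is free, r(n-1) is
>>>>>             // used, so a request for 3 consecutive regions fails even
>>>>>             // though most of the heap below is free.
>>>>>             boolean[] used = {false, false, false, false, true, false};
>>>>>             System.out.println(claimTopRegions(used, 3));   // prints -1
>>>>>         }
>>>>>     }
>>>>>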
>>>>>> If, in the end, the conclusion is that we need a heap 8x the size
>>>>>> of the archived objects (256mb vs 32mb), and we
>>>>>> understand the reason why, that’s fine. But I think we should go
>>>>>> through that analysis process first. In doing so we may be able
>>>>>> to improve GC to make fragmentation less likely.
>>>>>
>>>>> I think the situation is well understood. Please let me know if
>>>>> you have any additional questions; I'll try to add more information.
>>>>>
>>>>> Thanks,
>>>>> Jiangli
>>>>>
>>>>>> Also, do we know if Linux and Solaris have the exact same failure
>>>>>> mode? Or will Solaris fail more frequently than Linux with the
>>>>>> same heap size?
>>>>>>
>>>>>> Thanks
>>>>>> Ioi
>>>>>>
>>>>>>
>>>>>>> On Nov 26, 2018, at 3:55 PM, Jiangli
>>>>>>> Zhou <jiangli.zhou at oracle.com> wrote:
>>>>>>>
>>>>>>> Hi Ioi,
>>>>>>>
>>>>>>>> On 11/26/18 3:35 PM, Ioi Lam wrote:
>>>>>>>>
>>>>>>>> As I commented on the bug report, we should improve the error
>>>>>>>> message. Also, maybe we can force GC to allow the test to run
>>>>>>>> with less heap.
>>>>>>> Updating the error message sounds good to me.
>>>>>>>> A 3GB heap seems excessive. I was able to run the test with
>>>>>>>> -Xmx256M on Linux.
>>>>>>> Using a small heap (with only a little extra space) might still
>>>>>>> run into the issue in the future. As I pointed out, alignment
>>>>>>> and GC activities are also factors. Allocation size might also
>>>>>>> change in the future.
>>>>>>>
>>>>>>> An alternative approach is to fix the test to recognize the
>>>>>>> fragmentation issue and not report a failure in that case. I'm
>>>>>>> now in favor of that approach since it's more flexible. We can
>>>>>>> also safely set a smaller heap size (such as 256M) in the test.
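>>>>>>>
>>>>>>> Roughly, the test-side change I have in mind would look something
>>>>>>> like the sketch below (the helper name and the message substring
>>>>>>> are placeholders I made up, not the actual test code or VM output):
>>>>>>>
>>>>>>>     public class FragmentationCheckSketch {
>>>>>>>         // Hypothetical helper: decide whether a dump failure should
>>>>>>>         // be treated as a test skip because archiving bailed out on
>>>>>>>         // heap fragmentation. The substring below is a placeholder,
>>>>>>>         // not the actual VM message.
>>>>>>>         static boolean isFragmentationBailout(String dumpOutput) {
>>>>>>>             return dumpOutput.contains("consecutive heap regions");
>>>>>>>         }
>>>>>>>
>>>>>>>         public static void main(String[] args) {
>>>>>>>             String output = args.length > 0 ? args[0] : "";
>>>>>>>             if (isFragmentationBailout(output)) {
>>>>>>>                 System.out.println("Skipping: heap too fragmented for archiving");
>>>>>>>                 return;
>>>>>>>             }
>>>>>>>             // otherwise fall through to the normal pass/fail checks
>>>>>>>             System.out.println("No fragmentation bail-out detected");
>>>>>>>         }
>>>>>>>     }
>>>>>>>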
>>>>>>>> Also, I don't understand what you mean by "all observed
>>>>>>>> allocations were done in the lower 2G range.". Why would heap
>>>>>>>> fragmentation be related to the location of the heap?
>>>>>>> In my test run, only the heap regions in the lower 2G heap range
>>>>>>> were used for object allocations. It's not related to the heap
>>>>>>> location.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiangli
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> - Ioi
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 11/26/18 3:23 PM, Jiangli Zhou wrote:
>>>>>>>>> Hi Ioi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 11/26/18 2:00 PM, Ioi Lam wrote:
>>>>>>>>>> Hi Jiangli,
>>>>>>>>>>
>>>>>>>>>> -Xms3G will most likely fail on 32-bit platforms.
>>>>>>>>> We can make the change for 64-bit platforms only, since it's a
>>>>>>>>> 64-bit-only problem. We do not archive java objects on 32-bit
>>>>>>>>> platforms.
>>>>>>>>>> BTW, why would this test fail only on Solaris and not linux?
>>>>>>>>>> The test doesn't specify heap size, so the initial heap size
>>>>>>>>>> setting is picked by Ergonomics. Can you reproduce the
>>>>>>>>>> failure on Linux by using the same heap size settings used by
>>>>>>>>>> the failed Solaris runs?
>>>>>>>>> The failed Solaris run didn't set heap size explicitly. The
>>>>>>>>> heap size was determined by GC ergonomics, as you pointed out
>>>>>>>>> above. I ran the test this morning on the same solaris sparc
>>>>>>>>> machine, using the same binary as in the reported failure.
>>>>>>>>> In my test run, a very large heap (>26G) was used according to
>>>>>>>>> the gc region logging output. So the test didn't run into the
>>>>>>>>> heap fragmentation issue. All observed allocations were done
>>>>>>>>> in the lower 2G range.
>>>>>>>>>
>>>>>>>>> I don't think it is a Solaris-only issue. If the heap size is
>>>>>>>>> small enough, you could run into the issue on all supported
>>>>>>>>> platforms. The issue could appear to be intermittent due to
>>>>>>>>> alignment and GC activities, even with the same heap size with
>>>>>>>>> which the failure was reported.
>>>>>>>>>
>>>>>>>>> On a linux x64 machine, I can force the test to fail with the
>>>>>>>>> fragmentation error with a 200M java heap.
>>>>>>>>>> I think it's better to find out the root cause than just to
>>>>>>>>>> mask it. The purpose of LotsOfClasses.java is to stress the
>>>>>>>>>> system to find out potential bugs.
>>>>>>>>> I think this is a test issue, not a CDS/GC issue. The test
>>>>>>>>> loads >20000 classes but doesn't set the java heap size. Relying
>>>>>>>>> on GC ergonomics to determine the 'right' heap size is
>>>>>>>>> incorrect in this case, since dumping objects requires
>>>>>>>>> consecutive gc regions. Specifying the java heap size explicitly
>>>>>>>>> doesn't 'mask' the issue; it is the right thing to do. :)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Jiangli
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> - Ioi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 11/26/18 1:41 PM, Jiangli Zhou wrote:
>>>>>>>>>>> Please review the following test fix, which sets the java
>>>>>>>>>>> heap size to 3G for dumping with a large number of classes.
>>>>>>>>>>>
>>>>>>>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8214217/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8214217
>>>>>>>>>>>
>>>>>>>>>>> Tested with tier1 and tier3. Also ran the test 100 times on
>>>>>>>>>>> solaris-sparcv9 via mach5.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Jiangli
>>>>>>>>>>>
>>>>>
>>>
>>