RFR (trivial): 8214217: [TESTBUG] runtime/appcds/LotsOfClasses.java failed on solaris sparcv9
Calvin Cheung
calvin.cheung at oracle.com
Tue Nov 27 21:30:20 UTC 2018
Looks okay to me.
thanks,
Calvin
On 11/27/18, 11:34 AM, Jiangli Zhou wrote:
> Ioi and I had further discussions. Here is the updated webrev, in which
> the error message also includes the current InitialHeapSize setting:
>
> http://cr.openjdk.java.net/~jiangli/8214217/webrev.02/
>
> I filed an RFE, https://bugs.openjdk.java.net/browse/JDK-8214388, for
> improving the fragmentation handling.
>
> Thanks,
> Jiangli
>
> On 11/26/18 6:58 PM, Jiangli Zhou wrote:
>> Hi Ioi,
>>
>> Here is the updated webrev with an improved object archiving error
>> message and a modified test fix. Please let me know if you have other
>> suggestions.
>>
>> http://cr.openjdk.java.net/~jiangli/8214217/webrev.01/
>>
>>
>> On 11/26/18 5:49 PM, Ioi Lam wrote:
>>> I still don’t understand why it’s necessary to have a 3gb heap to
>>> archive 32mb of objects. I don’t even know if this is guaranteed to
>>> work.
>>
>> I probably was not clear in my earlier reply. I think using a lower
>> heap size instead of the 3G setting in the test is okay. 256M is
>> probably not large enough (in case the allocation size changes in the
>> future), so I changed it to 500M. Please let me know if you also think
>> that's reasonable.
>>
>>>
>>> You said “having a large enough” heap will guarantee free space. How
>>> large is large enough?
>>
>> Please see above.
>>>
>>> We are dumping the default archive with a 128mb heap. Is that large
>>> enough? What’s the criterion for deciding that it’s large enough?
>>
>> The default archive is created using the default class list, which
>> loads about 1000 classes. When generating the default archive, we
>> explicitly set the java heap size to 128M instead of relying on
>> ergonomics. With that 128M java heap, we have never run into the
>> fragmentation issue. Different java heap sizes should be used to meet
>> different usage requirements.
>>
>>>
>>> How should users set their heap size to guarantee success in dumping
>>> their own archives? This test case shows that you can get random
>>> failures when dumping a large number of classes, so we need to
>>> prevent that from happening for our users.
>>
>> The behavior is not random. If users run into the fragmentation
>> error, they can try using a larger java heap.
>>
>>>
>>> Printing a more elaborate error message is not enough. If the error
>>> is random, it may not happen during regular testing by the users and
>>> only happen in deployment.
>>
>> Could you please explain why you think it is random?
>>>
>>> Silently ignoring the error and continuing to dump without the
>>> archived heap is also suboptimal. The user may randomly lose the
>>> benefit of a feature without even knowing it.
>>
>> Please let me know your suggestion.
>>>
>>> And you didn’t answer my question whether the problem is worse on
>>> Solaris than Linux.
>>
>> On Solaris, I can also force it to fail with the fragmentation error
>> with a 200M java heap.
>>
>> Without seeing the actual GC region logging from the failed run,
>> which didn't set the java heap size explicitly, my best guess is that
>> the workload is different and causes Solaris to appear worse. That's
>> why I think it is a test bug not to set the heap size explicitly in
>> this case.
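>>
>> As a rough illustration (not the actual test code; the class list,
>> archive name and heap size below are placeholders), setting the heap
>> size explicitly for the dump step could look like this:
>>
>> // Illustrative sketch only: drive -Xshare:dump with an explicit heap
>> // size so the archiving step does not depend on ergonomics.
>> import java.util.List;
>>
>> public class DumpWithExplicitHeap {
>>     public static void main(String[] args) throws Exception {
>>         ProcessBuilder pb = new ProcessBuilder(List.of(
>>                 System.getProperty("java.home") + "/bin/java",
>>                 "-Xms500m", "-Xmx500m",                    // explicit heap for dumping
>>                 "-XX:SharedClassListFile=large.classlist", // placeholder class list
>>                 "-XX:SharedArchiveFile=lots.jsa",          // placeholder archive name
>>                 "-Xshare:dump"));
>>         pb.inheritIO();
>>         int exitValue = pb.start().waitFor();
>>         System.out.println("dump exited with " + exitValue);
>>     }
>> }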
>>
>> Thanks,
>> Jiangli
>>>
>>> Thanks
>>> Ioi
>>>
>>> On Nov 26, 2018, at 5:28 PM, Jiangli Zhou
>>> <jiangli.zhou at oracle.com> wrote:
>>>
>>>> Hi Ioi,
>>>>
>>>>
>>>> On 11/26/18 4:42 PM, Ioi Lam wrote:
>>>>> The purpose of the stress test is not to tweak the parameters so
>>>>> that the test will pass. It’s to understand what the limitations
>>>>> of our system are and why they exist.
>>>>
>>>> Totally agree with the above.
>>>>> As I mentioned in the bug report, why would we run into
>>>>> fragmentation when we have 96mb of free space and need only 32mb?
>>>>> That’s the question that we need to answer, not “let’s just give
>>>>> it a huge amount of heap”.
>>>>
>>>> During object archiving, we allocate from the highest free regions.
>>>> The allocated regions must be *consecutive* regions. Those were the
>>>> design decisions made in the early days, when I worked with Thomas
>>>> and others on the GC team on object archiving support.
>>>>
>>>> The determining factor is not the total free space in the heap; it
>>>> is the number of consecutive free regions available for archiving,
>>>> starting from the highest free one. GC activity might leave some
>>>> regions at higher addresses in use. Since we start from the highest
>>>> free region, if we run into an already used region during
>>>> allocation for archiving, we need to bail out.
>>>>
>>>> rn: Free Region
>>>> r(n-1): Used Region
>>>> r(n-2): Free Region
>>>> ...
>>>> Free Region
>>>> Used Region
>>>> ...
>>>> r0: Used Region
>>>>
>>>> For example, if we want 3 regions during archiving, we allocate
>>>> starting from rn. Since r(n-1) is already used, we can't use it for
>>>> archiving, so we bail out (see the sketch below). Certainly, the
>>>> design could be improved. One approach that I've already discussed
>>>> with Thomas is to use a temporary buffer instead of allocating from
>>>> the heap directly; references would need to be adjusted during
>>>> copying. With that, we could lift the consecutive-region
>>>> requirement. Since object archiving is only supported for static
>>>> archiving, and a large enough java heap guarantees that the top
>>>> free regions can be allocated successfully, changing the current
>>>> design is not a high-priority task.
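>>>>
>>>> To illustrate the consecutive-region requirement, here is a
>>>> simplified Java sketch of the scan-down-and-bail-out idea. It is a
>>>> model of the behavior described above only, not the actual
>>>> HotSpot/G1 code; the class and method names are made up.
>>>>
>>>> // Simplified model only; not the real HotSpot/G1 implementation.
>>>> public class ConsecutiveRegionSketch {
>>>>     // Returns the lowest index of 'needed' consecutive free regions,
>>>>     // scanning down from the highest free region, or -1 if a used
>>>>     // region interrupts the run (the bail-out case).
>>>>     static int findArchiveRange(boolean[] used, int needed) {
>>>>         int i = used.length - 1;
>>>>         while (i >= 0 && used[i]) {   // skip used regions at the very top
>>>>             i--;
>>>>         }
>>>>         for (int found = 0; i >= 0; i--) {
>>>>             if (used[i]) {
>>>>                 return -1;            // run interrupted: bail out
>>>>             }
>>>>             if (++found == needed) {
>>>>                 return i;             // lowest index of the chosen range
>>>>             }
>>>>         }
>>>>         return -1;                    // not enough regions at all
>>>>     }
>>>>
>>>>     public static void main(String[] args) {
>>>>         // r5 free, r4 used, r3..r0 free: asking for 3 regions fails,
>>>>         // mirroring the "r(n-1) is already used" example above.
>>>>         boolean[] used = { false, false, false, false, true, false };
>>>>         System.out.println(findArchiveRange(used, 3));   // prints -1
>>>>     }
>>>> }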
>>>>> If, at the end, the conclusion is that we need a heap 8x the size
>>>>> of the archived objects (256mb vs 32mb), and we understand the
>>>>> reason why, that’s fine. But I think we should go through that
>>>>> analysis process first. In doing so we may be able to improve GC
>>>>> to make fragmentation less likely.
>>>>
>>>> I think the situation is well understood. Please let me know if you
>>>> have any additional questions; I'll try to add more information.
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>>>> Also, do we know if Linux and Solaris have the exact same failure
>>>>> mode? Or will Solaris fail more frequently than Linux with the
>>>>> same heap size?
>>>>>
>>>>> Thanks
>>>>> Ioi
>>>>>
>>>>>
>>>>>> On Nov 26, 2018, at 3:55 PM, Jiangli Zhou
>>>>>> <jiangli.zhou at oracle.com> wrote:
>>>>>>
>>>>>> Hi Ioi,
>>>>>>
>>>>>>> On 11/26/18 3:35 PM, Ioi Lam wrote:
>>>>>>>
>>>>>>> As I commented on the bug report, we should improve the error
>>>>>>> message. Also, maybe we can force GC to allow the test to run
>>>>>>> with less heap.
>>>>>> Updating the error message sounds good to me.
>>>>>>> A 3GB heap seems excessive. I was able to run the test with
>>>>>>> -Xmx256M on Linux.
>>>>>> Using a small heap (with only a little extra space) might still
>>>>>> run into the issue in the future. As I pointed out, alignment and
>>>>>> GC activity are also factors, and the allocation size might also
>>>>>> change in the future.
>>>>>>
>>>>>> An alternative approach is to fix the test to recognize the
>>>>>> fragmentation issue and not report a failure in that case. I'm
>>>>>> now in favor of that approach since it's more flexible. We can
>>>>>> also safely set a smaller heap size (such as 256M) in the test;
>>>>>> a rough sketch of that idea follows.
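>>>>>>
>>>>>> The sketch is an illustration of the idea only, not the actual
>>>>>> test change: the class list, archive name, heap size and
>>>>>> especially the fragmentation message string are placeholders that
>>>>>> would have to match whatever the improved error message prints.
>>>>>>
>>>>>> // Sketch only: tolerate a dump failure caused by heap fragmentation.
>>>>>> import java.util.List;
>>>>>>
>>>>>> public class TolerantDumpCheck {
>>>>>>     // Placeholder text, not the real HotSpot message.
>>>>>>     static final String FRAGMENTATION_MESSAGE =
>>>>>>             "Unable to allocate archive heap regions";
>>>>>>
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         ProcessBuilder pb = new ProcessBuilder(List.of(
>>>>>>                 System.getProperty("java.home") + "/bin/java",
>>>>>>                 "-Xms256m", "-Xmx256m",                    // smaller, explicit heap
>>>>>>                 "-XX:SharedClassListFile=large.classlist", // placeholder
>>>>>>                 "-XX:SharedArchiveFile=lots.jsa",          // placeholder
>>>>>>                 "-Xshare:dump"));
>>>>>>         pb.redirectErrorStream(true);
>>>>>>         Process p = pb.start();
>>>>>>         String output = new String(p.getInputStream().readAllBytes());
>>>>>>         int exit = p.waitFor();
>>>>>>         if (exit != 0 && output.contains(FRAGMENTATION_MESSAGE)) {
>>>>>>             System.out.println("Dump hit heap fragmentation; not a test failure.");
>>>>>>         } else if (exit != 0) {
>>>>>>             throw new RuntimeException("Dump failed:\n" + output);
>>>>>>         }
>>>>>>     }
>>>>>> }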
>>>>>>> Also, I don't understand what you mean by "all observed
>>>>>>> allocations were done in the lower 2G range.". Why would heap
>>>>>>> fragmentation be related to the location of the heap?
>>>>>> In my test run, only the heap regions in the lower 2G heap range
>>>>>> were used for object allocations. It's not related to the heap
>>>>>> location.
>>>>>>
>>>>>> Thanks,
>>>>>> Jiangli
>>>>>>> Thanks
>>>>>>>
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>> On 11/26/18 3:23 PM, Jiangli Zhou wrote:
>>>>>>>> Hi Ioi,
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 11/26/18 2:00 PM, Ioi Lam wrote:
>>>>>>>>> Hi Jiangli,
>>>>>>>>>
>>>>>>>>> -Xms3G will most likely fail on 32-bit platforms.
>>>>>>>> We can make the change for 64-bit platforms only, since it's a
>>>>>>>> 64-bit-only problem. We do not archive java objects on 32-bit
>>>>>>>> platforms.
>>>>>>>>> BTW, why would this test fail only on Solaris and not Linux?
>>>>>>>>> The test doesn't specify a heap size, so the initial heap size
>>>>>>>>> setting is picked by ergonomics. Can you reproduce the failure
>>>>>>>>> on Linux by using the same heap size settings as the failed
>>>>>>>>> Solaris runs?
>>>>>>>> The failed Solaris run didn't set the heap size explicitly. The
>>>>>>>> heap size was determined by GC ergonomics, as you pointed out
>>>>>>>> above. I ran the test this morning on the same Solaris SPARC
>>>>>>>> machine, using the same binary with which the issue was
>>>>>>>> reported. In my test run, a very large heap (>26G) was used
>>>>>>>> according to the GC region logging output, so the test didn't
>>>>>>>> run into the heap fragmentation issue. All observed allocations
>>>>>>>> were done in the lower 2G range.
>>>>>>>>
>>>>>>>> I don't think it is a Solaris-only issue. If the heap size is
>>>>>>>> small enough, you can run into the issue on any supported
>>>>>>>> platform. The issue can appear to be intermittent due to
>>>>>>>> alignment and GC activity, even with the same heap size at
>>>>>>>> which the failure was reported.
>>>>>>>>
>>>>>>>> On a Linux x64 machine, I can force the test to fail with the
>>>>>>>> fragmentation error using a 200M java heap.
>>>>>>>>> I think it's better to find out the root cause than just to
>>>>>>>>> mask it. The purpose of LotsOfClasses.java is to stress the
>>>>>>>>> system to find potential bugs.
>>>>>>>> I think this is a test issue, not a CDS/GC issue. The test
>>>>>>>> loads >20000 classes but doesn't set the java heap size.
>>>>>>>> Relying on GC ergonomics to determine the 'right' heap size is
>>>>>>>> incorrect in this case, since dumping objects requires
>>>>>>>> consecutive GC regions. Specifying the heap size explicitly
>>>>>>>> doesn't 'mask' the issue; it is the right thing to do. :)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jiangli
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> - Ioi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 11/26/18 1:41 PM, Jiangli Zhou wrote:
>>>>>>>>>> Please review the following test fix, which sets the java
>>>>>>>>>> heap size to 3G for dumping with a large number of classes.
>>>>>>>>>>
>>>>>>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8214217/webrev.00/
>>>>>>>>>>
>>>>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8214217
>>>>>>>>>>
>>>>>>>>>> Tested with tier1 and tier3. Also ran the test 100 times on
>>>>>>>>>> solaris-sparcv9 via mach5.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Jiangli
>>>>>>>>>>
>>>>
>>
>