Code Review request for new MallocMaxBytes option and fixed test (7030610, 7037122, 7123945)

David Holmes david.holmes at oracle.com
Wed Mar 13 01:05:18 PDT 2013


Dan, Ron,

The atomic load operation is not needed. We only use it for a 64-bit 
type on 64-bit platforms, and on 64-bit a load of a 64-bit type is 
always atomic - no need for Atomic::load. So this:

+
+   if (MallocMaxBytes > 0) {
+     size_t tmp_size;
+ #ifdef _LP64
+     tmp_size =
+       (size_t) Atomic::load((volatile jlong *)(&current_malloc_size));
+ #else
+     tmp_size = current_malloc_size;
+ #endif
+     if ((tmp_size + size + space_before + space_after) > MallocMaxBytes) {
+       return NULL;
+     }
+   }

reduces to just:

if (MallocMaxBytes > 0 &&
    (current_malloc_size + size + space_before + space_after) > MallocMaxBytes) {
    return NULL;
}

but there's also a risk of unsigned overflow in the addition that could 
give the wrong result. Plus this is all racy anyway, so this limit is 
only a soft limit.

The Atomic::add is okay because it is always implemented for jlong on 
64-bit, and always implemented for jint on 32-bit.
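For what it's worth, the retry-loop pattern behind such a 64-bit atomic add can be sketched with standard C++ atomics. This is illustrative only; HotSpot's real Atomic::add is platform-specific code, and add_and_fetch below is a made-up name, not a HotSpot API:

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the compare-and-swap retry loop: read the current value,
// compute the sum, and retry until no other thread changed the counter
// in between. compare_exchange_weak refreshes 'old_val' on failure.
static int64_t add_and_fetch(std::atomic<int64_t>& counter, int64_t delta) {
  int64_t old_val = counter.load(std::memory_order_relaxed);
  while (!counter.compare_exchange_weak(old_val, old_val + delta)) {
    // old_val now holds the freshly observed value; just retry.
  }
  return old_val + delta;
}
```

This is why the add "stays in a loop until it works": the CAS only succeeds when the counter still holds the value that was read.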

But this code is really ugly. We are using platform-specific types that 
hide whether they are 32-bit or 64-bit, but then have to expose it all 
with ifdef _LP64, and we use atomic functions that have to cast 
everything to jlong or jint. :(

BTW the JVM stat counters should only be working on jints, not jlongs, 
to avoid the atomics issue.

David
-----

On 13/03/2013 12:28 PM, Daniel D. Daugherty wrote:
> On 3/12/13 7:40 PM, Coleen Phillmore wrote:
>>
>> On 3/12/2013 9:20 PM, Daniel D. Daugherty wrote:
>>> I think Ron's off the air for the evening, but I'm doing gatekeeping
>>> duty so I'll chime in to keep the flow going...
>>
>> I'm waiting for a checkin job...  so I can merge and checkin again...
>
> Yes, I saw you migrate your job from JPRT-hotspotwest to JPRT-east
> and I see that it failed on JPRT-east... Are you planning to try
> a rerun?
>
>
>>>
>>>
>>> On 3/12/13 6:59 PM, Coleen Phillmore wrote:
>>>>
>>>> Hi, I had a look at this and I thought Atomic operations didn't work
>>>> for jlong on all platforms?
>>>
>>> I've heard that too and then I looked at Atomic::add(jlong) and realized
>>> that it stays in a loop until it works. In any case, Ron modeled his
>>> logic after the JVM stat counter stuff so it is just as reliable as
>>> those counters. And no, he couldn't use the JVM stat counter stuff
>>> because it is not available in product bits and I insisted that his
>>> fix had to work for product bits. In any case, the counter is used
>>> for a one-way trip to memory exhaustion so absolute accuracy is not
>>> required (like the JVM stat counters).
>>
>> I just grepped the source and Atomic::load() has Unimplemented() in
>> some platforms (not open).  It is not supported everywhere.
>
> Are these 32-bit or 64-bit platforms? If these are 32-bit platforms,
> then there isn't an issue because size_t is 32-bits so the atomic
> jlong operations aren't used.
>
>
>>   I don't think this should be a product flag if it's inaccurate.  As
>> a testing aid it's fine, but shouldn't be introduced in product.
>
> So you're saying it is preferable for the memory exhaustion tests to not
> be run in product because they kill the machines. Sorry, I prefer the
> tests to run in all the build flavors.
>
>
>>>
>>>> I'm also concerned with performance of yet more atomic operations in
>>>> os::malloc() calls.   Can you run refworkload and performance tests
>>>> to verify that these do not slow down?
>>>
>>> The new atomic counter only comes into play when the MallocMaxBytes
>>> option is non-zero so I don't see why you think this will slow things
>>> down. We're talking one test and branch.
>>>
>>>>   I'd be a lot happier if this was a develop flag
>>>
>>> Develop flags are not available in product bits and the point of adding
>>> the MallocMaxBytes option is so that we can run these memory exhaustion
>>> tests reasonably even with product bits. The longest run for the revised
>>> test in JPRT was just over 30 seconds on a Solaris SPARC machine. One of
>>> the open test bugs talks about the test swamping the machine for tens of
>>> minutes and sometimes being unkillable. Which is why JTREG had problems
>>> cleaning up after the test...
>>
>> Ok. I can see why this is product then.   It would be good to bound
>> these OOM tests somehow.   This is an interesting solution.
>
> Ron thought you would like it. Was it you or Karen that asked for
> the temporary malloc ceiling from Ron's last fix to become a
> permanent addition? I'll have to check my archive...
>
>
>>>
>>>> and the code 589-600 was a function check_malloc_bytes() and 604-612
>>>> was add_malloc_bytes().
>>>
>>> Lastly, you want to change the in-line diagnostic code into a function
>>> call and you're worried about slowing things down?
>>
>> Clean code is worth a call instruction.
>
> You can't argue both sides of the street. :-)
>
> You were complaining about the new atomic ops above, which are
> guarded by a flag check, so asking later for a call instruction,
> which is way more expensive, argues the other way.
>
>
>>>
>>>
>>>> It's hard to see the ::malloc() there now!
>>> It was hard to see the ::malloc() in there before and yes, this
>>> fix makes it worse. However, how often is anyone in this code
>>> besides Zhengyu and now Ron?
>>
>> So the excuse for messy code is that not many people see it?
>
> Messy code? Really, do you really want to go down that rat hole
> in this particular file? I'm not at all sure why you're going
> after this change so hard, but it is definitely interesting to
> observe.
>
> I'll let Ron jump in on this thread tomorrow. This is his code
> so he gets to make the call. He might tell me I'm all wet and
> agree with everything you say. Who knows...
>
> Dan
>
>
>>
>> Thanks,
>> Coleen
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Coleen
>>>>
>>>> On 3/12/2013 8:22 PM, Ron Durbin wrote:
>>>>> David
>>>>>
>>>>> Dan has provided a link. So you can access the webrev and review
>>>>> the changes.
>>>>>
>>>>> Ron
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel D. Daugherty
>>>>> Sent: Tuesday, March 12, 2013 8:43 AM
>>>>> To: hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: Code Review request for new MallocMaxBytes option and
>>>>> fixed test (7030610, 7037122, 7123945)
>>>>>
>>>>> David,
>>>>>
>>>>> Thanks for spotting the URL hiccup. I missed it during my review.
>>>>>
>>>>> There is something wrong with Ron's access to cr.openjdk.java.net
>>>>> so I'm going to host his webrev for this review:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/for_rdurbin/7030610-webrev/0-hsx25/
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 3/11/13 8:54 PM, David Holmes wrote:
>>>>>> Ron,
>>>>>>
>>>>>> On 12/03/2013 10:01 AM, Ron Durbin wrote:
>>>>>>> All
>>>>>>>
>>>>>>> I have a code review ready for:
>>>>>>>
>>>>>>> WEBREV http://javaweb.us.oracle.com/~rdurbin/7030610-webrev/0-hsx25/
>>>>>> Please make available an open webrev on cr.openjdk.java.net
>>>>>>
>>>>>> Thank you.
>>>>>> David
>>>>>>
>>>>>>> 7030610 runtime/6878713/Test6878713.sh fails, Error. failed to clean
>>>>>>> up files after test.
>>>>>>>
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7030610
>>>>>>>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7030610
>>>>>>>
>>>>>>> 7037122 runtime/6878713/Test6878713.sh is written incorrectly
>>>>>>>
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7037122
>>>>>>>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7037122
>>>>>>>
>>>>>>> 7123945 runtime/6878713/Test6878713.sh require about 2G of
>>>>>>>
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7123945
>>>>>>>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7123945
>>>>>>>
>>>>>>> Summary: Add new diagnostic option -XX:MallocMaxBytes=NNN and fix
>>>>>>>
>>>>>>> Test6878713.sh. Test6878713.sh is a security regression test with
>>>>>>> four different failure modes. Three are documented in the defects
>>>>>>> above and the fourth was found during bug triage. The root cause of
>>>>>>> these failures can be traced back to two issues: inconsistent test
>>>>>>> condition setup and error handling.
>>>>>>>
>>>>>>> All four defect manifestations are fixed by this set of changes.
>>>>>>>
>>>>>>> New code built and test executed via JPRT test job; Linux ARM and
>>>>>>> Linux PPC version built, but not tested because ARM and PPC are
>>>>>>> currently build-only platforms in JPRT.
>>>>>>>
>>>>>>> As always, comments and questions are welcome.
>>>>>>>
>>>>>>> Ron
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

