Code Review request for new MallocMaxBytes option and fixed test (7030610, 7037122, 7123945)

Wed Mar 13 06:27:34 PDT 2013

David

Is the 64 bit type always aligned on a mod 64 bit addr?
Is the 32 bit type always aligned on a mod 32 bit addr?
Can we always assure that the read is not split across more than one read cycle?

If the answers are always yes on all platforms you are correct? 

And thanks for reviewing the code.

Ron
-----Original Message-----
From: David Holmes 
Sent: Wednesday, March 13, 2013 2:05 AM
To: Daniel Daugherty
Cc: hotspot-runtime-dev at openjdk.java.net
Subject: Re: Code Review request for new MallocMaxBytes option and fixed test (7030610, 7037122, 7123945)

Dan, Ron,

The atomic load operation is not needed. We only use it for a 64-bit type if on 64-bit and on 64-bit a load of a 64-bit type is always atomic
- no need for Atomic::load. So this:

+
+   if (MallocMaxBytes > 0) {
+     size_t tmp_size;
+ #ifdef _LP64
+     tmp_size =
+       (size_t) Atomic::load((volatile jlong *)(&current_malloc_size));
+ #else
+     tmp_size = current_malloc_size;
+ #endif
+     if ((tmp_size + size + space_before + space_after) > MallocMaxBytes) {
+       return NULL;
+     }
+   }

reduces to just:

if (MallocMaxBytes > 0 && current_malloc_size + size + space_before + 
space_after) > MallocMaxBytes) {
    return NULL;
}

but there's also a risk of overflow in the calculation that could give 
the wrong result. Plus this is all racy anyway so this limit is only a 
soft limit.

The Atomic::add is okay because it is always implemented on 64-bit for 
jlong; and always implemented on 32-bit for jint.

But this code is really ugly. We are using platform specific types that 
hide whether they are 32-bit or 64-bit but then have to expose it all 
with ifdef _LP64 and we use atomic functions that have to cast 
everything to jlong or jint. :(

BTW the JVM stat counters should only be working on jints not jlong - to 
avoid the atomics issue.

David
-----

On 13/03/2013 12:28 PM, Daniel D. Daugherty wrote:
> On 3/12/13 7:40 PM, Coleen Phillmore wrote:
>>
>> On 3/12/2013 9:20 PM, Daniel D. Daugherty wrote:
>>> I think Ron's off the air for the evening, but I'm doing gatekeeping
>>> duty so I'll chime in to keep the flow going...
>>
>> I'm waiting for a checkin job...  so I can merge and checkin again...
>
> Yes, I saw you migrate your job from JPRT-hotspotwest to JPRT-east
> and I see that it failed on JPRT-east... Are you planning to try
> a rerun?
>
>
>>>
>>>
>>> On 3/12/13 6:59 PM, Coleen Phillmore wrote:
>>>>
>>>> Hi, I had a look at this and I thought Atomic operations didn't work
>>>> for jlong on all platforms?
>>>
>>> I've heard that too and then I looked at Atomic::add(jlong) and realized
>>> that it stays in a loop until it works. In any case, Ron modeled his
>>> logic after the JVM stat counter stuff so it is just as reliable as
>>> those counters. And no, he couldn't use the JVM stat counter stuff
>>> because it is not available in product bits and I insisted that his
>>> fix had to work for product bits. In any case, the counter is used
>>> for a one-way trip to memory exhaustion so absolute accuracy is not
>>> required (like the JVM stat counters).
>>
>> I just grepped the source and Atomic::load() has Unimplemented() in
>> some platforms (not open).  It is not supported everywhere.
>
> Are these 32-bit or 64-bit platforms? If these are 32-bit platforms,
> then there isn't an issue because size_t is 32-bits so the atomic
> jlong operations aren't used.
>
>
>>   I don't think this should be a product flag if it's inaccurate.  As
>> a testing aid it's fine, but shouldn't be introduced in product.
>
> So you're saying it is preferable for the memory exhaustion tests to not
> be run in product because they kill the machines. Sorry, I prefer the
> tests to run in all the build flavors.
>
>
>>>
>>>> I'm also concerned with performance of yet more atomic operations in
>>>> os::malloc() calls.   Can you run refworkload and performance tests
>>>> to verify that these do not slow down?
>>>
>>> The new atomic counter only comes into play when the MallocMaxBytes
>>> option is non-zero so I don't see why you think this will slow things
>>> down. We're talking one test and branch.
>>>
>>>>   I'd be a lot happier if this was a develop flag
>>>
>>> Develop flags are not available in product bits and the point of adding
>>> the MallocMaxBytes option is so that we can run these memory exhaustion
>>> tests reasonably even with product bits. The longest run for the revised
>>> test in JPRT was just over 30 seconds on a Solaris SPARC machine. One of
>>> the open test bugs talks about the test swamping the machine for tens of
>>> minutes and sometimes being unkillable. Which is why JTREG had problems
>>> cleaning up after the test...
>>
>> Ok. I can see why this is product then.   It would be good to bound
>> these OOM tests somehow.   This is an interesting solution.
>
> Ron thought you would like it. Was it you or Karen that asked for
> the temporary malloc ceiling from Ron's last fix to become a
> permanent addition? I'll have to check my archive...
>
>
>>>
>>>> and the code 589-600 was a function check_malloc_bytes() and 604-612
>>>> was add_malloc_bytes().
>>>
>>> Lastly, you want to change the in-line diagnostic code into a function
>>> call and you're worried about slowing things down?
>>
>> Clean code is worth a call instruction.
>
> You can't argue both sides of the street. :-)
>
> You were complaining about the new atomic ops above that were
> guarded by a flag check so later asking for a call instruction
> is way more expensive.
>
>
>>>
>>>
>>>> It's hard to see the ::malloc() there now!
>>> It was hard to the ::malloc() in there before and yes, this
>>> fix makes it worse. However, how often is anyone in this code
>>> besides Zhengyu and now Ron?
>>
>> So the excuse for messy code is that not many people see it?
>
> Messy code? Really, do you really want to go down that rat hole
> in this particular file? I'm not at all sure why you're going
> after this change so hard, but it is definitely interesting to
> observe.
>
> I'll let Ron jump in on this thread tomorrow. This is his code
> so he gets to make the call. He might tell me I'm all wet and
> agree with everything you say. Who knows...
>
> Dan
>
>
>>
>> Thanks,
>> Coleen
>>>
>>> Dan
>>>
>>>
>>>>
>>>> Coleen
>>>>
>>>> On 3/12/2013 8:22 PM, Ron Durbin wrote:
>>>>> David
>>>>>
>>>>> Dan has provided a link. So you can access the webrev and review
>>>>> the changes.
>>>>>
>>>>> Ron
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel D. Daugherty
>>>>> Sent: Tuesday, March 12, 2013 8:43 AM
>>>>> To: hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: Code Review request for new MallocMaxBytes option and
>>>>> fixed test (7030610, 7037122, 7123945)
>>>>>
>>>>> David,
>>>>>
>>>>> Thanks for spotting the URL hiccup. I missed it during my review.
>>>>>
>>>>> There is something wrong with Ron's access to cr.openjdk.java.net
>>>>> so I'm going to host his webrev for this review:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/for_rdurbin/7030610-webrev/0-hsx25/
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 3/11/13 8:54 PM, David Holmes wrote:
>>>>>> Ron,
>>>>>>
>>>>>> On 12/03/2013 10:01 AM, Ron Durbin wrote:
>>>>>>> All
>>>>>>>
>>>>>>> I have a code review readyfor:
>>>>>>>
>>>>>>> WEBREV http://javaweb.us.oracle.com/~rdurbin/7030610-webrev/0-hsx25/
>>>>>> Please make available an open webrev on cr.openjdk.java.net
>>>>>>
>>>>>> Thank you.
>>>>>> David
>>>>>>
>>>>>>> 7030610 runtime/6878713/Test6878713.sh fails, Error. failed to clean
>>>>>>> up files after test.
>>>>>>>
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7030610
>>>>>>>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7030610
>>>>>>>
>>>>>>> 7037122 runtime/6878713/Test6878713.sh is written incorrectly
>>>>>>>
>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7037122
>>>>>>>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7037122
>>>>>>>
>>>>>>> 7123945 runtime/6878713/Test6878713.sh require about 2G of
>>>>>>>
>>>>>>> _http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7123945_
>>>>>>>
>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-7123945
>>>>>>>
>>>>>>> Summary: Add new diagnostic option -XX:MallocMaxBytes=NNN and fix
>>>>>>>
>>>>>>> Test6878713.sh. Test6878713.sh is a security regression test with
>>>>>>> four different failure modes.Three aredocumented in the defects
>>>>>>> above
>>>>>>> and the forth
>>>>>>>
>>>>>>> was found duringbug triage. The root cause of these failures can be
>>>>>>>
>>>>>>> traced back to two issues:Inconsistent test conditionsetup and error
>>>>>>> handling.
>>>>>>>
>>>>>>> All four defect manifestations are fixed by this set of changes.
>>>>>>>
>>>>>>> New code built and test executed via JPRT test job; Linux ARM and
>>>>>>> Linux PPC version built, but not tested because ARM and PPC are
>>>>>>> currently build-only platforms in JPRT.
>>>>>>>
>>>>>>> As always, comments and questions are welcome.
>>>>>>>
>>>>>>> Ron
>>>>>>>
>>>>>>
>>>>
>>>
>>
>