RFR(xxxs) (jdk10): 8168542: os::realloc should return a valid pointer for input size=0

David Holmes david.holmes at oracle.com
Wed Mar 1 04:11:25 UTC 2017


Hi Thomas,

Thanks for all the work you have done on this.

I'm disappointed that the assertions on size==0 yielded so many problem 
areas. I think in many cases these do indicate actual bugs in the logic 
- eg I'm not sure what a GrowableArray of size 0 is supposed to 
represent! But not sure it is worth spending the cycles trying to 
address this.

BTW the handling of G1ConcRefinementThreads=0 is definitely a bug in the 
argument processing logic. 0 means to set ergonomically, so it should 
either be handled thusly if explicitly set or else explicitly disallowed 
- the code currently passes through the 0 as the number of threads!

I do not have an issue with free(NULL) as that has always been 
well-defined and avoids checking the pointer twice when not-NULL. I 
prefer to see:

free(p);

rather than:

if (p) free(p);

So ... I'll concede to go with the first patch.

Thanks,
David
-----

On 27/02/2017 9:20 PM, Thomas Stüfe wrote:
> Hi all,
>
> thanks for all your input, and sorry for the delay!
>
> So, realloc. To summarize our discussion:
>
> There seems to be a slight preference toward treating allocations with
> size==0 as an error which should yield an assert. Both Chris and me see
> the size=1 approach as a pragmatic way to deal with this situation. But
> I can see both sides, so as a test I implemented another approach:
>
> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/jdk10-webrev.02-alternate-version/webrev/
> <http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/jdk10-webrev.02-alternate-version/webrev/>
>
> Now this is a very trivial change. I assert on size=0 for both malloc
> and realloc. I decided to be more tolerant on the release build and
> return a 1byte sized block (yes, it is actually more) to the caller.
> Returning NULL is a bad idea, because no-one checks the errno, everyone
> just assumes OOM.
>
> I built slowdebug and ran it through hotspot jtreg tests on my Linux
> box. It yielded ~80 asserts (for malloc(size=0), no assert for
> realloc(size=0))
>
> They boil down to ~10 issues, all of which a variation of the same
> theme. Coding wants to do something with a collection of n items -
> allocates an array of size n for temporary data, iterates over the
> items, then frees the array again. If n=0, code degenerates to one
> malloc, followed by free. Note that the fact that n is 0 is often a
> result of the test itself testing corner cases.
>
> I honestly do not consider these occurrences errors. Yes, the
> malloc()/free() could have been skipped, but on the other hand the
> programmers may have relied knowingly on the established behavior of
> os::malloc() returning a freeable pointer for size=0. I attempted to
> "fix" some of the occurrences by skipping the malloc() if n was zero,
> but in all cases the coding ended up less readable than before. (Sorry,
> I cannot post this fix attempt, I accidentally deleted my patch).
>
> Here are some examples of asserts:
>
> --
>
> 1) 33 V  [libjvm.so+0xb17625]
>  G1VerifyYoungCSetIndicesClosure::G1VerifyYoungCSetIndicesClosure(unsigned
> long)+0xbb
>  34 V  [libjvm.so+0xb16588]
>  G1CollectionSet::verify_young_cset_indices() const+0x1a8
>
> Happens in a lot of tests. G1CollectionSet::verify_young_cset_indices()
> is called when G1CollectionSet::_collection_set_cur_length is zero. I
> assume this is fine, verification could just be skipped at this point,
> so the fix would be trivial.
>
> --
>
> gc/arguments/TestG1ConcRefinementThreads: VM is called with
> -XX:G1ConcRefinementThreads=0. Then, we assert here:
>
>  32 V  [libjvm.so+0x5adeee]  AllocateHeap(unsigned long, MemoryType,
> NativeCallStack const&,
> AllocFa-XX:G1ConcRefinementThreads=0ilStrategy::AllocFailEnum)+0x3b
>  33 V  [libjvm.so+0x954689]
>  ConcurrentG1Refine::create(CardTableEntryClosure*, int*)+0x197
>  34 V  [libjvm.so+0xaf596a]  G1CollectedHeap::initialize()+0x1a4
>  35 V  [libjvm.so+0x127e940]  Universe::initialize_heap()+0x62
>
> Not sure at which layer one would handle this. If
> -XX:G1ConcRefinementThreads=0 makes no sense as an option, should this
> setting not just be forbidden?
>
> --
> At a number of tests linux numa initialization failed:
>
>  32 V  [libjvm.so+0x5adeee]  AllocateHeap(unsigned long, MemoryType,
> NativeCallStack const&, AllocFailStrategy::AllocFailEnum)+0x3b
>  33 V  [libjvm.so+0xbf7aff]
>  GenericGrowableArray::raw_allocate(int)+0x12f
>
>  34 V  [libjvm.so+0x6b6b89]  GrowableArray<int>::GrowableArray(int,
> bool, MemoryType)+0x65
>  35 V  [libjvm.so+0x105b294]  os::Linux::libnuma_init()+0x17e
>
> Here, a GrowableArray is initialized with size = 0. This seems to be an
> allowed pattern, so GrowableArray would have to be fixed to delay the
> malloc until the first real allocation.
>
> --
>
> So, options:
>
> - we go this way - all the occurrences would have to be fixed, and
> programmers would have to be made aware of the changed behavior of
> os::malloc(size=0).
> - we could only assert for realloc(size=0), but I find this even more
> inconsistent than it is now.
> - we could rethink the original decision.
>
> More things to ponder:
>
> - what about free(NULL), do we want to assert here too? On one hand,
> this may indicate an error, possibly a double free(). On the other hand,
> passing NULL to free() is even more a known established pattern than
> passing size=0 to malloc, and programmers may just do this expecting
> os::free() to have the same semantics as ::free().
>
> - this patch relies on the fact that we have control about who calls
> os::malloc/os::free (similar to the decision to use malloc headers in
> both NMT and os.cpp). Should we ever open up our malloc/free calls to
> the outside world, we may loose this control and would have to make sure
> that os::malloc/os::free behave like ::malloc/::free. Why would we open
> os::malloc to the outside? For instance, to get NMT malloc coverage over
> the JDK native libraries. Granted, a lot more would have to change to
> enable this. (Note that we have a malloc statistic in place in our fork
> which spans hotspot + native jdk, and we do this by calling os::malloc
> from the jdk native code).
>
>
> Kind Regards, Thomas
>
>
>
>
> On Wed, Feb 8, 2017 at 10:01 AM, Robbin Ehn <robbin.ehn at oracle.com
> <mailto:robbin.ehn at oracle.com>> wrote:
>
>     Hi,
>
>     On 02/07/2017 02:24 AM, Chris Plummer wrote:
>
>         I don't think we can push a change to assert when size == 0
>         without first thoroughly testing it and fixing any places that
>         currently do that. Testing may turn up nothing,
>         or it may turn up a rat's nest. Just something to think about
>         before starting down this path.
>
>
>     Since there is a while since jdk10 will be GA, now is the best time.
>
>
>         I'm not as offended by the size 1 approach as others, and it is
>         what we already do for malloc(). It probably  makes sense to
>         have malloc and realloc be consistent in this
>         regard, regardless of what the native version do by default
>         (which can vary). Just given the fact that we are even having
>         this discussion and debating which is best tends
>         to make me think that caller's of malloc() and realloc() either
>         didn't even give this topic any thought, or possibly came to
>         differing conclusions on whether or not it
>         mattered if they passed in a size == 0.
>
>
>     I agree that realloc and malloc should do the same.
>
>     IEEE Std 1003.1, 2004 Edition says:
>     "If size is 0 and ptr is not a null pointer, the object pointed to
>     is freed."
>
>     So as I said in previously mail, there is nothing wrong with doing
>     realloc(ptr, 0), but I also don't think we should do it hotspot.
>     Doing malloc(0) is undefined, so this really should be at least an
>     assert.
>
>     I'm all favor assert for both realloc and malloc.
>
>     I can do pre-testing, KS, jtreg, tonga or whatever we think is
>     appropriate.
>     (if this is a rat's nest we can change our minds...)
>
>     /Robbin
>
>
>
>
>         Chris
>
>         On 2/6/17 3:49 AM, Dmitry Samersoff wrote:
>
>             Thomas,
>
>             I share all David's concerns.
>
>             Both Linux and BSD return NULL if size for realloc is zero.
>
>             IMHO, if we call realloc with size == 0 it means a program
>             error on
>             higher level and we should fix it rather than hide.
>
>             So it might be better to assert size > 0 inside realloc and
>             return NULL.
>
>             -Dmitry
>
>
>             On 2017-02-05 13:56, Thomas Stüfe wrote:
>
>                 Hi David,
>
>                 some context: the whole issue came into being because
>                 for a long time we
>                 had exchanged os::malloc/os::realloc with our own
>                 versions for malloc
>                 tracking purposes. This was way before NMT.
>
>                 When writing the os::malloc/realloc replacements and
>                 associated tests, we
>                 took care to mimick the behavior of native
>                 malloc/realloc in a consistent
>                 matter. That way we could exchange malloc calls with my
>                 os::malloc on a
>                 case by case base without behavior change (or put it
>                 differently, nothing
>                 would break if I accidentally miss an instrumentation in
>                 code and leave it
>                 a native malloc/realloc)
>
>                 Now we went back to the OpenJDK version of
>                 os::malloc()/os::realloc() to
>                 reduce maintenance, and my cornercase tests fail. Hence
>                 this bug report.
>
>                 Please find further answers inline.
>
>                 On Sun, Feb 5, 2017 at 3:50 AM, David Holmes
>                 <david.holmes at oracle.com <mailto:david.holmes at oracle.com>>
>                 wrote:
>
>                     Hi Thomas,
>
>                     On 4/02/2017 10:09 PM, Thomas Stüfe wrote:
>
>                         Hi guys,
>
>                         picking up this trivial issue which got pushed
>                         to jdk10, could I please
>                         have a re-review? Oh, also a sponsor.
>
>                         Issue:
>                         https://bugs.openjdk.java.net/browse/JDK-8168542
>                         <https://bugs.openjdk.java.net/browse/JDK-8168542>
>
>                         webrev:
>                         http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_reallo
>                         <http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_reallo>
>                         c_size_0/jdk10-webrev.01/webrev/
>
>                         For reference, the old discussion back in
>                         october 16:
>                         http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2
>                         <http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2>
>                         016-October/021675.html
>
>                     I am still rather confused by both existing code and
>                     proposed change - the
>                     code just seems wrong if passed in a zero size.
>                     Passing in zero is a bug
>                     AFAICS and if buggy code passes in zero then how do
>                     we know what it will do
>                     with the returned value from os::realloc ?
>
>
>
>                 Currently, in the OpenJDK version, we have the following
>                 behavior:
>
>                 1 ASSERT
>                   a os::malloc
>                 size==0: return valid pointer to 1 byte array
>                   b os::realloc
>                 ptr != NULL && size==0: return NULL
>
>                 2 !ASSERT
>                   a      os::malloc
>                 size==0: return valid pointer to 1 byte array
>                   b      os::realloc
>                 ptr != NULL && size==0: whatever the native realloc
>                 does, which may be
>                 returning NULL or returning a valid allocation.
>
>                 This is inconsistent.
>
>                 A) 1a and 1b, although arguably similar cases, behave
>                 differently.
>                 B) 1b and 2b may behave differently, depending on the
>                 behaviour of the
>                 native CRT calls. Which is IMHO worse than case (A). You
>                 do not want
>                 different behaviour between ASSERT and !ASSERT.
>
>                 Additionally, for (B) it is not good that we delegate
>                 cornercase behavior
>                 to the native CRT, because that may lead to
>                 os::realloc() to have different
>                 behavior between platforms. On one platform,
>                 ::realloc(ptr,0) may return
>                 NULL, on another it may return a pointer. So you get
>                 platform dependent
>                 errors.
>
>                 My patch proposes to change the behavior like this:
>
>                 1 ASSERT
>                   a os::malloc
>                 size==0: return valid pointer to 1 byte array
>                   b os::realloc
>                 ptr != NULL && size==0: return valid pointer to 1 byte array
>
>                 2 !ASSERT
>                   a      os::malloc
>                 size==0: return valid pointer to 1 byte array
>                   b      os::realloc
>                 ptr != NULL && size==0: return valid pointer to 1 byte array
>
>                 This is more consistent.
>
>                 -----------
>
>                 Aside from the consistency issue:
>
>                 I think if someone passes 0 as resize size, this may
>                 mean he calculated the
>                 resize size wrong. Note that this does not necessarily
>                 mean a fatal error,
>                 if he has the correct notion of buffer size - 0 - and
>                 does not overwrite
>                 anything, nothing bad will happen.
>
>                 In this situation you have three choices
>
>                 1) return NULL. Caller will either incorrectly access
>                 the pointer and
>                 crash, or test the pointer and all cases I saw then
>                 assume OOM. Also bad.
>
>                 2) assert. This just seems plain wrong. Inconsistent
>                 with CRT behavior, and
>                 what do you do in release code?
>
>                 3) return a 1 byte allocation. That is what I propose.
>                 This may in the
>                 worst case lead to memory leaks; but only if caller
>                 expected the behaviour
>                 of os::realloc(ptr, 0) to be "free ptr and return NULL",
>                 which I have seen
>                 nowhere. If caller is oblivious to giving us size 0, he
>                 will be freeing the
>                 memory later.
>
>                 (Note that you could easily prevent memory leaks by just
>                 returning a
>                 special global static sentinel pointer for size==0, but
>                 you have to be sure
>                 that all os::realloc calls are matched by os::free calls.)
>
>                 I see your point about treating size==0 as an error. But
>                 I am not a big
>                 fan, for the mentioned reasons, and would prefer
>                 os::realloc() to behave
>                 like native realloc.
>
>                 But if we follow your suggestion and make size==0 an
>                 assertable offense, we
>                 should do so too for malloc(0).
>
>                 Kind Regards, Thomas
>
>
>
>                 David
>
>
>                     The webrev is unchanged from the proposal for jdk9,
>                     just rebased to
>
>                         jdk10/hs.
>
>                         @Chris, I decided to not follow your suggestion
>                         to move the comparison
>                         into the #ifndef assert. You are right, this is
>                         redundant with the check
>                         inside os::malloc(), but as you said, checking
>                         at the entry of
>                         os::realloc() makes it clearer.
>
>                         Thank you!
>
>                         Kind Regards, Thomas
>
>
>
>
>
>


More information about the hotspot-runtime-dev mailing list