RFR(xxxs) (jdk10): 8168542: os::realloc should return a valid pointer for input size=0

Thomas Stüfe thomas.stuefe at gmail.com
Mon Feb 27 11:20:26 UTC 2017


Hi all,

thanks for all your input, and sorry for the delay!

So, realloc. To summarize our discussion:

There seems to be a slight preference toward treating allocations with
size==0 as an error which should yield an assert. Both Chris and me see the
size=1 approach as a pragmatic way to deal with this situation. But I can
see both sides, so as a test I implemented another approach:

http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_
realloc_size_0/jdk10-webrev.02-alternate-version/webrev/

Now this is a very trivial change. I assert on size=0 for both malloc and
realloc. I decided to be more tolerant on the release build and return a
1byte sized block (yes, it is actually more) to the caller. Returning NULL
is a bad idea, because no-one checks the errno, everyone just assumes OOM.

I built slowdebug and ran it through hotspot jtreg tests on my Linux box.
It yielded ~80 asserts (for malloc(size=0), no assert for realloc(size=0))

They boil down to ~10 issues, all of which a variation of the same theme.
Coding wants to do something with a collection of n items - allocates an
array of size n for temporary data, iterates over the items, then frees the
array again. If n=0, code degenerates to one malloc, followed by free. Note
that the fact that n is 0 is often a result of the test itself testing
corner cases.

I honestly do not consider these occurrences errors. Yes, the
malloc()/free() could have been skipped, but on the other hand the
programmers may have relied knowingly on the established behavior of
os::malloc() returning a freeable pointer for size=0. I attempted to "fix"
some of the occurrences by skipping the malloc() if n was zero, but in all
cases the coding ended up less readable than before. (Sorry, I cannot post
this fix attempt, I accidentally deleted my patch).

Here are some examples of asserts:

--

1) 33 V  [libjvm.so+0xb17625]  G1VerifyYoungCSetIndicesClosure::
G1VerifyYoungCSetIndicesClosure(unsigned long)+0xbb
 34 V  [libjvm.so+0xb16588]  G1CollectionSet::verify_young_cset_indices()
const+0x1a8

Happens in a lot of tests. G1CollectionSet::verify_young_cset_indices() is
called when G1CollectionSet::_collection_set_cur_length is zero. I assume
this is fine, verification could just be skipped at this point, so the fix
would be trivial.

--

gc/arguments/TestG1ConcRefinementThreads: VM is called with
-XX:G1ConcRefinementThreads=0. Then, we assert here:

 32 V  [libjvm.so+0x5adeee]  AllocateHeap(unsigned long, MemoryType,
NativeCallStack const&, AllocFa-XX:G1ConcRefinementThreads=
0ilStrategy::AllocFailEnum)+0x3b
 33 V  [libjvm.so+0x954689]  ConcurrentG1Refine::create(CardTableEntryClosure*,
int*)+0x197
 34 V  [libjvm.so+0xaf596a]  G1CollectedHeap::initialize()+0x1a4
 35 V  [libjvm.so+0x127e940]  Universe::initialize_heap()+0x62

Not sure at which layer one would handle this. If
-XX:G1ConcRefinementThreads=0 makes no sense as an option, should this
setting not just be forbidden?

--
At a number of tests linux numa initialization failed:

 32 V  [libjvm.so+0x5adeee]  AllocateHeap(unsigned long, MemoryType,
NativeCallStack const&, AllocFailStrategy::AllocFailEnum)+0x3b
 33 V  [libjvm.so+0xbf7aff]  GenericGrowableArray::raw_allocate(int)+0x12f


 34 V  [libjvm.so+0x6b6b89]  GrowableArray<int>::GrowableArray(int, bool,
MemoryType)+0x65
 35 V  [libjvm.so+0x105b294]  os::Linux::libnuma_init()+0x17e

Here, a GrowableArray is initialized with size = 0. This seems to be an
allowed pattern, so GrowableArray would have to be fixed to delay the
malloc until the first real allocation.

--

So, options:

- we go this way - all the occurrences would have to be fixed, and
programmers would have to be made aware of the changed behavior of
os::malloc(size=0).
- we could only assert for realloc(size=0), but I find this even more
inconsistent than it is now.
- we could rethink the original decision.

More things to ponder:

- what about free(NULL), do we want to assert here too? On one hand, this
may indicate an error, possibly a double free(). On the other hand, passing
NULL to free() is even more a known established pattern than passing size=0
to malloc, and programmers may just do this expecting os::free() to have
the same semantics as ::free().

- this patch relies on the fact that we have control about who calls
os::malloc/os::free (similar to the decision to use malloc headers in both
NMT and os.cpp). Should we ever open up our malloc/free calls to the
outside world, we may loose this control and would have to make sure that
os::malloc/os::free behave like ::malloc/::free. Why would we open
os::malloc to the outside? For instance, to get NMT malloc coverage over
the JDK native libraries. Granted, a lot more would have to change to
enable this. (Note that we have a malloc statistic in place in our fork
which spans hotspot + native jdk, and we do this by calling os::malloc from
the jdk native code).


Kind Regards, Thomas




On Wed, Feb 8, 2017 at 10:01 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:

> Hi,
>
> On 02/07/2017 02:24 AM, Chris Plummer wrote:
>
>> I don't think we can push a change to assert when size == 0 without first
>> thoroughly testing it and fixing any places that currently do that. Testing
>> may turn up nothing,
>> or it may turn up a rat's nest. Just something to think about before
>> starting down this path.
>>
>
> Since there is a while since jdk10 will be GA, now is the best time.
>
>
>> I'm not as offended by the size 1 approach as others, and it is what we
>> already do for malloc(). It probably  makes sense to have malloc and
>> realloc be consistent in this
>> regard, regardless of what the native version do by default (which can
>> vary). Just given the fact that we are even having this discussion and
>> debating which is best tends
>> to make me think that caller's of malloc() and realloc() either didn't
>> even give this topic any thought, or possibly came to differing conclusions
>> on whether or not it
>> mattered if they passed in a size == 0.
>>
>
> I agree that realloc and malloc should do the same.
>
> IEEE Std 1003.1, 2004 Edition says:
> "If size is 0 and ptr is not a null pointer, the object pointed to is
> freed."
>
> So as I said in previously mail, there is nothing wrong with doing
> realloc(ptr, 0), but I also don't think we should do it hotspot.
> Doing malloc(0) is undefined, so this really should be at least an assert.
>
> I'm all favor assert for both realloc and malloc.
>
> I can do pre-testing, KS, jtreg, tonga or whatever we think is appropriate.
> (if this is a rat's nest we can change our minds...)
>
> /Robbin
>
>
>
>
>> Chris
>>
>> On 2/6/17 3:49 AM, Dmitry Samersoff wrote:
>>
>>> Thomas,
>>>
>>> I share all David's concerns.
>>>
>>> Both Linux and BSD return NULL if size for realloc is zero.
>>>
>>> IMHO, if we call realloc with size == 0 it means a program error on
>>> higher level and we should fix it rather than hide.
>>>
>>> So it might be better to assert size > 0 inside realloc and return NULL.
>>>
>>> -Dmitry
>>>
>>>
>>> On 2017-02-05 13:56, Thomas Stüfe wrote:
>>>
>>>> Hi David,
>>>>
>>>> some context: the whole issue came into being because for a long time we
>>>> had exchanged os::malloc/os::realloc with our own versions for malloc
>>>> tracking purposes. This was way before NMT.
>>>>
>>>> When writing the os::malloc/realloc replacements and associated tests,
>>>> we
>>>> took care to mimick the behavior of native malloc/realloc in a
>>>> consistent
>>>> matter. That way we could exchange malloc calls with my os::malloc on a
>>>> case by case base without behavior change (or put it differently,
>>>> nothing
>>>> would break if I accidentally miss an instrumentation in code and leave
>>>> it
>>>> a native malloc/realloc)
>>>>
>>>> Now we went back to the OpenJDK version of os::malloc()/os::realloc() to
>>>> reduce maintenance, and my cornercase tests fail. Hence this bug report.
>>>>
>>>> Please find further answers inline.
>>>>
>>>> On Sun, Feb 5, 2017 at 3:50 AM, David Holmes <david.holmes at oracle.com>
>>>> wrote:
>>>>
>>>> Hi Thomas,
>>>>>
>>>>> On 4/02/2017 10:09 PM, Thomas Stüfe wrote:
>>>>>
>>>>> Hi guys,
>>>>>>
>>>>>> picking up this trivial issue which got pushed to jdk10, could I
>>>>>> please
>>>>>> have a re-review? Oh, also a sponsor.
>>>>>>
>>>>>> Issue:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8168542
>>>>>>
>>>>>> webrev:
>>>>>> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_reallo
>>>>>> c_size_0/jdk10-webrev.01/webrev/
>>>>>>
>>>>>> For reference, the old discussion back in october 16:
>>>>>> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2
>>>>>> 016-October/021675.html
>>>>>>
>>>>>> I am still rather confused by both existing code and proposed change
>>>>> - the
>>>>> code just seems wrong if passed in a zero size. Passing in zero is a
>>>>> bug
>>>>> AFAICS and if buggy code passes in zero then how do we know what it
>>>>> will do
>>>>> with the returned value from os::realloc ?
>>>>>
>>>>
>>>>
>>>> Currently, in the OpenJDK version, we have the following behavior:
>>>>
>>>> 1 ASSERT
>>>>   a os::malloc
>>>> size==0: return valid pointer to 1 byte array
>>>>   b os::realloc
>>>> ptr != NULL && size==0: return NULL
>>>>
>>>> 2 !ASSERT
>>>>   a      os::malloc
>>>> size==0: return valid pointer to 1 byte array
>>>>   b      os::realloc
>>>> ptr != NULL && size==0: whatever the native realloc does, which may be
>>>> returning NULL or returning a valid allocation.
>>>>
>>>> This is inconsistent.
>>>>
>>>> A) 1a and 1b, although arguably similar cases, behave differently.
>>>> B) 1b and 2b may behave differently, depending on the behaviour of the
>>>> native CRT calls. Which is IMHO worse than case (A). You do not want
>>>> different behaviour between ASSERT and !ASSERT.
>>>>
>>>> Additionally, for (B) it is not good that we delegate cornercase
>>>> behavior
>>>> to the native CRT, because that may lead to os::realloc() to have
>>>> different
>>>> behavior between platforms. On one platform, ::realloc(ptr,0) may return
>>>> NULL, on another it may return a pointer. So you get platform dependent
>>>> errors.
>>>>
>>>> My patch proposes to change the behavior like this:
>>>>
>>>> 1 ASSERT
>>>>   a os::malloc
>>>> size==0: return valid pointer to 1 byte array
>>>>   b os::realloc
>>>> ptr != NULL && size==0: return valid pointer to 1 byte array
>>>>
>>>> 2 !ASSERT
>>>>   a      os::malloc
>>>> size==0: return valid pointer to 1 byte array
>>>>   b      os::realloc
>>>> ptr != NULL && size==0: return valid pointer to 1 byte array
>>>>
>>>> This is more consistent.
>>>>
>>>> -----------
>>>>
>>>> Aside from the consistency issue:
>>>>
>>>> I think if someone passes 0 as resize size, this may mean he calculated
>>>> the
>>>> resize size wrong. Note that this does not necessarily mean a fatal
>>>> error,
>>>> if he has the correct notion of buffer size - 0 - and does not overwrite
>>>> anything, nothing bad will happen.
>>>>
>>>> In this situation you have three choices
>>>>
>>>> 1) return NULL. Caller will either incorrectly access the pointer and
>>>> crash, or test the pointer and all cases I saw then assume OOM. Also
>>>> bad.
>>>>
>>>> 2) assert. This just seems plain wrong. Inconsistent with CRT behavior,
>>>> and
>>>> what do you do in release code?
>>>>
>>>> 3) return a 1 byte allocation. That is what I propose. This may in the
>>>> worst case lead to memory leaks; but only if caller expected the
>>>> behaviour
>>>> of os::realloc(ptr, 0) to be "free ptr and return NULL", which I have
>>>> seen
>>>> nowhere. If caller is oblivious to giving us size 0, he will be freeing
>>>> the
>>>> memory later.
>>>>
>>>> (Note that you could easily prevent memory leaks by just returning a
>>>> special global static sentinel pointer for size==0, but you have to be
>>>> sure
>>>> that all os::realloc calls are matched by os::free calls.)
>>>>
>>>> I see your point about treating size==0 as an error. But I am not a big
>>>> fan, for the mentioned reasons, and would prefer os::realloc() to behave
>>>> like native realloc.
>>>>
>>>> But if we follow your suggestion and make size==0 an assertable
>>>> offense, we
>>>> should do so too for malloc(0).
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>>
>>>>
>>>> David
>>>>
>>>>>
>>>>> The webrev is unchanged from the proposal for jdk9, just rebased to
>>>>>
>>>>>> jdk10/hs.
>>>>>>
>>>>>> @Chris, I decided to not follow your suggestion to move the comparison
>>>>>> into the #ifndef assert. You are right, this is redundant with the
>>>>>> check
>>>>>> inside os::malloc(), but as you said, checking at the entry of
>>>>>> os::realloc() makes it clearer.
>>>>>>
>>>>>> Thank you!
>>>>>>
>>>>>> Kind Regards, Thomas
>>>>>>
>>>>>>
>>>>>>
>>>
>>
>>


More information about the hotspot-runtime-dev mailing list