RFR[JDK11]: 8204691: HeapRegion.apply_to_marked_objects_other_vm_test fails with assert(!hr->is_free() || hr->is_empty()) failed: Free region 0 is not empty for set Free list #

Mon Jul 9 19:51:15 UTC 2018

> On Jul 9, 2018, at 11:49 AM, Erik Helin <erik.helin at oracle.com> wrote:
> 
> On 07/08/2018 05:52 PM, Kim Barrett wrote:
>>> On Jul 5, 2018, at 4:03 PM, Kim Barrett <kim.barrett at oracle.com> wrote:
>>> 
>>>> On Jul 5, 2018, at 3:57 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> On Wed, 2018-07-04 at 20:13 -0400, Kim Barrett wrote:
>>>>> Please review this fix of the HeapRegion gtest.
>>>>> 
>>>>> The test modifies a region's "top" to unexpected values without
>>>>> ensuring that no allocation might use the region and no GC might run
>>>>> while the region is in that invalid state.  We solve this by
>>>>> executing the test code in its very own safepoint, and by saving and
>>>>> then restoring the region's top back to its original value before
>>>>> completing the test.  And since we are doing all that, there's no
>>>>> longer any reason to run the test in a separate VM.
>>>> 
>>>> looks good, but the actual test is still run in a separate VM.
>>>> Intentional?
>>> 
>>> Unintentional. And now I’m not sure what I last ran through mach5.
>>> I’ll re-test with TEST_OTHER_VM => TEST_VM.
>>> 
>>> I know that failed in an obscure way earlier, but I think that was because
>>> of an unrelated recently introduced bug that’s been fixed in the repo.
>> Verified that I really have tested in same VM.  New webrev:
>> http://cr.openjdk.java.net/~kbarrett/8204691/open.01/
>> The only change is TEST_OTHER_VM => TEST_VM.
> 
> Hmmm, it is (very) unfortunate if we have native code in JVM allocating Java objects and triggering garbage collections _concurrently_ with the unit tests being run (there shouldn't be any Java code running when the unit tests are executing). I understand that we have to restore the top pointer in case there is some verification for example when the JVM exits (or if we assert in a destructor etc), but do we really need to run the test in a safepoint? There is nothing wrong with running the test in a safepoint, but it seems to me that we then would have to run almost all TEST_VM tests in a safepoint?
> 
> Thanks,
> Erik

I don't think it's quite *that* bad.

As far as I can tell, TEST_VM tests have always been executed
concurrently with the executing VM.  That is, the VM is created (by
calling JNI_CreateJavaVM), and then the same thread that made that
call (which is now the main thread for the VM) executes the TEST_VM
tests.  That thread is a Java thread, initially "in native" (which is
why we need to do the ThreadInVMfromNative transition first, before
going to the safepoint).

It's only a problem for tests that mess with VM data structures in
unexpected ways.  I guess many / most / nearly all(?) don't do that,
since we haven't seen more of these kinds of failures.  But I agree
that it does mean one needs to take some additional care when writing
TEST_VM tests.

And in fact, there are some tests that rely on that behavior, e.g.
the test of OopStorage::delete_empty_blocks_concurrent().

This particular test is doing something really nasty behind the
collector's back.  It was trying to protect against that by using
TEST_OTHER_VM, but that just narrowed the window for failures.

I looked at the 3 other uses of TEST_OTHER_VM, and none of them appear
to have this kind of problem.  They are run in another VM because they
side-effect the VM in a way that we don't necessarily want to apply
the to main test runner.  But they don't seem to be bashing on VM data
structures in non-approved ways.