(S) 8176518: [9] C2: Invalid ImplicitNullChecks with non-protected heap base

Zoltán Majó zoltan.majo at oracle.com
Fri Mar 17 08:50:14 UTC 2017


Hi Martin,


On 03/16/2017 04:31 PM, Doerr, Martin wrote:
> Hi,
>
> thanks, Zoltan, for all your support.
>
> I have spent some more time for experiments on Windows.

thank you for the investigation!

> UseLargePages only works for users with special privilege "SeLockMemoryPrivilege" which is kind of tricky to get.
> The good news is that this bug is not problematic on Windows.
> The function gen_narrow_oop_implicit_null_checks returns false if narrow_oop_use_implicit_null_checks is false (except for AIX), so the null check gets performed after the decode. C2 uses regular decode (0 gets decoded to 0, unlike decode_not_null) in this case, so the implicit null check hits the 0 page which is not accessible and the exception gets thrown.
> I have tried changing gen_narrow_oop_implicit_null_checks and that allows reproducing the bug on Windows.
> With the current implementation of gen_narrow_oop_implicit_null_checks, only AIX is exposed to the bug.

Thank you for explaining. I updated the bug report to match your 
description.

Pre-integration testing is still in progress. I'll push the fix once I 
have the results.

Best regards,


Zoltan

>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: Zoltán Majó [mailto:zoltan.majo at oracle.com]
> Sent: Mittwoch, 15. März 2017 15:40
> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
> Subject: Re: (S) 8176518: [9] C2: Invalid ImplicitNullChecks with non-protected heap base
>
> Hi Martin,
>
>
> On 03/15/2017 03:27 PM, Doerr, Martin wrote:
>> Hi Zoltan,
>>
>> thank you very much for reviewing and for sponsoring.
> thank you for spending time on this issue.
>
>> I have updated the comments and added the test.
>> We think it should be fine to run the test on all 64 bit platforms as it is fast and a little more testing in this area is not bad.
>> If you don't agree, I can change it to run only on fewer platforms.
> OK, let's run it on all platforms then.
>
>> Please take a look.
>>
>> New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8176518_C2_NullCheck/webrev.01/
> The updated webrev looks good to me. I'll start pre-integration testing
> now and (if nothing unexpected happens) push the change afterwards. I'll
> keep you updated.
>
> Thank you!
>
> Best regards,
>
>
> Zoltan
>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Zoltán Majó [mailto:zoltan.majo at oracle.com]
>> Sent: Mittwoch, 15. März 2017 11:07
>> To: Doerr, Martin <martin.doerr at sap.com>; 'hotspot-compiler-dev at openjdk.java.net' <hotspot-compiler-dev at openjdk.java.net>
>> Subject: Re: (S) 8176518: [9] C2: Invalid ImplicitNullChecks with non-protected heap base
>>
>> Hi Martin,
>>
>>
>> thank you for looking into this issue! Your fix looks good to me in
>> general. Here are a few minor comments that you could maybe consider.
>>
>> - Could you please update the comment on line 261 to describe that we
>> skip generating the implicit null check if the heap base is not protected?
>>
>>     260 !Universe::narrow_oop_use_implicit_null_checks()))
>>     261           continue;             // Give up if offset is beyond
>> page size
>>
>> - Could you please update the comment on line 279 to be more consistent
>> with other comments in the same code region. Something along the lines
>> "Give up if operation is a DecodeN and the heap base is not protected"
>> could do. I don't see anything wrong with the check of the graph's shape.
>>
>> - Could you please change comment
>>
>>     278           continue;             // Give up i*s* reference is
>> beyond 4K page size
>>
>> to
>>
>>     278           continue;             // Give up i*f* reference is
>> beyond 4K page size
>>
>> - Could you please add the test case you have to the fix you're
>> proposing? I'm not sure if you need to record the duration of the test's
>> execution. Maybe you can reduce the number of iterations -- 20K could be
>> sufficient. Could you please make the test throw an exception if the
>> NullPointerException is not thrown? And finally, can you force the test
>> to be executed only on AIX/Windows and with the parameters you mentioned
>> in the RFR (i.e., -XX:HeapBaseMinAddress=32g on AIX and
>> -XX:ObjectAlignmentInBytes=16 -XX:+UseLargePages on Windows)
>>
>> I can do the pre-integration testing for the change and then sponsor it.
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>> On 03/14/2017 02:35 PM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> I have experimented on x86 and I get a very similar graph when using
>>> -XX:ObjectAlignmentInBytes=16 (prevents oop decoding matched into
>>> operands) in addition to the flags below under [1].
>>>
>>> If -XX:+UseLargePages (which is used to skip the heap base protection)
>>> is supported on Windows, the problem exists there as well.
>>>
>>> As the graph pattern on PPC64 is the same as on x86, I believe that
>>> there's nothing wrong except the missing check for
>>> narrow_oop_use_implicit_null_checks(), which is added by the webrev below.
>>>
>>> The remaining question is if there's a better test than
>>> "base->as_Mach()->ideal_Opcode() == Op_DecodeN".
>>>
>>> I will also need a sponsor for this change, please.
>>>
>>> Thanks and best regards,
>>>
>>> Martin
>>>
>>> *From:*hotspot-compiler-dev
>>> [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] *On Behalf Of
>>> *Doerr, Martin
>>> *Sent:* Freitag, 10. März 2017 18:49
>>> *To:* 'hotspot-compiler-dev at openjdk.java.net'
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> *Subject:* RFR: (S) 8176518: [9] C2: Invalid ImplicitNullChecks with
>>> non-protected heap base
>>>
>>> Hi,
>>>
>>> we have observed that C2 generates implicit null checks for write
>>> accesses to the heap base even though it was not protected
>>> (narrow_oop_use_implicit_null_checks = false).
>>>
>>> The problem showed up on AIX with -XX:HeapBaseMinAddress=32g.
>>> ReservedHeapSpace::establish_noaccess_prefix protects the base area on
>>> almost all platforms, but not on AIX (in some case).
>>>
>>> PhaseCFG::implicit_null_check needs to skip the transformation in this
>>> case.
>>>
>>> The problem can be reproduced by the simple test program below under
>>> [1]. The null pointer exception is just missing when running with the
>>> given parameters.
>>>
>>> The problem can be prevented by applying this patch:
>>>
>>> http://cr.openjdk.java.net/~mdoerr/8176518_C2_NullCheck/webrev.00/
>>> <http://cr.openjdk.java.net/%7Emdoerr/8176518_C2_NullCheck/webrev.00/>
>>>
>>> Especially, the case "base->is_Mach() &&
>>> base->as_Mach()->ideal_Opcode() == Op_DecodeN" gets hit. I'm not sure
>>> if this is allowed at that place. Maybe something went wrong before.
>>>
>>> The node "base" is a storeI and "addr"="base" is a decodeN node (the
>>> "index" retrieved by mach->memory_inputs is NULL).
>>>
>>> The patch helps, but is there a better way to fix the problem?
>>>
>>> Thanks and best regards,
>>>
>>> Martin
>>>
>>> [1] Reproduction case:
>>>
>>> /jdk/bin/java -XX:HeapBaseMinAddress=32g -XX:-TieredCompilation
>>> -Xbatch TestWriteNPE
>>>
>>> *public**class*TestWriteNPE{
>>>
>>> TestWriteNPE instance = *null*;
>>>
>>> *int*i = 0;
>>>
>>> *public**void*bogus_test(*int*value) {
>>>
>>> instance.i = value;
>>>
>>>       }
>>>
>>> *static**final**int*loop_cnt=100000;
>>>
>>> *public**static**void*main(String args[]){
>>>
>>> TestWriteNPE xyz=*new*TestWriteNPE();
>>>
>>> xyz.instance = xyz;
>>>
>>> *long*duration = System.nanoTime();
>>>
>>> *for*(*int*x=0; x<loop_cnt; x++) xyz.bogus_test(x);
>>>
>>> duration = System.nanoTime() - duration;
>>>
>>> System.out.println("value: "+ xyz.i + " (duration: "+
>>> duration/loop_cnt + ")");
>>>
>>> xyz.instance = *null*;
>>>
>>> *try*{
>>>
>>> xyz.bogus_test(0);
>>>
>>> } *catch*(NullPointerException np) {
>>>
>>> System.out.println("Got NPE:");
>>>
>>> np.printStackTrace();
>>>
>>>           }
>>>
>>>       }
>>>
>>> }
>>>



More information about the hotspot-compiler-dev mailing list