RFR (xs) 8075967: Zero interpreter asserts for SafeFetch<32, N> calls in ObjectMonitor

Thu Mar 26 18:29:59 UTC 2015

Hi Coleen,

No problem, I can fix this next time I touch that code. I am happy that we
all agree on a solution.

Kind regards, Thomas
On Mar 26, 2015 7:14 PM, "Coleen Phillimore" <coleen.phillimore at oracle.com>
wrote:

>
> On 3/26/15, 9:46 AM, Thomas Stüfe wrote:
>
> Hi David,
>
>  you are right, it was also broken before. I missed that Coleen just
> reintroduced an earlier version of the dummy SafeFetch() implementation,
> sorry for that.
>
>  I would be happy with a CanUseSafeFetch() which returns always false on
> zero and false on other platforms before stub generation. That would be
> always safe to call.
>
>
> Yes, I missed the case where you were using CanUseSafeFetch also to mean
> that the stub has been generated.  It should be something like this:
>
> inline bool CanUseSafeFetch32() {
>   return NOT_ZERO(StubRoutines::SafeFetch32_stub() != NULL)
> ZERO_ONLY(false);
> }
>
> I don't know if you want to fix this when you introduce calls to SafeFetch
> before initialization is complete or now.
>
> Sorry for the problems.
>
> Coleen
>
>
>  Kind Regards, Thomas
>
>
> On Thu, Mar 26, 2015 at 1:38 PM, David Holmes <david.holmes at oracle.com>
> wrote:
>
>> On 26/03/2015 10:24 PM, Thomas Stüfe wrote:
>>
>>> Hi David,
>>>
>>> On Thu, Mar 26, 2015 at 11:42 AM, David Holmes <david.holmes at oracle.com
>>>   <mailto:david.holmes at oracle.com>> wrote:
>>>
>>>     Hi Thomas,
>>>
>>>     On 26/03/2015 6:35 PM, Thomas Stüfe wrote:
>>>
>>>         Hi,
>>>
>>>         I agree with Goetz.
>>>
>>>         SafeFetch() was meant to be guarded with CanUseSafeFetch(), this
>>>         is part
>>>         of the contract, so using it without this guard is actually
>>>         incorrect
>>>         and would crash were the calling code ever called before Stub
>>>         routine
>>>         initialization.
>>>
>>>
>>>     That contract was already implicit - this API is for use after the
>>>     stubs have been generated. There's no reason to expand the use of
>>>     this API to situations before stubs are generated. So the assert
>>>     captures that and CanUseSafeFetch is really unnecessary.
>>>
>>>
>>> We had to introduce CanUseSafeFetch() exactly because we have situations
>>> where we may call SafeFetch before stub routine generation:
>>>
>>> 1) In os_aix.cpp, we use it to probe protected memory after doing
>>> mprotect(). This is a long story, but we had to add this workaround
>>> because on shortcomings in AIX's mprotect(). This code will get called
>>> during initialization before stub routines are set up.
>>> Note that right now, in the OpenJDK code base, the code still directly
>>> checks whether the stubroutine is NULL, because that mprotect workaround
>>> is older than 8074552. But I planned to change this to
>>> CanUseSafeFetch(). But with the new CanUseSafeFetch, this would just
>>> break AIX.
>>>
>>> 2) In SAPs code base, we also use it to harden error log writing (e.g.
>>> to make callstack dumping more reliable). This may can be invoked very
>>> early, if a crash happens very early, before stub routines are
>>> available. We planned to contribute those improvements piece by piece to
>>> the OpenJDK - my first change for 8074552 was actually supposed to
>>> prepare that. Now, however, this makes no sense, because using
>>> SafeFetch() to harden error log writing would actually be worse for
>>> error logs for early crashes.
>>>
>>> SafeFetch is a very basic function which could be useful in many more
>>> contexts, but only if you can really be sure it is safe. The contexts it
>>> is used in may be very low level where you cannot be sure at which point
>>> in VM initialization you are.
>>>
>>> When I introduced CanUseSafeFetch(), it was to safeguard the use of
>>> SafeFetch() for situations where SafeFetch was not safe to call. Exactly
>>> so that I would have not to think about "is this too early to use
>>> SafeFetch()?". This is broken now - instead of guarding me against using
>>> SafeFetch() in situations where it is not safe to use, it will
>>> assert(debug) or potentially lie(release), in which case SafeFetch
>>> itself may crash.
>>>
>>
>> Okay I see a case for adding a check for "can I use SafeFetch at this
>> point in time" (though if what you are fetching is potentially not-safe
>> then you are in trouble!).
>>
>>          So now
>>>         - CanUseSafeFetch really now means "SafeFetch will behave as
>>>         excpected"
>>>         - SafeFetch is softened to "will safely fetch content of address
>>>         unless on zero, there it may crash the VM".
>>>
>>>         I think this is more confusing than before. In objectMonitor,
>>>         the logic
>>>         we want is "use SafeFetch wherever possible but if not, just
>>>         access the
>>>         memory directly". This logic now is hidden behind a seemingly
>>>         incorrect
>>>         use of SafeFetch without CanUseSafeFetch, and I think Goetz
>>> proposal
>>>         would have been far more clearer.
>>>
>>>
>>>     No the problem is that SafeFetch doesn't work on ZERO - if you give
>>>     it a bad address it won't recover it will crash. The whole point is
>>>     for it not to crash. The objectMonitor code is saying "here is a
>>>     load that we know may be invalid, but if it is then we don't want to
>>>     crash, so we use SafeFetch".
>>>
>>>
>>> Yes! But the way it is coded, it does not. Because SafeFetch is now
>>> implemented on zero with a simple load, it is not safe at all. The code
>>> in objectMonitor.cpp appears to be safe, but it is not. And SafeFetch()
>>> promises to be safe, but it is not, and you would not know unless you
>>> know the zero implementation.
>>>
>>
>>  Sorry did I miss something - has Zero ever implemented SafeFetch as
>> other than a direct simple load? ie hasn't Zero always been broken wrt
>> SafeFetch and that is the basic problem ?
>>
>>  It seems to be a bit inconsequent. Either one is really sure the pointer
>>> is accessible - in which case why use SafeFetch? - or it may be invalid,
>>> even if this is unusual - in which case why is the resulting crash not a
>>> problem for zero?
>>> In other words, if the original programmer thought it worth to guard the
>>> memory access with SafeFetch(), why is this not a concern for zero?
>>>
>>
>>  It is - which is why Zero is broken.
>>
>>      The expectation is that SafeFetch actually is safe no matter what is
>>>     fetched.
>>>
>>>
>>> Yes, and that expectation is broken now.
>>>
>>
>>  When was it not broken?
>>
>>  Before, if you used SafeFetch() and guarded it with CanUseSafeFetch(),
>>> you could be sure to be safe and not to break anything - regardless when
>>> in VM life you used the function or on which platform.
>>>
>>
>>  No you only knew that a function called SafeFetch could be called - you
>> didn't know it was actually implemented to actually always do a "safe
>> fetch".
>>
>>  Both SafeFetch and CanUseSafeFetch made promises:
>>> - CanUseSafeFetch promised to tell me if it is too early to call
>>> SafeFetch. Of course, it itself could be called at all times, that was
>>> the whole point of CanUseSafeFetch.
>>> - SafeFetch promised to be able to access a potentially invalid pointer
>>> and not to crash - if CanUseSafeFetch returned true.
>>>
>>
>>  SafeFetch (if it exists) promises to be safe. The ZERO implementation
>> breaks that.
>>
>>  Both promises are broken now. CanUseSafeFetch is not safe to call if it
>>> is too early - ironically a condition it was designed to tell me.
>>>
>>
>>  Okay we will re-fix that.
>>
>>  SafeFetch is not safe to call because it lies about being safe for zero.
>>>
>>
>>  It never was safe to call as far as I can see, it was just never called
>> with an invalid value. The new test exposed this.
>>
>> David
>> -----
>>
>>  The result is that either if we continue to use SafeFetch, we risk
>>> crashes on zero and crashes if invoked too early. This makes the API
>>> useless for most cases.
>>>
>>> I am open to any other suggestions - renaming the APIs, maybe reshaping
>>> them - but I regret seeing SafeFetch being back to being quite unsafe
>>> again.
>>>
>>> Kind Regards, Thomas
>>>
>>>
>>>     The ZERO implementation just does a load and hopes it isn't asked
>>>     for anything bad. The objectMonitor code will rarely actually
>>>     trigger a bad load. But the newly introduced test does.
>>>
>>>     So in my opinion to return to the status-quo where ZERO really
>>>     doesn't implement SafeFetch safely but it "works okay in practice",
>>>     we should disable those explicit tests when running on ZERO.
>>>
>>>     Whether CanUseSafeFetch (aka is SafeFetchStubAvailable) is
>>>     implemented explicitly or implicitly as an assert is a side-issue.
>>>
>>>     David
>>>
>>>         Kind Regards, Thomas
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>         On Thu, Mar 26, 2015 at 8:40 AM, Lindenmaier, Goetz
>>>         <goetz.lindenmaier at sap.com <mailto:goetz.lindenmaier at sap.com>
>>>          <mailto:goetz.lindenmaier at sap.__com
>>>         <mailto:goetz.lindenmaier at sap.com>>> wrote:
>>>
>>>              Hi,
>>>
>>>              sorry I only comment on this now ...
>>>              I think it's rather strange that you can not use SafeFetch
>>>         on zero,
>>>              but it's there,
>>>              and it's actually used ... which will be the situation
>>>         after this
>>>              change.
>>>
>>>              Why not just change objectMonitor as the SafeFetch api is
>>>         meant to
>>>              be used?
>>>              Obviously zero takes care that this access always succeeds:
>>>
>>>               diff -r a35854c84d9d
>>> src/share/vm/runtime/__objectMonitor.cpp
>>>              --- a/src/share/vm/runtime/__objectMonitor.cpp    Wed Mar
>>>         25 12:36:28
>>>              2015 +0100
>>>               +++ b/src/share/vm/runtime/__objectMonitor.cpp    Thu Mar
>>>
>>>         26 08:19:53
>>>              2015 +0100
>>>              @@ -2241,7 +2241,12 @@
>>>                  }
>>>
>>>                  assert(sizeof(((JavaThread *)ox)->_thread_state ==
>>>         sizeof(int)),
>>>              "invariant");
>>>              -  int jst = SafeFetch32((int *) &((JavaThread *)
>>>              ox)->_thread_state, -1);;
>>>              +  int jst;
>>>              +  if (CanUseSafeFetch32()) {
>>>              +    jst = SafeFetch32((int *) &((JavaThread *)
>>>         ox)->_thread_state, -1);
>>>              +  } else {
>>>              +    jst = ((JavaThread *)ox)->_thread_state;
>>>              +  }
>>>                  // consider also: jst != _thread_in_Java -- but that's
>>>         overspecific.
>>>                  return jst == _thread_blocked || jst ==
>>> _thread_in_native;
>>>                }
>>>
>>>              It's handled similarly in vmError.cpp.  And no #ifdefs ;)
>>>
>>>              But I'm also fine with the other change ... to get zero
>>>         working again.
>>>
>>>              Best regards,
>>>                 Goetz.
>>>
>>>
>>>
>>>              -----Original Message-----
>>>              From: hotspot-dev
>>>           [mailto:hotspot-dev-bounces at __openjdk.java.net
>>>         <mailto:hotspot-dev-bounces at openjdk.java.net>
>>>              <mailto:hotspot-dev-bounces at __openjdk.java.net
>>>         <mailto:hotspot-dev-bounces at openjdk.java.net>>] On Behalf Of
>>> Coleen
>>>              Phillimore
>>>              Sent: Thursday, March 26, 2015 3:30 AM
>>>              To: David Holmes; hotspot-dev developers
>>>              Subject: Re: RFR (xs) 8075967: Zero interpreter asserts for
>>>              SafeFetch<32, N> calls in ObjectMonitor
>>>
>>>
>>>              On 3/25/15, 10:15 PM, David Holmes wrote:
>>>               > On 26/03/2015 11:00 AM, Coleen Phillimore wrote:
>>>               >>
>>>               >> On 3/25/15, 8:21 PM, David Holmes wrote:
>>>               >>> On 26/03/2015 10:04 AM, Coleen Phillimore wrote:
>>>               >>>>
>>>               >>>> On 3/25/15, 7:59 PM, David Holmes wrote:
>>>               >>>>> Hi Coleen,
>>>               >>>>>
>>>               >>>>> Why generate stubs that can't be used and then add
>>>         zero-specific
>>>               >>>>> logic
>>>               >>>>> to the shared CanUseSafeFetchN instead of not
>>>         generating the
>>>              stubs so
>>>               >>>>> that CanUseSafeFetchN will return false anyway ??
>>>               >>>>
>>>               >>>> Because there is platform independent code in
>>>              objectMonitor.cpp that
>>>               >>>> uses SafeFetchX (both).  I'd rather not burden the
>>>         caller of
>>>              this with
>>>               >>>> testing CanUseSafeFetchX for each call.   This code
>>>         existed before
>>>               >>>> SafeFetch was implemented, so I'm restoring previous
>>>         behavior for
>>>               >>>> Zero.
>>>               >>>
>>>               >>> Sorry Coleen I thought this was deal to with 8075533,
>>>         I hadn't
>>>               >>> realized 8075533 broke the objectMonitor code and this
>>>         was a
>>>              follow up.
>>>               >>>
>>>               >>> What a mess. :(
>>>               >>
>>>               >> Yes, it is.  Is this a code Review?
>>>               >
>>>               > Let me work through this :) The original change for
>>> 8074552:
>>>               >
>>>               >
>>>
>>> http://hg.openjdk.java.net/__jdk9/hs-rt/hotspot/rev/__3eb61269f421
>>>         <http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/3eb61269f421>
>>>               >
>>>               > added the extra error handling stuff related to
>>>         SafeFetch and a new
>>>               > ErrorHandler test that was predicated on
>>>         CanUseSafeFetch32 which
>>>               > returns true if the stubs exist. It also added testing of
>>>              SafeFetch in
>>>               > StubRoutines::initialize2 which was excluded for
>>>         windows-32-bit and
>>>               > which did not check CanUseSafeFetch32.
>>>               >
>>>               > That change broke zero as reported in 8075533 because
>>>         you can't
>>>               > actually use the SafeFetch routines with an unsafe
>>>         address on Zero,
>>>               > and the solution:
>>>               >
>>>               >
>>>
>>> http://hg.openjdk.java.net/__jdk9/hs-rt/hotspot/rev/__5c2bc6278fc4
>>>
>>>         <http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/5c2bc6278fc4>
>>>               >
>>>               > was to remove the stubs so that CanUseSafeFetch32 would
>>>         return false,
>>>               > and to include a check of CanUseSafeFetch32 in the tests
>>>         used with
>>>               > StubRoutines::initialize2.
>>>               >
>>>               > And now we find that the SafeFetch routines are used
>>>         unconditionally
>>>               > by the objectMonitor code, so we need to restore the
>>>         stubs that were
>>>               > previously removed, but force CanUseSafeFetch32 to
>>>         return false
>>>              on zero.
>>>
>>>              Yes, exactly.
>>>               >
>>>               > Ok. Only nit is that:
>>>               >
>>>               > return NOT_ZERO(true) ZERO_ONLY(false);
>>>
>>>              Yes, you're right.  I'll make that change.
>>>
>>>              Thanks!
>>>              Coleen
>>>
>>>               >
>>>               > would be more readable than the ifdefs.
>>>               >
>>>               > Thanks,
>>>               > David
>>>               >
>>>               >> thanks!
>>>               >> Coleen
>>>               >>>
>>>               >>> Thanks,
>>>               >>> David
>>>               >>>
>>>               >>>
>>>               >>>
>>>               >>>> We could file an RFE to either implement SafeFetch
>>>         for Zero or
>>>              rewrite
>>>               >>>> this objectMonitor code to not need SafeFetch.  I
>>>         didn't want
>>>              to do
>>>               >>>> either of these things with this patch.
>>>               >>>>
>>>               >>>> Coleen
>>>               >>>>
>>>               >>>>>
>>>               >>>>> Thanks,
>>>               >>>>> David
>>>               >>>>>
>>>               >>>>> On 26/03/2015 3:53 AM, Coleen Phillimore wrote:
>>>               >>>>>> Summary: Implement SafeFetchX unsafely and make
>>>         CanUseSafeFetchX
>>>               >>>>>> false
>>>               >>>>>> for Zero
>>>               >>>>>>
>>>               >>>>>> Also, this fixes a couple other minor issues.
>>>               >>>>>>
>>>               >>>>>> Ran jdk9 jck tests with one timeout.
>>>         hotspot/runtime jtreg
>>>              tests
>>>               >>>>>> don't
>>>               >>>>>> run because Zero doesn't support UseCompressedOops
>>>         (not sure
>>>              why)
>>>               >>>>>> and
>>>               >>>>>> CDS (know why).
>>>               >>>>>>
>>>               >>>>>> open webrev at
>>>          http://cr.openjdk.java.net/~__coleenp/8075967/
>>>         <http://cr.openjdk.java.net/~coleenp/8075967/>
>>>               >>>>>> bug link
>>>         https://bugs.openjdk.java.net/__browse/JDK-8075967
>>>         <https://bugs.openjdk.java.net/browse/JDK-8075967>
>>>               >>>>>>
>>>               >>>>>> Thanks,
>>>               >>>>>> Coleen
>>>               >>>>
>>>               >>
>>>
>>>
>>>
>>>
>
>