RFR (xs) 8075967: Zero interpreter asserts for SafeFetch<32, N> calls in ObjectMonitor

Thu Mar 26 12:38:06 UTC 2015

On 26/03/2015 10:24 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Thu, Mar 26, 2015 at 11:42 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     On 26/03/2015 6:35 PM, Thomas Stüfe wrote:
>
>         Hi,
>
>         I agree with Goetz.
>
>         SafeFetch() was meant to be guarded with CanUseSafeFetch(), this
>         is part
>         of the contract, so using it without this guard is actually
>         incorrect
>         and would crash were the calling code ever called before Stub
>         routine
>         initialization.
>
>
>     That contract was already implicit - this API is for use after the
>     stubs have been generated. There's no reason to expand the use of
>     this API to situations before stubs are generated. So the assert
>     captures that and CanUseSafeFetch is really unnecessary.
>
>
> We had to introduce CanUseSafeFetch() exactly because we have situations
> where we may call SafeFetch before stub routine generation:
>
> 1) In os_aix.cpp, we use it to probe protected memory after doing
> mprotect(). This is a long story, but we had to add this workaround
> because on shortcomings in AIX's mprotect(). This code will get called
> during initialization before stub routines are set up.
> Note that right now, in the OpenJDK code base, the code still directly
> checks whether the stubroutine is NULL, because that mprotect workaround
> is older than 8074552. But I planned to change this to
> CanUseSafeFetch(). But with the new CanUseSafeFetch, this would just
> break AIX.
>
> 2) In SAPs code base, we also use it to harden error log writing (e.g.
> to make callstack dumping more reliable). This may can be invoked very
> early, if a crash happens very early, before stub routines are
> available. We planned to contribute those improvements piece by piece to
> the OpenJDK - my first change for 8074552 was actually supposed to
> prepare that. Now, however, this makes no sense, because using
> SafeFetch() to harden error log writing would actually be worse for
> error logs for early crashes.
>
> SafeFetch is a very basic function which could be useful in many more
> contexts, but only if you can really be sure it is safe. The contexts it
> is used in may be very low level where you cannot be sure at which point
> in VM initialization you are.
>
> When I introduced CanUseSafeFetch(), it was to safeguard the use of
> SafeFetch() for situations where SafeFetch was not safe to call. Exactly
> so that I would have not to think about "is this too early to use
> SafeFetch()?". This is broken now - instead of guarding me against using
> SafeFetch() in situations where it is not safe to use, it will
> assert(debug) or potentially lie(release), in which case SafeFetch
> itself may crash.

Okay I see a case for adding a check for "can I use SafeFetch at this 
point in time" (though if what you are fetching is potentially not-safe 
then you are in trouble!).

>         So now
>         - CanUseSafeFetch really now means "SafeFetch will behave as
>         excpected"
>         - SafeFetch is softened to "will safely fetch content of address
>         unless on zero, there it may crash the VM".
>
>         I think this is more confusing than before. In objectMonitor,
>         the logic
>         we want is "use SafeFetch wherever possible but if not, just
>         access the
>         memory directly". This logic now is hidden behind a seemingly
>         incorrect
>         use of SafeFetch without CanUseSafeFetch, and I think Goetz proposal
>         would have been far more clearer.
>
>
>     No the problem is that SafeFetch doesn't work on ZERO - if you give
>     it a bad address it won't recover it will crash. The whole point is
>     for it not to crash. The objectMonitor code is saying "here is a
>     load that we know may be invalid, but if it is then we don't want to
>     crash, so we use SafeFetch".
>
>
> Yes! But the way it is coded, it does not. Because SafeFetch is now
> implemented on zero with a simple load, it is not safe at all. The code
> in objectMonitor.cpp appears to be safe, but it is not. And SafeFetch()
> promises to be safe, but it is not, and you would not know unless you
> know the zero implementation.

Sorry did I miss something - has Zero ever implemented SafeFetch as 
other than a direct simple load? ie hasn't Zero always been broken wrt 
SafeFetch and that is the basic problem ?

> It seems to be a bit inconsequent. Either one is really sure the pointer
> is accessible - in which case why use SafeFetch? - or it may be invalid,
> even if this is unusual - in which case why is the resulting crash not a
> problem for zero?
> In other words, if the original programmer thought it worth to guard the
> memory access with SafeFetch(), why is this not a concern for zero?

It is - which is why Zero is broken.

>     The expectation is that SafeFetch actually is safe no matter what is
>     fetched.
>
>
> Yes, and that expectation is broken now.

When was it not broken?

> Before, if you used SafeFetch() and guarded it with CanUseSafeFetch(),
> you could be sure to be safe and not to break anything - regardless when
> in VM life you used the function or on which platform.

No you only knew that a function called SafeFetch could be called - you 
didn't know it was actually implemented to actually always do a "safe 
fetch".

> Both SafeFetch and CanUseSafeFetch made promises:
> - CanUseSafeFetch promised to tell me if it is too early to call
> SafeFetch. Of course, it itself could be called at all times, that was
> the whole point of CanUseSafeFetch.
> - SafeFetch promised to be able to access a potentially invalid pointer
> and not to crash - if CanUseSafeFetch returned true.

SafeFetch (if it exists) promises to be safe. The ZERO implementation 
breaks that.

> Both promises are broken now. CanUseSafeFetch is not safe to call if it
> is too early - ironically a condition it was designed to tell me.

Okay we will re-fix that.

> SafeFetch is not safe to call because it lies about being safe for zero.

It never was safe to call as far as I can see, it was just never called 
with an invalid value. The new test exposed this.

David
-----

> The result is that either if we continue to use SafeFetch, we risk
> crashes on zero and crashes if invoked too early. This makes the API
> useless for most cases.
>
> I am open to any other suggestions - renaming the APIs, maybe reshaping
> them - but I regret seeing SafeFetch being back to being quite unsafe again.
>
> Kind Regards, Thomas
>
>
>     The ZERO implementation just does a load and hopes it isn't asked
>     for anything bad. The objectMonitor code will rarely actually
>     trigger a bad load. But the newly introduced test does.
>
>     So in my opinion to return to the status-quo where ZERO really
>     doesn't implement SafeFetch safely but it "works okay in practice",
>     we should disable those explicit tests when running on ZERO.
>
>     Whether CanUseSafeFetch (aka is SafeFetchStubAvailable) is
>     implemented explicitly or implicitly as an assert is a side-issue.
>
>     David
>
>         Kind Regards, Thomas
>
>
>
>
>
>
>
>         On Thu, Mar 26, 2015 at 8:40 AM, Lindenmaier, Goetz
>         <goetz.lindenmaier at sap.com <mailto:goetz.lindenmaier at sap.com>
>         <mailto:goetz.lindenmaier at sap.__com
>         <mailto:goetz.lindenmaier at sap.com>>> wrote:
>
>              Hi,
>
>              sorry I only comment on this now ...
>              I think it's rather strange that you can not use SafeFetch
>         on zero,
>              but it's there,
>              and it's actually used ... which will be the situation
>         after this
>              change.
>
>              Why not just change objectMonitor as the SafeFetch api is
>         meant to
>              be used?
>              Obviously zero takes care that this access always succeeds:
>
>              diff -r a35854c84d9d src/share/vm/runtime/__objectMonitor.cpp
>              --- a/src/share/vm/runtime/__objectMonitor.cpp    Wed Mar
>         25 12:36:28
>              2015 +0100
>              +++ b/src/share/vm/runtime/__objectMonitor.cpp    Thu Mar
>         26 08:19:53
>              2015 +0100
>              @@ -2241,7 +2241,12 @@
>                  }
>
>                  assert(sizeof(((JavaThread *)ox)->_thread_state ==
>         sizeof(int)),
>              "invariant");
>              -  int jst = SafeFetch32((int *) &((JavaThread *)
>              ox)->_thread_state, -1);;
>              +  int jst;
>              +  if (CanUseSafeFetch32()) {
>              +    jst = SafeFetch32((int *) &((JavaThread *)
>         ox)->_thread_state, -1);
>              +  } else {
>              +    jst = ((JavaThread *)ox)->_thread_state;
>              +  }
>                  // consider also: jst != _thread_in_Java -- but that's
>         overspecific.
>                  return jst == _thread_blocked || jst == _thread_in_native;
>                }
>
>              It's handled similarly in vmError.cpp.  And no #ifdefs ;)
>
>              But I'm also fine with the other change ... to get zero
>         working again.
>
>              Best regards,
>                 Goetz.
>
>
>
>              -----Original Message-----
>              From: hotspot-dev
>         [mailto:hotspot-dev-bounces at __openjdk.java.net
>         <mailto:hotspot-dev-bounces at openjdk.java.net>
>              <mailto:hotspot-dev-bounces at __openjdk.java.net
>         <mailto:hotspot-dev-bounces at openjdk.java.net>>] On Behalf Of Coleen
>              Phillimore
>              Sent: Thursday, March 26, 2015 3:30 AM
>              To: David Holmes; hotspot-dev developers
>              Subject: Re: RFR (xs) 8075967: Zero interpreter asserts for
>              SafeFetch<32, N> calls in ObjectMonitor
>
>
>              On 3/25/15, 10:15 PM, David Holmes wrote:
>               > On 26/03/2015 11:00 AM, Coleen Phillimore wrote:
>               >>
>               >> On 3/25/15, 8:21 PM, David Holmes wrote:
>               >>> On 26/03/2015 10:04 AM, Coleen Phillimore wrote:
>               >>>>
>               >>>> On 3/25/15, 7:59 PM, David Holmes wrote:
>               >>>>> Hi Coleen,
>               >>>>>
>               >>>>> Why generate stubs that can't be used and then add
>         zero-specific
>               >>>>> logic
>               >>>>> to the shared CanUseSafeFetchN instead of not
>         generating the
>              stubs so
>               >>>>> that CanUseSafeFetchN will return false anyway ??
>               >>>>
>               >>>> Because there is platform independent code in
>              objectMonitor.cpp that
>               >>>> uses SafeFetchX (both).  I'd rather not burden the
>         caller of
>              this with
>               >>>> testing CanUseSafeFetchX for each call.   This code
>         existed before
>               >>>> SafeFetch was implemented, so I'm restoring previous
>         behavior for
>               >>>> Zero.
>               >>>
>               >>> Sorry Coleen I thought this was deal to with 8075533,
>         I hadn't
>               >>> realized 8075533 broke the objectMonitor code and this
>         was a
>              follow up.
>               >>>
>               >>> What a mess. :(
>               >>
>               >> Yes, it is.  Is this a code Review?
>               >
>               > Let me work through this :) The original change for 8074552:
>               >
>               >
>         http://hg.openjdk.java.net/__jdk9/hs-rt/hotspot/rev/__3eb61269f421
>         <http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/3eb61269f421>
>               >
>               > added the extra error handling stuff related to
>         SafeFetch and a new
>               > ErrorHandler test that was predicated on
>         CanUseSafeFetch32 which
>               > returns true if the stubs exist. It also added testing of
>              SafeFetch in
>               > StubRoutines::initialize2 which was excluded for
>         windows-32-bit and
>               > which did not check CanUseSafeFetch32.
>               >
>               > That change broke zero as reported in 8075533 because
>         you can't
>               > actually use the SafeFetch routines with an unsafe
>         address on Zero,
>               > and the solution:
>               >
>               >
>         http://hg.openjdk.java.net/__jdk9/hs-rt/hotspot/rev/__5c2bc6278fc4
>         <http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/5c2bc6278fc4>
>               >
>               > was to remove the stubs so that CanUseSafeFetch32 would
>         return false,
>               > and to include a check of CanUseSafeFetch32 in the tests
>         used with
>               > StubRoutines::initialize2.
>               >
>               > And now we find that the SafeFetch routines are used
>         unconditionally
>               > by the objectMonitor code, so we need to restore the
>         stubs that were
>               > previously removed, but force CanUseSafeFetch32 to
>         return false
>              on zero.
>
>              Yes, exactly.
>               >
>               > Ok. Only nit is that:
>               >
>               > return NOT_ZERO(true) ZERO_ONLY(false);
>
>              Yes, you're right.  I'll make that change.
>
>              Thanks!
>              Coleen
>
>               >
>               > would be more readable than the ifdefs.
>               >
>               > Thanks,
>               > David
>               >
>               >> thanks!
>               >> Coleen
>               >>>
>               >>> Thanks,
>               >>> David
>               >>>
>               >>>
>               >>>
>               >>>> We could file an RFE to either implement SafeFetch
>         for Zero or
>              rewrite
>               >>>> this objectMonitor code to not need SafeFetch.  I
>         didn't want
>              to do
>               >>>> either of these things with this patch.
>               >>>>
>               >>>> Coleen
>               >>>>
>               >>>>>
>               >>>>> Thanks,
>               >>>>> David
>               >>>>>
>               >>>>> On 26/03/2015 3:53 AM, Coleen Phillimore wrote:
>               >>>>>> Summary: Implement SafeFetchX unsafely and make
>         CanUseSafeFetchX
>               >>>>>> false
>               >>>>>> for Zero
>               >>>>>>
>               >>>>>> Also, this fixes a couple other minor issues.
>               >>>>>>
>               >>>>>> Ran jdk9 jck tests with one timeout.
>         hotspot/runtime jtreg
>              tests
>               >>>>>> don't
>               >>>>>> run because Zero doesn't support UseCompressedOops
>         (not sure
>              why)
>               >>>>>> and
>               >>>>>> CDS (know why).
>               >>>>>>
>               >>>>>> open webrev at
>         http://cr.openjdk.java.net/~__coleenp/8075967/
>         <http://cr.openjdk.java.net/~coleenp/8075967/>
>               >>>>>> bug link
>         https://bugs.openjdk.java.net/__browse/JDK-8075967
>         <https://bugs.openjdk.java.net/browse/JDK-8075967>
>               >>>>>>
>               >>>>>> Thanks,
>               >>>>>> Coleen
>               >>>>
>               >>
>
>
>