RFC: JEP JDK-8221828: New Invoke Bindings
Vitaly Davidovich
vitalyd at gmail.com
Fri Oct 11 12:09:22 UTC 2019
On Fri, Oct 11, 2019 at 7:52 AM <erik.osterlund at oracle.com> wrote:
> Hi Vitaly,
>
> On 10/11/19 1:24 PM, Vitaly Davidovich wrote:
>
> Hi Erik,
>
> This sounds like a great idea! You touch on this in the JEP, but avoiding
> global safepoints for ICStub patching would be great.
>
> http://openjdk.5641.n7.nabble.com/InlineCacheBuffer-GuaranteedSafepointInterval-td229138.html
> is a few years old but I think largely still true today.
>
>
> Glad you brought this up. Yes. In fact, this is how this whole thing
> started for me. When I implemented concurrent class unloading for ZGC back
> in JDK 12, I found it quite annoying that our concurrent inline cache
> cleaning may trigger safepoints. I measured that after cleaning ~140 IC
> stubs (due to a rather pessimistic buffer sizing), we would run out of
> buffer space and safepoint. Since I work in the ZGC project, I get
> irrationally upset when I see things being done in safepoints that don't
> need to be, and am willing to walk very far to get rid of them.
>
We need more Eriks! :)
> So I started looking into whether these IC stubs can be freed concurrently
> instead. I implemented a pipelined three-phase global handshaking scheme
> for safely reclaiming IC stubs and CompiledICHolders concurrently in the
> service thread, on architectures that support instruction cache coherency,
> which remedied the safepointing problem on x86_64. But not all
> architectures have instruction cache coherency, so I was annoyed by the
> limited portability of my solution, by having to maintain multiple
> different life cycles for IC stubs, and by making inline caches even more
> complicated than they already are. That's when I finally had enough of
> inline cache problems, shelved that idea, and decided to instead get rid
> of inline caches, as they no longer seem to solve a current problem, yet
> cause headaches on a daily basis.
>
Impressive, and kudos for not walking away from this complexity.
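
For anyone following along, here is a minimal sketch of the general shape of
handshake-deferred reclamation; the names and the simplified single-phase
structure are invented for illustration and are not Erik's actual three-phase
HotSpot implementation:

#include <cstdint>
#include <deque>

struct Stub { /* code buffer, relocation info, ... (illustrative) */ };

struct Retired {
    Stub*    stub;
    uint64_t epoch;   // epoch in which the stub was retired
};

static uint64_t            g_epoch = 0;   // bumped once per reclamation cycle
static std::deque<Retired> g_pending;     // retired, but possibly still executing
// (locking/synchronization omitted for brevity)

// Called when an inline cache transition makes a stub unreachable for new
// callers; some thread may still be mid-execution in it, so it cannot be
// freed yet.
void retire_stub(Stub* s) {
    g_pending.push_back({s, g_epoch});
}

// Called by a single service thread. One cycle: bump the epoch, run a global
// handshake, then free everything retired before the bump -- once every
// thread has acknowledged the handshake, none of them can still be inside a
// stub retired in an earlier epoch.
void reclaim_cycle() {
    uint64_t completed = g_epoch++;
    // ... issue the global handshake here and wait for all threads ...
    while (!g_pending.empty() && g_pending.front().epoch <= completed) {
        delete g_pending.front().stub;
        g_pending.pop_front();
    }
}

The point is simply that freeing is driven by handshakes observed by all
threads rather than by a global safepoint.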
>
>
> How confident are you that the hardware’s branch target buffer (BTB) will
> neutralize the loss of direct jumps? In large applications, with lots of
> code and deep call graphs, I’d be concerned that the BTB would be exhausted
> due to the sheer number of entries needed.
>
>
> I have run a bunch of workloads (some very code cache heavy), even with
> inlining disabled to stress the dispatching mechanism more than is
> reasonable, without observing any noticeable differences between my new
> mechanism and the old one. So I am fairly optimistic at this point. What I
> have been fighting with more is start-up performance. I generate special
> unique numbers for each Method*, which stresses start-up surprisingly hard.
> But I am happy with where that is right now, after throwing a bunch of
> tricks at that code. Having said that, I am still in a prototyping phase,
> have more evaluation to do, and will of course take a close look at how
> that pans out.
>
Sounds great. Look forward to hearing more as you make progress.
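
As a purely hypothetical illustration of per-Method* numbering (not what the
prototype actually does), something like a dense id handed out on first use
would explain why start-up feels it:

#include <atomic>
#include <cstdint>

// Illustrative only: hand each method a unique, dense selector id the first
// time it is needed, e.g. to index a dispatch table. The slot would live in
// (or alongside) the Method*; 0 means "not assigned yet" in this sketch.
static std::atomic<uint32_t> g_next_selector{0};

uint32_t selector_for(std::atomic<uint32_t>& slot) {
    uint32_t id = slot.load(std::memory_order_acquire);
    if (id != 0) {
        return id;                                    // fast path: already numbered
    }
    uint32_t fresh = g_next_selector.fetch_add(1, std::memory_order_relaxed) + 1;
    uint32_t expected = 0;
    // Several threads may race to number the same method; the first CAS wins
    // and everyone agrees on that id afterwards.
    if (slot.compare_exchange_strong(expected, fresh, std::memory_order_acq_rel)) {
        return fresh;
    }
    return expected;                                   // the winner's id
}

Doing that, or anything like it, for every method touched during start-up is
the kind of cost that only shows up there.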
>
>
> Of course this is just speculation.
>
>
> ...I see what you did there.
>
:)
>
> /Erik
>
>
> Thanks
>
> On Fri, Oct 11, 2019 at 5:03 AM <erik.osterlund at oracle.com> wrote:
>
>> Hi,
>>
>> I prepared a new JEP [1] about rewriting the method invocation
>> mechanisms in HotSpot. In particular, the aim is to remove inline
>> caches, in an effort to make our code more robust and maintainable
>> (remove 3 nmethod states and ~10000 LOC of concurrent code patching
>> machinery, and make redefinition and unloading much more
>> straightforward). Instead, they are to be replaced with a table-driven
>> approach, leaving the speculation machinery to the hardware instead of
>> the software. More details are in the JEP description.
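
To make the contrast concrete, here is a rough, purely illustrative sketch of
the two dispatch styles; the layout and names are invented for this example
and are not the JEP's concrete design:

#include <cstdint>
#include <unordered_map>

struct Klass;                      // receiver's class (illustrative stand-in)
using Entry = void (*)(void*);     // a compiled entry point

// Style 1: inline cache. The call site behaves like a direct call whose
// target is patched at run time; changing the target safely is what needs
// IC stubs, extra nmethod states, and safepoint/handshake machinery today.
struct InlineCacheSite {
    const Klass* cached_klass = nullptr;  // monomorphic guard
    Entry        cached_entry = nullptr;  // the "patched" direct target
};

void ic_call(InlineCacheSite& site, const Klass* receiver_klass, void* receiver,
             Entry (*resolve)(const Klass*)) {
    if (site.cached_klass == receiver_klass) {
        site.cached_entry(receiver);                 // hit: effectively a direct call
    } else {
        site.cached_klass = receiver_klass;          // miss: re-resolve and
        site.cached_entry = resolve(receiver_klass); // "re-patch" the site
        site.cached_entry(receiver);
    }
}

// Style 2: table-driven. The call site always does the same loads followed by
// an indirect call; nothing is ever patched, and predicting the indirect
// branch is left to the hardware instead of software speculation.
struct DispatchTable {
    std::unordered_map<uint32_t, Entry> entries;  // selector -> entry point
};

void table_call(const DispatchTable& table, uint32_t selector, void* receiver) {
    table.entries.at(selector)(receiver);  // load from the table, then call
}

The trade-off discussed in this thread is exactly the one visible here: the
table-driven site gives up the patched direct call and leans on the BTB to
predict the indirect one.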
>>
>> Feedback is more than welcome.
>>
>> Thanks,
>> /Erik
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8221828
>>
> --
> Sent from my phone
>
>
>