RFR (S) 8235962: os::current_thread_id() is not signal safe on macOS

Wed Jan 22 17:42:46 UTC 2020

Thank you David for taking the time to follow the explanation and the 
review!

I still need a second review please.

On 1/16/20 4:07 PM, David Holmes wrote:
> Hi Gerard,
>
> Thank you for the very detailed discussion on mach ports! Please add 
> all of that to the bug report (plus any references/links to where you 
> found the info).
>
> I'm good with the current fix to "do the right thing" and keep the 
> reference count in sync.
>
> Thanks,
> David
>
> On 17/01/2020 5:14 am, gerard ziemski wrote:
>> Hi David,
>>
>> Thank you very much for the review.
>>
>>
>> On 1/13/20 8:49 PM, David Holmes wrote:
>>> Hi Gerard,
>>>
>>> Thanks for tackling this. All extremely confusing as usual :) far 
>>> too many different notions of "thread identity".
>>>
>>> On 14/01/2020 4:43 am, gerard ziemski wrote:
>>>> hi all,
>>>>
>>>> Before JDK-8022808 we used to call "mach_thread_self()" to obtain a 
>>>> thread id, but back then we forgot to release the returned mach 
>>>> port, which would make a long running java process (like 
>>>> Kitchensink) eventually run out of resources and hang it.
>>>>
>>>> To solve that, in JDK-8022808, we switched to using 
>>>> "pthread_mach_thread_np(pthread_self())", which returns a static 
>>>> mach port, which does not need to be released.
>>>> However, the "pthread_mach_thread_np()" API, which we started using 
>>>> is specifically knownn as not async-signal-safe API and that might 
>>>> cause an issue, because we use it to generate thread ids in 
>>>> hs_err_pid.log crash log files in our signal handler, which 
>>>> requires that only async-signal-safe API be used.
>>>>
>>>> Our crash handler is best effort approach, and to this extend I'd 
>>>> like to propose a fix, which reverts back to using 
>>>> "mach_thread_self()", which I must admit is unknown whether it 
>>>> itself is async-signal-safe, though it's not part of pthread API, 
>>>> like "pthread_mach_thread_np()" which itself is not part of 
>>>> async-signal-safe APIs (IEEE Std 1003.1, 2016). Additionally, the 
>>>> reporter of the issue suggest that it solves the actual issue with 
>>>> using "pthread_mach_thread_np()" for them in their project, so 
>>>> let's benefit from their experience.
>>>
>>> Okay that seems reasonable.
>>>
>>>> In this fix we revert back to using "mach_thread_self()" API, but 
>>>> make sure to release the mach port, and use MACH_PORT_VALID() macro 
>>>> to verify the validity of mach port.
>>>
>>> I'm unclear on this aspect. Now that we are releasing it doesn't 
>>> that mean that the port value could be reused and applied to a 
>>> different thread? It isn't at all clear to me what a "mach port" 
>>> actually means and what releasing it (or failing to) means.
>>
>> I've done more research and testing and here is how I understand our 
>> usage of mach ports:
>>
>> Mach ports are a resource a code needs in order to talk to mach 
>> thread (or task), though we simply use its value as thread id. They 
>> (i.e. the mach kernel) use reference counting to keep track of 
>> whether they need to exist, and get automatically destroyed when the 
>> ref count reaches 0.
>>
>> They can be created via APIs like mach_port_allocate(), and deleted 
>> directly via mach_port_destroy(), or indirectly via 
>> mach_port_deallocate(). The difference between these two (inaptly 
>> named imho) APIs is that while mach_port_destroy() will outright 
>> delete the port, mach_port_deallocate() will only decrease the ref 
>> count and will call mach_port_destroy() implicitly only if it reaches 0.
>>
>> If the user obtains mach port via mach_thread_self(), the ref count 
>> of that thread will increase, which needs to be matched by a call to 
>> mach_port_deallocate(). However, one can not delete such special mach 
>> port via mach_port_destroy() (only user own created ports can de 
>> deleted).
>>
>> Back in 2013 a call to mach_thread_self() would create a brand new 
>> mach port, so if you did not call mach_port_deallocate() after being 
>> done with it, the resource would be leaked. I believe that was 
>> actually a bug in early OS X, or possibly a bad API design, which 
>> affected many other users, not just us (i.e. Google's Chrome app for 
>> Mac). The confusion was primary due to the fact that mach_task_self() 
>> would not create a new port, so developers expected the same from 
>> mach_thread_self(). There are only about 300,000 ports available to a 
>> process (verified via a native sample that keeps creating new ports 
>> via mach_port_allocate() API), so eventually long lived apps, like 
>> our kitchensink, would run out of them.
>>
>> We could have fixed our issue by matching mach_thread_self() with 
>> mach_port_deallocate() even back then, but instead we chose to use 
>> pthread_mach_thread_np() API, which is documented to return thread's 
>> static mach port, which does not need to be returned via 
>> mach_port_deallocate(). I think that solution just bypassed the 
>> mystery of mach ports and when and if we need to delete them, but 
>> also was more efficient as it did not create a new port.
>>
>> Today, mach_thread_self() on OS X behaves differently, to how it did 
>> back then - it no longer creates new mach ports, and so in most cases 
>> the matching call to mach_port_deallocate() could be skipped, as the 
>> counter the mach kernel uses for ref count is a very big number. 
>> Though it would eventually overflow, and the kernel then would 
>> probably delete the mach port with tragic results. I tried to see if 
>> I could make that happen, but no luck in 24 hours of a test case 
>> continuously asking for mach_thread_self() every 10 micro seconds.
>>
>> So today we could have simply reverted to how the code used to be, 
>> and not even bother to call mach_port_deallocate() after 
>> mach_thread_self(), though theoretically that could overflow the ref 
>> count eventually.
>>
>> The proposed fix, reverts back to using mach_thread_self() (which 
>> increases the ref count) because it's probably signal-async-safe, 
>> unlike pthread_mach_thread_np(), AND calls mach_port_deallocate() 
>> (which decreases the refcount)
>>
>> Someone please correct me if I got anything wrong here.
>>
>>
>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8235962
>>>> webrev: http://cr.openjdk.java.net/~gziemski/8235962_rev1/index.html
>>>
>>> I found the refactoring around _unique_thread_id a bit hard to 
>>> follow. Your new set_unique_thread_id needs to use "#ifdef 
>>> __APPLE__" otherwise it won't compile on other BSD systems.
>>
>> I will fix it, test it and post a new webrev, assuming everyone is 
>> satisfied with my answers above.

I chose the pattern of the original code and decided not to #ifdef the 
"_unique_thread_id" field (or the API), so that I don't have to touch 
the file "vmStructs_bsd_x86.hpp" where it's exported, and only #ifdef 
the implementation of "set_unique_thread_id()" and the client calls to 
it, similar to the original code:

bug: https://bugs.openjdk.java.net/browse/JDK-8235962
webrev: http://cr.openjdk.java.net/~gziemski/8235962_rev2/index.htm
Tested with Mach5 tier1,2,3,4,5 (and verified by watching the number of 
process ports while executing Kitchensink)

cheers