RR(S): 8009062 poor performance of JNI AttachCurrentThread after fix for 7017193

Tue May 7 07:27:03 PDT 2013

On 05/06/2013 04:44 PM, Dmitry Samersoff wrote:
> Adam,
>
> Thank you very much for your efforts and patience.
>
> I did intensive testing, so please read below.

Sorry if I implied otherwise. Thanks for the deep look.

> On 2013-04-24 21:07, Adam Domurad wrote:
>> On 04/22/2013 05:41 PM, Dmitry Samersoff wrote:
>>> Hi Everybody,
>> Thanks for tackling this.
>>
>>> Here is webrev of proposed changes:
>>>
>>> http://cr.openjdk.java.net/~dsamersoff/8009062/webrev.04/
>>>
>>> Any comments are much appreciated.
>>>
>>> The problem:
>>>
>>> Under Linux the stack of the main thread is growable, so we have to make
>>> sure that the address where we plan to put the guard pages, and the
>>> addresses below it, are not mapped.
>>>
>>> Historically we found the bounds of the main thread's stack by searching
>>> /proc/self/maps for "[stack]" and parsing that line.
>>>
>>> I don't like buffered reading of /proc files and some time ago rewrote
>>> this function to read it byte by byte. Unfortunately, the resulting
>>> performance penalty is not acceptable.
>> I'm afraid I don't follow. What did you not like about the buffered
>> reading ?
> If one thread changes a process mapping and another thread reads
> /proc/self/maps at the same time, we end up reading an incorrect value.
>
> (I was able to reproduce it with an artificial test case using
> stdio-based code, and your patch is probably affected as well).
>
> I'm not sure whether it is possible in reality or not - the artificial
> test case changes the protection of the top page of the stack - and I
> understand that it's not a frequent condition.
>
> But if something like this happens on the customer side, they will come
> back with an irregular crash that is very hard to debug. So it's better
> safe than sorry.

Sounds reasonable to me.
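
For readers following the thread, here is a minimal sketch of the
buffered (stdio) approach under discussion: scan /proc/self/maps for the
"[stack]" line and parse its address range. It is not taken from the
webrev; the function names parse_stack_line and find_main_stack_bounds
are hypothetical. Nothing here protects against another thread changing
the mappings while the file is being read, which is the concern raised
above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one line of /proc/self/maps.
 * Returns 1 and fills *low / *high if the line describes "[stack]",
 * otherwise returns 0 and leaves the outputs untouched. */
static int parse_stack_line(const char *line, uintptr_t *low, uintptr_t *high)
{
    uintptr_t lo, hi;

    if (strstr(line, "[stack]") == NULL)
        return 0;
    /* Each maps line starts with "<start>-<end>" in hexadecimal. */
    if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) != 2)
        return 0;
    *low = lo;
    *high = hi;
    return 1;
}

/* Hypothetical helper: buffered scan of /proc/self/maps (Linux only).
 * Returns 1 if the main thread's stack mapping was found, 0 otherwise. */
static int find_main_stack_bounds(uintptr_t *low, uintptr_t *high)
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_stack_line(line, low, high);
    fclose(f);
    return found;
}
```

Because the kernel may grow the main thread's stack on demand, the lower
bound read this way is only a snapshot of the mapping at the time of the
read.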

>
>> From my performance measurements this patch is slower (not
>> significantly, but around 30%).
>> What are your performance measurements like between get_stack_bounds_ex
>> & get_stack_bounds (from this patch) ? Is there any worst-case behaviour
>> that you fear from the original patch ?
> My performance measurements show about a 20% slowdown relative to the
> stdio-based code and about a 4% slowdown relative to your patch.
>
> Time taken MC 3998        tot: 83.8300 s avg: 20967.9840 clk
> Time taken EX(stdio) 3998 tot: 66.9900 s avg: 16755.8779 clk
>
> Time taken MC 3158        tot: 65.9700 s avg: 20889.8037 clk
> Time taken EX(Adam) 3158  tot: 63.9000 s avg: 20234.3255 clk
>
> So, given all of the above, I would prefer to go ahead with the
> mincore-based solution.

Thanks for your numbers, and yes, I think 20% slower is acceptable. The
pause in LibreOffice was not noticeable.
Looks good from my end.
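
For reference, a minimal sketch of the idea behind a mincore-based check:
ask the kernel directly whether a page is mapped instead of parsing
/proc/self/maps. This is an illustration under my own assumptions, not
the code from the webrev; page_is_mapped is a hypothetical name. It
relies on the documented Linux behaviour that mincore(2) fails with
ENOMEM when the range contains unmapped memory.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the page containing addr is mapped,
 * 0 if it is not mapped, and -1 on any other error. */
static int page_is_mapped(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char vec[1];   /* one residency byte per page queried */
    uintptr_t aligned;

    if (page <= 0)
        return -1;
    /* mincore() requires a page-aligned start address. */
    aligned = (uintptr_t)addr & ~((uintptr_t)page - 1);

    if (mincore((void *)aligned, (size_t)page, vec) == 0)
        return 1;           /* mapped (resident or not) */
    return errno == ENOMEM ? 0 : -1;
}
```

Probing page by page downward from a known stack address with a helper
like this costs one system call per page, which may account for part of
the slowdown in the numbers above.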

> -Dmitry

Thanks,
-Adam