RR(S): 8009062 poor performance of JNI AttachCurrentThread after fix for 7017193
Dmitry Samersoff
dmitry.samersoff at oracle.com
Mon May 6 13:44:05 PDT 2013
Adam,
Thank you very much for your efforts and patience.
I did intensive testing, so please read below.
On 2013-04-24 21:07, Adam Domurad wrote:
> On 04/22/2013 05:41 PM, Dmitry Samersoff wrote:
>> Hi Everybody,
>
> Thanks for tackling this.
>
>>
>> Here is webrev of proposed changes:
>>
>> http://cr.openjdk.java.net/~dsamersoff/8009062/webrev.04/
>>
>> Any comments are much appreciated.
>>
>> The problem:
>>
>> Under Linux the stack of the main thread is growable, so we have to make
>> sure that the address where we plan to put guard pages, and the region
>> below it, is not mapped.
>>
>> Historically we found the bounds of the main thread's stack by searching
>> /proc/self/maps for "[stack]" and parsing that line.
>>
>> I don't like buffered reading of /proc files and some time ago rewrote
>> this function to read it byte by byte. Unfortunately, the resulting
>> performance penalty is not acceptable.
>
> I'm afraid I don't follow. What did you not like about the buffered
> reading?
If one thread changes a process mapping while another thread reads
/proc/self/maps, we can end up reading an incorrect value.
(I was able to reproduce this with an artificial testcase using stdio-based
code, and your patch is probably affected as well.)
I'm not sure whether this is possible in practice - the artificial testcase
changes the protection of the top page of the stack - and I understand that
it's not a frequent condition.
But if something like this happens on the customer side, they come back with
an irregular crash that is very hard to debug. So it's better to be safe
than sorry.
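
For readers following along, here is a minimal sketch of the kind of
stdio-based lookup being discussed. It is not the HotSpot code and not Adam's
patch; the function name find_stack_bounds_stdio is hypothetical. It only
illustrates that the bounds come from a buffered read of /proc/self/maps,
which is why a mapping change made by another thread in the middle of the
read can leave us parsing stale or inconsistent text.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): locate the "[stack]" line in
 * /proc/self/maps with buffered stdio and return the mapping's bounds.
 * Returns 0 on success, -1 if the file cannot be read or no such line
 * is found. */
static int find_stack_bounds_stdio(uintptr_t *bottom, uintptr_t *top) {
    assert(bottom != NULL && top != NULL);

    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL) {
        return -1;
    }

    char line[512];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, "[stack]") == NULL) {
            continue;
        }
        /* Each line starts with "<start>-<end>" as hexadecimal addresses. */
        uintptr_t lo, hi;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2) {
            *bottom = lo;
            *top = hi;
            rc = 0;
        }
        break;
    }
    fclose(f);
    return rc;
}
```

Nothing in this sketch guards against the mapping changing between the
kernel producing the text and the parser consuming it, which is the concern
described above.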
> From my performance measurements this patch is slower (not
> significantly, but around 30%).
> What are your performance measurements like between get_stack_bounds_ex
> & get_stack_bounds (from this patch)? Is there any worst-case behaviour
> that you fear from the original patch?
My performance measurements show about a 20% slowdown compared to the
stdio-based code and about a 4% slowdown compared to your patch.
Time taken MC        3998 tot: 83.8300 s avg: 20967.9840 clk
Time taken EX(stdio) 3998 tot: 66.9900 s avg: 16755.8779 clk
Time taken MC        3158 tot: 65.9700 s avg: 20889.8037 clk
Time taken EX(Adam)  3158 tot: 63.9000 s avg: 20234.3255 clk
So, given all of the above, I would prefer to go ahead with the
mincore-based solution.
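
As a rough illustration of the idea (not the code in the webrev), a check
for whether a page is mapped can be sketched with mincore(2) as follows. The
helper name page_is_mapped is hypothetical, and treating errors other than
ENOMEM as "mapped" is an assumption made here to stay on the conservative
side.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): report whether the page that
 * contains addr is part of any mapping in this process. */
static bool page_is_mapped(const void *addr) {
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    /* mincore() requires a page-aligned start address. */
    uintptr_t page = (uintptr_t)addr & ~((uintptr_t)page_size - 1);

    /* One status byte per page; the residency bit itself is ignored here. */
    unsigned char vec[1];
    if (mincore((void *)page, (size_t)page_size, vec) == 0) {
        return true;            /* mapped, whether resident or not */
    }
    if (errno == ENOMEM) {
        return false;           /* range contains unmapped memory */
    }
    /* Assumption: on any other error, conservatively report "mapped". */
    return true;
}
```

The answer comes from a single system call on the address of interest, so
no text from /proc has to be read or parsed.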
-Dmitry
>> Solution:
>>
>> Below is a slightly different approach - I use mincore(2) to check whether
>> the page is mapped or not. Typically mincore(2) is used to check whether
>> a page resides in physical memory or not, but this function returns
>> -1 and sets errno to ENOMEM if the region we are asking about contains
>> unmapped memory.
>>
>> Testing:
>>
>> Passed jprt and a couple of jtreg tests. No special regression test is
>> necessary, as the test for 6929067 covers this case as well.
>>
>> -Dmitry
>>
>>
>
> Overall approach looks sound; I have not done thorough testing but I
> did check the assert and everything is fine here.
> LibreOffice performance with the E-Porto plugin (against which a bug was
> filed) looks to be acceptable now (there is little doubt about the change -
> the pause time was *very* noticeable before).
>
> Thanks again,
> -Adam
--
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* Give Rabbit time, and he'll always get the answer