SymbolLookup.libraryLookup() is quite slow on MacOS with M-series chips

刘希晨 benrush0705 at gmail.com
Sat May 27 17:34:44 UTC 2023


Today I did some more tests about the scenario, and what surprised me is
that this problem has a wide influence.

I am not an expert in C or C++, and I am not familiar with the underlying
JVM source codes, I could only judge from the results and let you guys find
out what's really wrong.

This copy then loadlibrary mechanism has been used by a lot of famous
open-source projects, such as Netty or RocksDB, in fact I learned from them
when I was developing my own framework.
So I tested a really really simple example to prove my idea.

package com.example.demo;

import io.netty.channel.kqueue.KQueue;

public class NativeTest {
    public static void main(String[] args) {
        long l = System.currentTimeMillis();
        KQueue.isAvailable();
        System.out.println(System.currentTimeMillis() - l);
    }
}


The Kqueue static block will load the underlying Kqueue dylib library into
the process, and this single line cost 900ms on my M1-pro Macbook, while
starting a tcp server cost about 1000ms, which is to say that most of the
time was consumed by this mechanism.

Just try the code above, I don't know why this is happening, and hope you
experts could prove me wrong. cheers!

Sebastian Stenzel <sebastian.stenzel at gmail.com> 于2023年5月27日周六 13:54写道:

> Have you tried copying and dlopen'ing the .dylib from a C program?
> If you think that copying it is the culprit, it may be some other OS
> security feature, since it is the first time for the OS to open the binary
> from a new location…?
>
> Am 27.05.2023 um 04:18 schrieb 刘希晨 <benrush0705 at gmail.com>:
>
> 
> The temp folder is not an influence factor, it will happen in any folder
> that calling System.load() or SymbolLookup.librarylookup() on a new
> copied dynamic library, as long as the target dynamic library was just
> created in filesystem by copy() or other mechanism. Calling for the first
> time is even worse, might take a second to do the job, since then it
> becomes 300ms per call. I also tested changing the file size of the .dylib,
> which seems to have no effect on the time consumption.
>
> 刘希晨 <benrush0705 at gmail.com> 于2023年5月27日周六 08:16写道:
>
>> Sorry guys, fall asleep last night, cause it's late in my country.
>>
>> I have run some more tests, and now I am 100% percent sure that the
>> problem is not about System.load() or SymbolLookup.librarylookup() and I am
>> pretty sure it should be reproducible.
>>
>> here is the minimal version of the reproducible code:
>>
>> ```
>>
>> private static final String s = "/Users/liuxichen/workspace/demo/src/main/resources/lib/libcrypto.3.dylib";
>> public static void main(String[] args) throws IOException {
>>     long start = System.currentTimeMillis();
>>     FileInputStream fileInputStream = new FileInputStream(s);
>>     File f = File.createTempFile("test", ".dylib");
>>     Path p = f.toPath();
>>     Files.copy(fileInputStream, p, StandardCopyOption.REPLACE_EXISTING);
>>     f.deleteOnExit();
>>     System.out.println("File operation cost : " + (System.currentTimeMillis() - start));
>>     SymbolLookup symbolLookup = SymbolLookup.libraryLookup(p, SegmentScope.global());
>>     System.out.println("all cost : " + (System.currentTimeMillis() - start));
>>     fileInputStream.close();
>> }
>>
>> ```
>>
>> here is my test results :
>> 1. Just running System.load() or SymbolLookup.librarylookup() will have
>> no problem and that should cost just about 3ms, no matter it's in the
>> normal folder or in the tmp folder.
>> 2. If using mechanism in the above codes, which is directly call
>> SymbolLookup.librarylookup() right after copying file to tmp folder, the
>> operation is quite slow, it takes about 300ms each time on my M1-pro chips
>> 3. The performance drop using above code is not that obvious on Windows
>> or Linux, cost about 20ms
>>
>> I think my OpenSSL library is right in arm architecture, since I install
>> it using homebrew:
>> ❯ openssl version -a
>> OpenSSL 3.1.0 14 Mar 2023 (Library: OpenSSL 3.1.0 14 Mar 2023)
>> built on: Tue Mar 14 12:59:07 2023 UTC
>> platform: darwin64-arm64-cc
>> options:  bn(64,64)
>> compiler: clang -fPIC -arch arm64 -O3 -Wall -DL_ENDIAN -DOPENSSL_PIC
>> -D_REENTRANT -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
>> OPENSSLDIR: "/opt/homebrew/etc/openssl at 3"
>> ENGINESDIR: "/opt/homebrew/Cellar/openssl at 3/3.1.0/lib/engines-3"
>> MODULESDIR: "/opt/homebrew/Cellar/openssl at 3/3.1.0/lib/ossl-modules"
>> Seeding source: os-specific
>> CPUINFO: OPENSSL_armcap=0x187d
>>
>>
>>
>> Sebastian Stenzel <sebastian.stenzel at gmail.com> 于2023年5月27日周六 02:22写道:
>>
>>> Is it reproducible? Maybe your openssl .dylib is x86_64 and Rosetta
>>> kicked in on first use?
>>>
>>> > Am 26.05.2023 um 18:21 schrieb Maurizio Cimadamore <
>>> maurizio.cimadamore at oracle.com>:
>>> >
>>> > 
>>> >> On 26/05/2023 16:51, 刘希晨 wrote:
>>> >> I noticed a weird problem about SymbolLookup.libraryLookup() with
>>> SegmentScope.global when I was testing on my M1-pro macbook using JDK20,
>>> the method took about 700ms to finish when loading OpenSSL's dynamic
>>> library, this result is quite slow. I tested on Windows and Linux(using WSL
>>> on Windows) with older hardware (AMD Ryzen 3700x) and it takes around
>>> 100ms. Please run some tests about this situation and correct me if I am
>>> wrong. Thanks for your help!
>>> > Hi,
>>> >
>>> > this seems very odd, we will look into it.
>>> >
>>> > Note that SymbolLookup::libraryLookup just ends up straight into a
>>> dlopen call. There are not many strings attached, really. 700ms (but even
>>> 100ms) seems quite a lot to me.
>>> >
>>> > Can you also try with System::loadLibrary - is it faster that way (I
>>> very much doubt it, since the underlying code is the same?)
>>> >
>>> > Cheers
>>> > Maurizio
>>> >
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20230528/8ccba86b/attachment-0001.htm>


More information about the panama-dev mailing list