SymbolLookup.libraryLookup() is quite slow on MacOS with M-series chips

Chris Vest mr.chrisvest at gmail.com
Tue Jun 6 22:06:51 UTC 2023


Note that Netty (like many others that use JNI) only does the file copying
dance because it includes its native libraries inside its JAR files, and
the native libraries have to be copied out of the JAR files for dlopen to
read them.
If linking to any locally installed OpenSSL is good enough for your use
case, then you can just link it directly, without copying.

On Sat, May 27, 2023 at 10:35 AM 刘希晨 <benrush0705 at gmail.com> wrote:

> Today I did some more tests about the scenario, and what surprised me is
> that this problem has a wide influence.
>
> I am not an expert in C or C++, and I am not familiar with the underlying
> JVM source codes, I could only judge from the results and let you guys find
> out what's really wrong.
>
> This copy then loadlibrary mechanism has been used by a lot of famous
> open-source projects, such as Netty or RocksDB, in fact I learned from them
> when I was developing my own framework.
> So I tested a really really simple example to prove my idea.
>
> package com.example.demo;
>
> import io.netty.channel.kqueue.KQueue;
>
> public class NativeTest {
>     public static void main(String[] args) {
>         long l = System.currentTimeMillis();
>         KQueue.isAvailable();
>         System.out.println(System.currentTimeMillis() - l);
>     }
> }
>
>
> The Kqueue static block will load the underlying Kqueue dylib library into
> the process, and this single line cost 900ms on my M1-pro Macbook, while
> starting a tcp server cost about 1000ms, which is to say that most of the
> time was consumed by this mechanism.
>
> Just try the code above, I don't know why this is happening, and hope you
> experts could prove me wrong. cheers!
>
> Sebastian Stenzel <sebastian.stenzel at gmail.com> 于2023年5月27日周六 13:54写道:
>
>> Have you tried copying and dlopen'ing the .dylib from a C program?
>> If you think that copying it is the culprit, it may be some other OS
>> security feature, since it is the first time for the OS to open the binary
>> from a new location…?
>>
>> Am 27.05.2023 um 04:18 schrieb 刘希晨 <benrush0705 at gmail.com>:
>>
>> 
>> The temp folder is not an influence factor, it will happen in any folder
>> that calling System.load() or SymbolLookup.librarylookup() on a new
>> copied dynamic library, as long as the target dynamic library was just
>> created in filesystem by copy() or other mechanism. Calling for the first
>> time is even worse, might take a second to do the job, since then it
>> becomes 300ms per call. I also tested changing the file size of the .dylib,
>> which seems to have no effect on the time consumption.
>>
>> 刘希晨 <benrush0705 at gmail.com> 于2023年5月27日周六 08:16写道:
>>
>>> Sorry guys, fall asleep last night, cause it's late in my country.
>>>
>>> I have run some more tests, and now I am 100% percent sure that the
>>> problem is not about System.load() or SymbolLookup.librarylookup() and I am
>>> pretty sure it should be reproducible.
>>>
>>> here is the minimal version of the reproducible code:
>>>
>>> ```
>>>
>>> private static final String s = "/Users/liuxichen/workspace/demo/src/main/resources/lib/libcrypto.3.dylib";
>>> public static void main(String[] args) throws IOException {
>>>     long start = System.currentTimeMillis();
>>>     FileInputStream fileInputStream = new FileInputStream(s);
>>>     File f = File.createTempFile("test", ".dylib");
>>>     Path p = f.toPath();
>>>     Files.copy(fileInputStream, p, StandardCopyOption.REPLACE_EXISTING);
>>>     f.deleteOnExit();
>>>     System.out.println("File operation cost : " + (System.currentTimeMillis() - start));
>>>     SymbolLookup symbolLookup = SymbolLookup.libraryLookup(p, SegmentScope.global());
>>>     System.out.println("all cost : " + (System.currentTimeMillis() - start));
>>>     fileInputStream.close();
>>> }
>>>
>>> ```
>>>
>>> here is my test results :
>>> 1. Just running System.load() or SymbolLookup.librarylookup() will have
>>> no problem and that should cost just about 3ms, no matter it's in the
>>> normal folder or in the tmp folder.
>>> 2. If using mechanism in the above codes, which is directly call
>>> SymbolLookup.librarylookup() right after copying file to tmp folder, the
>>> operation is quite slow, it takes about 300ms each time on my M1-pro chips
>>> 3. The performance drop using above code is not that obvious on Windows
>>> or Linux, cost about 20ms
>>>
>>> I think my OpenSSL library is right in arm architecture, since I install
>>> it using homebrew:
>>> ❯ openssl version -a
>>> OpenSSL 3.1.0 14 Mar 2023 (Library: OpenSSL 3.1.0 14 Mar 2023)
>>> built on: Tue Mar 14 12:59:07 2023 UTC
>>> platform: darwin64-arm64-cc
>>> options:  bn(64,64)
>>> compiler: clang -fPIC -arch arm64 -O3 -Wall -DL_ENDIAN -DOPENSSL_PIC
>>> -D_REENTRANT -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
>>> OPENSSLDIR: "/opt/homebrew/etc/openssl at 3"
>>> ENGINESDIR: "/opt/homebrew/Cellar/openssl at 3/3.1.0/lib/engines-3"
>>> MODULESDIR: "/opt/homebrew/Cellar/openssl at 3/3.1.0/lib/ossl-modules"
>>> Seeding source: os-specific
>>> CPUINFO: OPENSSL_armcap=0x187d
>>>
>>>
>>>
>>> Sebastian Stenzel <sebastian.stenzel at gmail.com> 于2023年5月27日周六 02:22写道:
>>>
>>>> Is it reproducible? Maybe your openssl .dylib is x86_64 and Rosetta
>>>> kicked in on first use?
>>>>
>>>> > Am 26.05.2023 um 18:21 schrieb Maurizio Cimadamore <
>>>> maurizio.cimadamore at oracle.com>:
>>>> >
>>>> > 
>>>> >> On 26/05/2023 16:51, 刘希晨 wrote:
>>>> >> I noticed a weird problem about SymbolLookup.libraryLookup() with
>>>> SegmentScope.global when I was testing on my M1-pro macbook using JDK20,
>>>> the method took about 700ms to finish when loading OpenSSL's dynamic
>>>> library, this result is quite slow. I tested on Windows and Linux(using WSL
>>>> on Windows) with older hardware (AMD Ryzen 3700x) and it takes around
>>>> 100ms. Please run some tests about this situation and correct me if I am
>>>> wrong. Thanks for your help!
>>>> > Hi,
>>>> >
>>>> > this seems very odd, we will look into it.
>>>> >
>>>> > Note that SymbolLookup::libraryLookup just ends up straight into a
>>>> dlopen call. There are not many strings attached, really. 700ms (but even
>>>> 100ms) seems quite a lot to me.
>>>> >
>>>> > Can you also try with System::loadLibrary - is it faster that way (I
>>>> very much doubt it, since the underlying code is the same?)
>>>> >
>>>> > Cheers
>>>> > Maurizio
>>>> >
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20230606/a2ba97bb/attachment-0001.htm>


More information about the panama-dev mailing list