SymbolLookup.libraryLookup() is quite slow on MacOS with M-series chips

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Sat May 27 23:04:18 UTC 2023


Hi,
I'm not 100% this discussion is a good fit for this mailing list. The 
issue you are reporting seems to have nothing to do with the FFM API, or 
SymbolLookup.

Honestly, from the kind of numbers you get, I tend to agree with 
Sebastian when he said this:

> If you think that copying it is the culprit, it may be some other OS 
> security feature, since it is the first time for the OS to open the 
> binary from a new location…?

E.g there seems to be something kicking in when that temporary folder is 
touched, anti-virus, or something else (this is a shared lib you are 
moving around, so it wouldn't be surprising if the OS looked at that 
operation suspiciously).

For instance:

https://support.apple.com/en-ie/HT202491

Have you tried setting your OS to trust applications from any source? 
(temporarily of course). Does that make any difference?

I think it would be good to capture the situation you are facing in a 
bug report - but before we do that, I guess it would be great if you 
could setup a repo somewhere with some reproducer that other people can 
just run (e.g. something with maven or gradle?). So that we can have a 
better ideas of what might be the culprit and/or which OS might be affected.

Note that having a dependency on Netty might make things more confusing 
- when you access a static method on KQueue, you are going to run static 
initializers (I don't know off-hand how many), but it is not unusual for 
big frameworks to have a biggie startup time - since in Java classes are 
loaded and initialized on demand, you only see that cost when you call 
the KQueue method - but you are not just measuring how long does it take 
for that method to complete, you are also measuring the time the JVM 
takes to load _all_ the classes that might be required (and they might 
be more then you think) as well as running static initializers on some 
of them (which might pull in more classes, etc.).

Try to run with the JVM `-verbose` option, which prints which classes 
are loaded and when, so that you can get an idea of how much stuff 
actually happens when you call that KQueue method.

Cheers
Maurizio

On 27/05/2023 18:34, 刘希晨 wrote:
> Today I did some more tests about the scenario, and what surprised me 
> is that this problem has a wide influence.
>
> I am not an expert in C or C++, and I am not familiar with the 
> underlying JVM source codes, I could only judge from the results and 
> let you guys find out what's really wrong.
>
> This copy then loadlibrary mechanism has been used by a lot of famous 
> open-source projects, such as Netty or RocksDB, in fact I learned from 
> them when I was developing my own framework.
> So I tested a really really simple example to prove my idea.
>
> package com.example.demo;
>
> import io.netty.channel.kqueue.KQueue;
>
> public class NativeTest {
>      public static void main(String[] args) {
>          long l = System.currentTimeMillis();
>          KQueue.isAvailable();
>          System.out.println(System.currentTimeMillis() - l);
>      }
> }
>
> The Kqueue static block will load the underlying Kqueue dylib library 
> into the process, and this single line cost 900ms on my M1-pro 
> Macbook, while starting a tcp server cost about 1000ms, which is to 
> say that most of the time was consumed by this mechanism.
>
> Just try the code above, I don't know why this is happening, and hope 
> you experts could prove me wrong. cheers!
>
> Sebastian Stenzel <sebastian.stenzel at gmail.com> 于2023年5月27日周六 
> 13:54写道:
>
>     Have you tried copying and dlopen'ing the .dylib from a C program?
>     If you think that copying it is the culprit, it may be some other
>     OS security feature, since it is the first time for the OS to open
>     the binary from a new location…?
>
>>     Am 27.05.2023 um 04:18 schrieb 刘希晨 <benrush0705 at gmail.com>:
>>
>>     
>>     The temp folder is not an influence factor, it will happen in any
>>     folder that calling System.load() or SymbolLookup.librarylookup()
>>     on a new copied dynamic library, as long as the target dynamic
>>     library was just created in filesystem by copy() or other
>>     mechanism. Calling for the first time is even worse, might take a
>>     second to do the job, since then it becomes 300ms per call. I
>>     also tested changing the file size of the .dylib, which seems to
>>     have no effect on the time consumption.
>>
>>     刘希晨 <benrush0705 at gmail.com> 于2023年5月27日周六 08:16写道:
>>
>>         Sorry guys, fall asleep last night, cause it's late in my
>>         country.
>>
>>         I have run some more tests, and now I am 100% percent sure
>>         that the problem is not about System.load() or
>>         SymbolLookup.librarylookup() and I am pretty sure it should
>>         be reproducible.
>>
>>         here is the minimal version of the reproducible code:
>>
>>         ```
>>
>>         private static final Strings ="/Users/liuxichen/workspace/demo/src/main/resources/lib/libcrypto.3.dylib";
>>         public static void main(String[] args)throws IOException {
>>              long start = System.currentTimeMillis();
>>              FileInputStream fileInputStream =new FileInputStream(s);
>>              File f = File.createTempFile("test",".dylib");
>>              Path p = f.toPath();
>>              Files.copy(fileInputStream, p, StandardCopyOption.REPLACE_EXISTING);
>>              f.deleteOnExit();
>>              System.out.println("File operation cost : " + (System.currentTimeMillis() - start));
>>              SymbolLookup symbolLookup = SymbolLookup.libraryLookup(p, SegmentScope.global());
>>              System.out.println("all cost : " + (System.currentTimeMillis() - start));
>>              fileInputStream.close();
>>         }
>>
>>         ```
>>
>>         here is my test results :
>>         1. Just running System.load() or SymbolLookup.librarylookup()
>>         will have no problem and that should cost just about 3ms, no
>>         matter it's in the normal folder or in the tmp folder.
>>         2. If using mechanism in the above codes, which is directly
>>         call SymbolLookup.librarylookup() right after copying file to
>>         tmp folder, the operation is quite slow, it takes about 300ms
>>         each time on my M1-pro chips
>>         3. The performance drop using above code is not that obvious
>>         on Windows or Linux, cost about 20ms
>>
>>         I think my OpenSSL library is right in arm architecture,
>>         since I install it using homebrew:
>>         ❯ openssl version -a
>>         OpenSSL 3.1.0 14 Mar 2023 (Library: OpenSSL 3.1.0 14 Mar 2023)
>>         built on: Tue Mar 14 12:59:07 2023 UTC
>>         platform: darwin64-arm64-cc
>>         options:  bn(64,64)
>>         compiler: clang -fPIC -arch arm64 -O3 -Wall -DL_ENDIAN
>>         -DOPENSSL_PIC -D_REENTRANT -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
>>         OPENSSLDIR: "/opt/homebrew/etc/openssl at 3"
>>         ENGINESDIR: "/opt/homebrew/Cellar/openssl at 3/3.1.0/lib/engines-3"
>>         MODULESDIR:
>>         "/opt/homebrew/Cellar/openssl at 3/3.1.0/lib/ossl-modules"
>>         Seeding source: os-specific
>>         CPUINFO: OPENSSL_armcap=0x187d
>>
>>
>>
>>         Sebastian Stenzel <sebastian.stenzel at gmail.com>
>>         于2023年5月27日周六 02:22写道:
>>
>>             Is it reproducible? Maybe your openssl .dylib is x86_64
>>             and Rosetta kicked in on first use?
>>
>>             > Am 26.05.2023 um 18:21 schrieb Maurizio Cimadamore
>>             <maurizio.cimadamore at oracle.com>:
>>             >
>>             > 
>>             >> On 26/05/2023 16:51, 刘希晨 wrote:
>>             >> I noticed a weird problem about
>>             SymbolLookup.libraryLookup() with SegmentScope.global
>>             when I was testing on my M1-pro macbook using JDK20, the
>>             method took about 700ms to finish when loading OpenSSL's
>>             dynamic library, this result is quite slow. I tested on
>>             Windows and Linux(using WSL on Windows) with older
>>             hardware (AMD Ryzen 3700x) and it takes around 100ms.
>>             Please run some tests about this situation and correct me
>>             if I am wrong. Thanks for your help!
>>             > Hi,
>>             >
>>             > this seems very odd, we will look into it.
>>             >
>>             > Note that SymbolLookup::libraryLookup just ends up
>>             straight into a dlopen call. There are not many strings
>>             attached, really. 700ms (but even 100ms) seems quite a
>>             lot to me.
>>             >
>>             > Can you also try with System::loadLibrary - is it
>>             faster that way (I very much doubt it, since the
>>             underlying code is the same?)
>>             >
>>             > Cheers
>>             > Maurizio
>>             >
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/panama-dev/attachments/20230528/15332cfe/attachment-0001.htm>


More information about the panama-dev mailing list