Real-Life Benchmark for FUSE's readdir()
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Thu Jul 15 11:08:57 UTC 2021
Aha - it seems like you are seeing what I was seeing: unrolling now
seems to happen more reliably, which positively affects code like strlen.
As for FUSE, I think the reason for the difference has probably nothing
to do with string conversion - the sampler profiler just happens to hit
that code a lot. I checked JNR code for string conversion and I couldn't
really find anything uber optimized in that regard that could explain
the gap.
Probably something is not getting optimized as it should - likely a
downcall/upcall intrinsification is failing - maybe due to a subtle
issue with your code, or, possibly because you are hitting a
non-implemented case (e.g. we do not intrinsify calls which pass
arguments on the stack, yet), or because of some other bug.
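
To make the "constant target address" idea from further down this thread a bit more concrete, here is a minimal sketch using only java.lang.invoke. Everything in it is hypothetical: `call` is a plain Java stand-in for a downcall handle whose first parameter is the target address, and `BoundHandleCache`, `forAddress` and `invoke` are names made up for illustration. It only shows the binding and caching mechanics; whether the JIT ends up treating the bound address as a constant depends on how the handle is held and invoked.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BoundHandleCache {
    // Hypothetical stand-in for a native downcall: first parameter is the target address.
    static int call(long address, int arg) {
        return (int) (address % 1000) + arg;
    }

    // Generic handle of type (long,int)int; the address is still a regular argument here.
    static final MethodHandle GENERIC;
    static {
        try {
            GENERIC = MethodHandles.lookup().findStatic(BoundHandleCache.class, "call",
                    MethodType.methodType(int.class, long.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // One bound handle per address, so repeated lookups reuse the same instance.
    static final Map<Long, MethodHandle> CACHE = new ConcurrentHashMap<>();

    static MethodHandle forAddress(long address) {
        // insertArguments fixes parameter 0; the resulting handle has type (int)int.
        return CACHE.computeIfAbsent(address,
                a -> MethodHandles.insertArguments(GENERIC, 0, a));
    }

    static int invoke(long address, int arg) {
        try {
            return (int) forAddress(address).invokeExact(arg);
        } catch (Throwable t) {
            throw new AssertionError("should not reach here", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(4096L, 5));
    }
}
```

The point of the cache is that the same address always yields the same MethodHandle instance, which gives the runtime a chance to specialize that instance over time.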
Maurizio
On 15/07/2021 12:03, Sebastian Stenzel wrote:
> Wow, I stand corrected. I just re-ran the benchmark and
> `benchmarkStrlenBase` just got a lot faster!! Your change in
> https://github.com/openjdk/jdk17/commit/2db9005c07585b580b3ec0889b8b5e3ed0d0ca6a DID
> have an effect after all.
>
> Just doesn't impress FUSE very much...
>
>> On 15. Jul 2021, at 13:00, Sebastian Stenzel
>> <sebastian.stenzel at gmail.com> wrote:
>>
>> Yup, I tried the int-approach as well, but with worse results... Here
>> is the full test:
>> https://gist.github.com/overheadhunter/86e7baae7dfe47c49ff364590a4f3ea6
>>
>>> On 15. Jul 2021, at 12:51, Maurizio Cimadamore
>>> <maurizio.cimadamore at oracle.com> wrote:
>>>
>>> Ok. Thanks.
>>>
>>> I tried similar experiments where instead of reading 4 bytes
>>> separately I'd read a single int value, and then use shifts and
>>> bitmasking to check for terminators. On paper good, but benchmark
>>> results were always worse than the version we have now (at least on
>>> Linux).
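
For readers who want to see what the "read one int, then bitmask" variant can look like, here is a rough sketch. It is not the code that was benchmarked in this thread: it works on a java.nio.ByteBuffer instead of a MemorySegment, and `WordStrlen`, `hasZeroByte` and `cString` are hypothetical names. The zero-byte test is the well-known bit trick; it only says that some byte in the word is zero, so a short byte-wise scan locates the exact position afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class WordStrlen {
    // True if at least one of the four bytes packed in v is zero.
    static boolean hasZeroByte(int v) {
        return ((v - 0x01010101) & ~v & 0x80808080) != 0;
    }

    // Length of the NUL-terminated string starting at index 0 of buf.
    static int strlen(ByteBuffer buf) {
        int limit = buf.limit();
        int offset = 0;
        // Skip four bytes at a time while the current word contains no terminator.
        while (offset + 4 <= limit && !hasZeroByte(buf.getInt(offset))) {
            offset += 4;
        }
        // Byte-wise scan: finds the terminator inside the word, or handles the tail.
        for (; offset < limit; offset++) {
            if (buf.get(offset) == 0) {
                return offset;
            }
        }
        throw new IllegalArgumentException("String too large");
    }

    // Test helper: copies s into a zero-filled buffer of the given capacity.
    static ByteBuffer cString(String s, int capacity) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        byte[] backing = new byte[capacity];
        System.arraycopy(bytes, 0, backing, 0, bytes.length);
        return ByteBuffer.wrap(backing);
    }

    public static void main(String[] args) {
        System.out.println(strlen(cString("hello", 16)));
    }
}
```

Byte order does not matter for the zero test itself, since it only asks whether any byte is zero.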
>>>
>>> That said, if you could please share the full string benchmark you
>>> have, that'd be helpful, so we can take a look at that, and see
>>> what's going wrong (ideally, C2 should be the one doing unrolling).
>>>
>>> Maurizio
>>>
>>> On 15/07/2021 11:28, Sebastian Stenzel wrote:
>>>> I just did a quick synthetic test on a "manually unrolled" strlen()
>>>> without any FUSE context.
>>>>
>>>> I experimented with an implementation that looked like the
>>>> following and benchmarked it using a 259 byte memory segment
>>>> containing a 239 byte string (null byte at index 240):
>>>>
>>>> ```
>>>> private static int strlenUnroll4(MemorySegment segment, long start) {
>>>>     int offset;
>>>>     for (offset = 0; offset < segment.byteSize() - 3; offset += 4) {
>>>>         byte b0 = MemoryAccess.getByteAtOffset(segment, start + offset + 0);
>>>>         byte b1 = MemoryAccess.getByteAtOffset(segment, start + offset + 1);
>>>>         byte b2 = MemoryAccess.getByteAtOffset(segment, start + offset + 2);
>>>>         byte b3 = MemoryAccess.getByteAtOffset(segment, start + offset + 3);
>>>>         // is this even faster than directly having 4 different branches?
>>>>         if (b0 == 0 || b1 == 0 || b2 == 0 || b3 == 0) {
>>>>             if (b0 == 0) {
>>>>                 return offset;
>>>>             } else if (b1 == 0) {
>>>>                 return offset + 1;
>>>>             } else if (b2 == 0) {
>>>>                 return offset + 2;
>>>>             } else if (b3 == 0) {
>>>>                 return offset + 3;
>>>>             }
>>>>         }
>>>>     }
>>>>     // TODO: maybe no loop required for the remaining <4 bytes?
>>>>     while (offset < segment.byteSize()) {
>>>>         byte b = MemoryAccess.getByteAtOffset(segment, start + offset);
>>>>         if (b == 0) {
>>>>             return offset;
>>>>         }
>>>>         offset++; // advance, otherwise the tail loop never terminates
>>>>     }
>>>>     throw new IllegalArgumentException("String too large");
>>>> }
>>>> ```
>>>>
>>>> I'm not even sure how reliable my results are, since I have no clue
>>>> about how branch prediction works here... Neither have I tested the
>>>> correctness of this implementation.
>>>>
>>>>
>>>>> On 15. Jul 2021, at 12:18, Maurizio Cimadamore
>>>>> <maurizio.cimadamore at oracle.com> wrote:
>>>>>
>>>>> Thanks for reporting back.
>>>>>
>>>>> We probably need to investigate this a bit more deeply and try and
>>>>> reproduce on our side.
>>>>>
>>>>> One last question: you said that with manual unrolling you managed
>>>>> to get 2x faster: did you mean that string conversion got 2x
>>>>> faster or that you actually saw your FUSE benchmark going 2x
>>>>> faster because of the manual unrolling with strings?
>>>>>
>>>>> Maurizio
>>>>>
>>>>> On 15/07/2021 11:03, Sebastian Stenzel wrote:
>>>>>> That, surprisingly, didn't change anything either. But don't
>>>>>> worry too much, the performance isn't bad (in absolute figures)
>>>>>> and it is by far not the only reason why I consider panama the
>>>>>> best solution to create java bindings for c libs.
>>>>>>
>>>>>>> On 12. Jul 2021, at 15:33, Maurizio Cimadamore
>>>>>>> <maurizio.cimadamore at oracle.com> wrote:
>>>>>>>
>>>>>>> Actually, after some bisecting, I found out that the performance
>>>>>>> of converting a memory segment into a string jumped 2x faster
>>>>>>> with this fix:
>>>>>>>
>>>>>>> https://github.com/openjdk/jdk17/commit/2db9005c07585b580b3ec0889b8b5e3ed0d0ca6a
>>>>>>>
>>>>>>> Which was integrated after the one I originally pointed at. They
>>>>>>> both seem to touch loop optimization in case of overflows, which
>>>>>>> the strlen code is triggering (since the loop limit checks for
>>>>>>> loop variable being positive).
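
To illustrate the loop shape being described (the only loop limit is "the int counter has not overflowed"), here is a small sketch. It is an assumption about the general shape, not a copy of the JDK source, and it uses a ByteBuffer in place of a MemorySegment; `OverflowBoundedStrlen` is a hypothetical name. The buffer's own bounds check plays the role of the segment bounds check.

```java
import java.nio.ByteBuffer;

final class OverflowBoundedStrlen {
    static int strlen(ByteBuffer buf, int start) {
        // No explicit upper bound: the loop runs until a zero byte is found,
        // the buffer's bounds check throws, or offset overflows to a negative value.
        for (int offset = 0; offset >= 0; offset++) {
            if (buf.get(start + offset) == 0) {
                return offset;
            }
        }
        throw new IllegalArgumentException("String too large");
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {'a', 'b', 0, 'c'});
        System.out.println(strlen(buf, 0));
    }
}
```

A loop like this has two exits (the early return and the overflow check), which is the kind of shape the loop optimizations mentioned here have to cope with.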
>>>>>>>
>>>>>>> This is a simple patch which adds a string conversion test:
>>>>>>>
>>>>>>> ```
>>>>>>> diff --git
>>>>>>> a/test/micro/org/openjdk/bench/jdk/incubator/foreign/StrLenTest.java
>>>>>>> b/test/micro/org/openjdk/bench/jdk/incubator/foreign/StrLenTest.java
>>>>>>> index ec4da5ffc88..5b3fb1a2b2a 100644
>>>>>>> ---
>>>>>>> a/test/micro/org/openjdk/bench/jdk/incubator/foreign/StrLenTest.java
>>>>>>> +++
>>>>>>> b/test/micro/org/openjdk/bench/jdk/incubator/foreign/StrLenTest.java
>>>>>>> @@ -93,10 +93,13 @@ public class StrLenTest {
>>>>>>> FunctionDescriptor.ofVoid(C_POINTER).withAttribute(FunctionDescriptor.TRIVIAL_ATTRIBUTE_NAME,
>>>>>>> true));
>>>>>>> }
>>>>>>>
>>>>>>> + MemorySegment segment;
>>>>>>> +
>>>>>>> @Setup
>>>>>>> public void setup() {
>>>>>>> str = makeString(size);
>>>>>>> segmentAllocator =
>>>>>>> SegmentAllocator.ofSegment(MemorySegment.allocateNative(size +
>>>>>>> 1, ResourceScope.newImplicitScope()));
>>>>>>> + segment = toCString(str, segmentAllocator);
>>>>>>> }
>>>>>>>
>>>>>>> @TearDown
>>>>>>> @@ -104,6 +107,11 @@ public class StrLenTest {
>>>>>>> scope.close();
>>>>>>> }
>>>>>>>
>>>>>>> + @Benchmark
>>>>>>> + public String panama_str_conv() throws Throwable {
>>>>>>> + return CLinker.toJavaString(segment);
>>>>>>> + }
>>>>>>> +
>>>>>>> @Benchmark
>>>>>>> public int jni_strlen() throws Throwable {
>>>>>>> return strlen(str);
>>>>>>> ```
>>>>>>>
>>>>>>> Before the above fix, the numbers are as follows:
>>>>>>>
>>>>>>> ```
>>>>>>> Benchmark (size) Mode Cnt Score Error Units
>>>>>>> StrLenTest.panama_str_conv 100 avgt 30 106.613 ± 7.060 ns/op
>>>>>>> ```
>>>>>>>
>>>>>>> While after the fix I get this:
>>>>>>>
>>>>>>> ```
>>>>>>> Benchmark (size) Mode Cnt Score Error Units
>>>>>>> StrLenTest.panama_str_conv 100 avgt 30 48.120 ± 0.557 ns/op
>>>>>>> ```
>>>>>>>
>>>>>>> So, as you can see, a pretty sizeable jump. Eyeballing, the
>>>>>>> shape of generated code doesn't look too different, which makes
>>>>>>> me think of another case where loop is unrolled, but main loop
>>>>>>> never executed (similar to JDK-8269230), but we'll need to look
>>>>>>> deeper.
>>>>>>>
>>>>>>> Maurizio
>>>>>>>
>>>>>>> On 12/07/2021 14:12, Maurizio Cimadamore wrote:
>>>>>>>> On 12/07/2021 13:18, Sebastian Stenzel wrote:
>>>>>>>>> Hey Maurizio,
>>>>>>>>>
>>>>>>>>> All tests have been done on commit 42e03fd7c6a (for details
>>>>>>>>> how I built the JDK, see my initial email). Maybe I'm missing
>>>>>>>>> some compiler flags to enable all optimizations?
>>>>>>>> I see - you do have the latest panama changes, but there has
>>>>>>>> been a sync with upstream after that changeset, I believe - can
>>>>>>>> you please try to resync with the latest foreign-jextract
>>>>>>>> commit - which should be:
>>>>>>>>
>>>>>>>> https://github.com/openjdk/panama-foreign/commit/b2a284f6678c6a0d78fbdc8655695119ccb0dadb
>>>>>>>>
>>>>>>>>
>>>>>>>>> Are you sure about loop vectorization being applied to strlen?
>>>>>>>>> I'm not an expert on this field, but I had the impression this
>>>>>>>>> wasn't possible when the loop terminates "from within".
>>>>>>>> Vlad is the expert here - when chatting offline he did mention
>>>>>>>> that loop should have single exit - which I guess also takes
>>>>>>>> into account the "normal" exit - so the strlen routine would
>>>>>>>> seem to have two exits...
>>>>>>>>
>>>>>>>> Maurizio
>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>> On 12. Jul 2021, at 13:50, Maurizio Cimadamore
>>>>>>>>>> <maurizio.cimadamore at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sebastian,
>>>>>>>>>> thanks for sharing your findings - I've done some attempts
>>>>>>>>>> here with a targeted microbenchmark which measures the
>>>>>>>>>> performance of string conversion and I'm seeing unrolling and
>>>>>>>>>> vectorization being applied on the strlen computation.
>>>>>>>>>>
>>>>>>>>>> May I ask if, by any chance, your HEAD has not been updated
>>>>>>>>>> in the last few weeks? There has been a C2 optimization fix
>>>>>>>>>> which has been added recently, which I think might be related
>>>>>>>>>> to this:
>>>>>>>>>>
>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8269230
>>>>>>>>>>
>>>>>>>>>> Do you have this fix in the JDK you are using?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Maurizio
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 10/07/2021 15:58, Sebastian Stenzel wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> good idea, but it makes no difference beyond statistical error.
>>>>>>>>>>>
>>>>>>>>>>> I started sampling the application with VisualVM (which is
>>>>>>>>>>> quite hard, since native threads are extremely short-lived).
>>>>>>>>>>> What I noticed is that, regardless of where the sampler
>>>>>>>>>>> interrupts a thread, in nearly all cases 100% of CPU time
>>>>>>>>>>> is spent in
>>>>>>>>>>> jdk.internal.foreign.abi.SharedUtils.toJavaStringInternal()
>>>>>>>>>>> → jdk.internal.foreign.abi.SharedUtils.strlen().
>>>>>>>>>>>
>>>>>>>>>>> I know that strlen can hardly be optimized due to the nature
>>>>>>>>>>> of null termination, but maybe we can make use of the fact
>>>>>>>>>>> that we're dealing with MemorySegments here: Since they
>>>>>>>>>>> protect us from overflows, maybe there is no need to look at
>>>>>>>>>>> only a single byte at a time. Maybe the strlen()-loop can be
>>>>>>>>>>> unrolled or even be vectorized.
>>>>>>>>>>>
>>>>>>>>>>> I just did a quick test and observed a x2 speedup when doing
>>>>>>>>>>> a x4 loop unroll.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Sebastian
>>>>>>>>>>>
>>>>>>>>>>>> On 9. Jul 2021, at 20:30, Jorn Vernee
>>>>>>>>>>>> <jorn.vernee at oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for testing this. Looking at your code, one possible
>>>>>>>>>>>> explanation for the discrepancy I can think of is that the
>>>>>>>>>>>> DirFiller ends up using virtual downcalls to do its work,
>>>>>>>>>>>> which are currently not intrinsified. Being mostly a case
>>>>>>>>>>>> of 'not implemented yet', i.e. it is a known issue.
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> static fuse_fill_dir_t ofAddress(MemoryAddress addr) {
>>>>>>>>>>>> return (jdk.incubator.foreign.MemoryAddress x0,
>>>>>>>>>>>> jdk.incubator.foreign.MemoryAddress x1,
>>>>>>>>>>>> jdk.incubator.foreign.MemoryAddress x2, long x3) -> {
>>>>>>>>>>>> try {
>>>>>>>>>>>> return
>>>>>>>>>>>> (int)constants$0.fuse_fill_dir_t$MH.invokeExact((Addressable)addr,
>>>>>>>>>>>> x0, x1, x2, x3); // <--------- 'addr' here is not a
>>>>>>>>>>>> constant, so the call is virtual
>>>>>>>>>>>> } catch (Throwable ex$) {
>>>>>>>>>>>> throw new AssertionError("should not reach
>>>>>>>>>>>> here", ex$);
>>>>>>>>>>>> }
>>>>>>>>>>>> };
>>>>>>>>>>>> }
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> For testing purposes, a possible workaround could be to
>>>>>>>>>>>> have a cache that maps the callback address to a method
>>>>>>>>>>>> handle that has the address bound to the first parameter.
>>>>>>>>>>>> Assuming readdir always gets the same filler callback
>>>>>>>>>>>> address, the same MethodHandle will be reused and
>>>>>>>>>>>> eventually customized which means the callback address will
>>>>>>>>>>>> become constant, and the downcall should then be intrinsified.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't currently have access to a Mac machine to test
>>>>>>>>>>>> this, but if you want to try it out, the patch should be this:
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> diff --git
>>>>>>>>>>>> a/src/main/java/de/skymatic/fusepanama/lowlevel/fuse_fill_dir_t.java
>>>>>>>>>>>> b/src/main/java/de/skymatic/fusepanama/lowlevel/fuse_fill_dir_t.java
>>>>>>>>>>>> index bfd4655..4c68d4c 100644
>>>>>>>>>>>> ---
>>>>>>>>>>>> a/src/main/java/de/skymatic/fusepanama/lowlevel/fuse_fill_dir_t.java
>>>>>>>>>>>> +++
>>>>>>>>>>>> b/src/main/java/de/skymatic/fusepanama/lowlevel/fuse_fill_dir_t.java
>>>>>>>>>>>> @@ -3,8 +3,12 @@
>>>>>>>>>>>> package de.skymatic.fusepanama.lowlevel;
>>>>>>>>>>>>
>>>>>>>>>>>> import java.lang.invoke.MethodHandle;
>>>>>>>>>>>> +import java.lang.invoke.MethodHandles;
>>>>>>>>>>>> import java.lang.invoke.VarHandle;
>>>>>>>>>>>> import java.nio.ByteOrder;
>>>>>>>>>>>> +import java.util.Map;
>>>>>>>>>>>> +import java.util.concurrent.ConcurrentHashMap;
>>>>>>>>>>>> +
>>>>>>>>>>>> import jdk.incubator.foreign.*;
>>>>>>>>>>>> import static jdk.incubator.foreign.CLinker.*;
>>>>>>>>>>>> public interface fuse_fill_dir_t {
>>>>>>>>>>>> @@ -17,13 +21,19 @@ public interface fuse_fill_dir_t {
>>>>>>>>>>>> return
>>>>>>>>>>>> RuntimeHelper.upcallStub(fuse_fill_dir_t.class, fi,
>>>>>>>>>>>> constants$0.fuse_fill_dir_t$FUNC,
>>>>>>>>>>>> "(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;J)I",
>>>>>>>>>>>> scope);
>>>>>>>>>>>> }
>>>>>>>>>>>> static fuse_fill_dir_t ofAddress(MemoryAddress addr) {
>>>>>>>>>>>> - return (jdk.incubator.foreign.MemoryAddress x0,
>>>>>>>>>>>> jdk.incubator.foreign.MemoryAddress x1,
>>>>>>>>>>>> jdk.incubator.foreign.MemoryAddress x2, long x3) -> {
>>>>>>>>>>>> - try {
>>>>>>>>>>>> - return
>>>>>>>>>>>> (int)constants$0.fuse_fill_dir_t$MH.invokeExact((Addressable)addr,
>>>>>>>>>>>> x0, x1, x2, x3);
>>>>>>>>>>>> - } catch (Throwable ex$) {
>>>>>>>>>>>> - throw new AssertionError("should not reach
>>>>>>>>>>>> here", ex$);
>>>>>>>>>>>> - }
>>>>>>>>>>>> - };
>>>>>>>>>>>> + class CacheHolder {
>>>>>>>>>>>> + static final Map<MemoryAddress,
>>>>>>>>>>>> fuse_fill_dir_t> CACHE = new ConcurrentHashMap<>();
>>>>>>>>>>>> + }
>>>>>>>>>>>> + return CacheHolder.CACHE.computeIfAbsent(addr,
>>>>>>>>>>>> addrK -> {
>>>>>>>>>>>> + final MethodHandle target =
>>>>>>>>>>>> MethodHandles.insertArguments(constants$0.fuse_fill_dir_t$MH,
>>>>>>>>>>>> 0, addrK);
>>>>>>>>>>>> + return (jdk.incubator.foreign.MemoryAddress
>>>>>>>>>>>> x0, jdk.incubator.foreign.MemoryAddress x1,
>>>>>>>>>>>> jdk.incubator.foreign.MemoryAddress x2, long x3) -> {
>>>>>>>>>>>> + try {
>>>>>>>>>>>> + return (int)target.invokeExact(x0, x1,
>>>>>>>>>>>> x2, x3);
>>>>>>>>>>>> + } catch (Throwable ex$) {
>>>>>>>>>>>> + throw new AssertionError("should not
>>>>>>>>>>>> reach here", ex$);
>>>>>>>>>>>> + }
>>>>>>>>>>>> + };
>>>>>>>>>>>> + });
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> (I hope these code blocks don't get mangled too much by
>>>>>>>>>>>> line wrapping)
>>>>>>>>>>>>
>>>>>>>>>>>> HTH,
>>>>>>>>>>>> Jorn
>>>>>>>>>>>>
>>>>>>>>>>>> On 09/07/2021 10:58, Sebastian Stenzel wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wanted to share the results of a benchmark test, that
>>>>>>>>>>>>> includes several down- and upcalls. First, let me explain,
>>>>>>>>>>>>> what I'm testing here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm working on a panama-based FUSE binding, mostly for
>>>>>>>>>>>>> experimental purposes right now, and I'm trying to beat
>>>>>>>>>>>>> fuse-jnr [1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> While there are some other interesting metrics, such as
>>>>>>>>>>>>> read/write performance (both sequentially and random
>>>>>>>>>>>>> access), I focused on directory listings for now.
>>>>>>>>>>>>> Directory listings are the most complex operation in
>>>>>>>>>>>>> regards to the number of down- and upcalls:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. FUSE upcalls readdir and provides a callback function
>>>>>>>>>>>>> 2. java downcalls the callback for each item in the directory
>>>>>>>>>>>>> 3. FUSE upcalls getattr for each item (no longer required
>>>>>>>>>>>>> with "readdirplus" in FUSE 3.x)
>>>>>>>>>>>>> (4. I'm testing on macOS, which introduces additional
>>>>>>>>>>>>> noise (such as readxattr and trying to access files that I
>>>>>>>>>>>>> didn't report in readdir))
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, what I'm testing is essentially this:
>>>>>>>>>>>>> `Files.list(Path.of("/Volumes/foo")).close();` with the
>>>>>>>>>>>>> volume reporting eight files [2]. When mounting with debug
>>>>>>>>>>>>> logs enabled, I can see that the exact same operations in
>>>>>>>>>>>>> the same order are invoked on both fuse-jnr and
>>>>>>>>>>>>> fuse-panama. One single dir listing results in 2 readdir
>>>>>>>>>>>>> upcalls, 10 callback downcalls, 16 getattr upcalls. There
>>>>>>>>>>>>> are also 8 getxattr calls and 16 lookup calls, however
>>>>>>>>>>>>> they don't reach Java, as the FUSE kernel knows they are
>>>>>>>>>>>>> not implemented.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Long story short, here are the results:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Benchmark                        Mode  Cnt    Score   Error  Units
>>>>>>>>>>>>> BenchmarkTest.testListDirJnr     avgt    5   66,569 ± 3,128  us/op
>>>>>>>>>>>>> BenchmarkTest.testListDirPanama  avgt    5  189,340 ± 4,275  us/op
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've been using panama snapshot at commit 42e03fd7c6a
>>>>>>>>>>>>> built with: `configure
>>>>>>>>>>>>> --with-boot-jdk=/Library/Java/JavaVirtualMachines/adoptopenjdk-16.jdk/Contents/Home/
>>>>>>>>>>>>> --with-native-debug-symbols=none
>>>>>>>>>>>>> --with-debug-level=release
>>>>>>>>>>>>> --with-libclang=/usr/local/opt/llvm
>>>>>>>>>>>>> --with-libclang-version=12`
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can't tell where this overhead comes from. Maybe
>>>>>>>>>>>>> creating a newConfinedScope() during each upcall [3] is
>>>>>>>>>>>>> "too much"? Maybe JNR is just negligently skipping some
>>>>>>>>>>>>> memory boundary checks to be faster. The results are not
>>>>>>>>>>>>> terrible, but I'd hoped for something better.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sebastian
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/SerCeMan/jnr-fuse
>>>>>>>>>>>>> [2] https://github.com/skymatic/fuse-panama/blob/develop/src/test/java/de/skymatic/fusepanama/examples/HelloPanamaFileSystem.java#L139-L146
>>>>>>>>>>>>> [3] https://github.com/skymatic/fuse-panama/blob/769347575863861063a2347a42b2cbaadb5eacef/src/main/java/de/skymatic/fusepanama/FuseOperations.java#L67-L71
>>
>