[OpenJDK 2D-Dev] RFR: 8263482: Make access to the ICC color profiles data multithread-friendly
Sergey Bylokhov
serb at openjdk.java.net
Wed Mar 17 23:00:49 UTC 2021
On Wed, 17 Mar 2021 20:41:47 GMT, Alexander Zvegintsev <azvegint at openjdk.org> wrote:
>> FYI: it is probably better/simpler to review this via webrev.
>>
>> After the migration from kcms to lcms, the performance of some operations regressed. One possible workaround is to split the operation across multiple threads. Unfortunately, we have too many bottlenecks that prevent effective use of multithreading. This is a request to remove or minimize such bottlenecks (at least some of them). It does not address the single-threaded performance, which should be handled separately.
>>
>> The main code pattern optimized here is this:
>> activate();
>> byte[] theHeader = getData(cmmProfile, icSigHead);
>> ----> CMSManager.getModule().getTagData(p, tagSignature);
>> Notes about the code above:
>>
>> 1. Before the change, the "activate()" method checked that the "cmmProfile" field was not null, and afterwards we usually passed "cmmProfile" as a parameter to some other method. This involved two volatile reads, and we also had to track where "activate()" needed to be called before using the "cmmProfile" field.
>> Solution: "activate()" is renamed to "cmmProfile()", which becomes an accessor for the field, so we get one volatile read and can easily monitor the usage of the field itself (it is used directly only in this method).
>>
>> 2. The synchronized static method "CMSManager.getModule()" is reimplemented to have only one volatile read.
>>
>> 3. The locking in "getTagData()" is changed. Instead of synchronized instance methods, we now use a mix of "ConcurrentHashMap" and "StampedLock".
>>
>> See some comments inline.
>>
>> Some numbers (smaller is better):
>>
>> 1. Performance of ((ICC_ProfileRGB) ICC_Profile.getInstance(ColorSpace.CS_sRGB)).getMatrix();
>>
>> jdk 15.0.2
>> Benchmark                         Mode  Cnt   Score   Error  Units
>> CMMPerf.ThreadsMAX.testGetMatrix  avgt    5  19,624 ± 0,059  us/op
>> CMMPerf.testGetMatrix             avgt    5   0,154 ± 0,001  us/op
>>
>> jdk - before the fix
>> Benchmark                         Mode  Cnt   Score   Error  Units
>> CMMPerf.ThreadsMAX.testGetMatrix  avgt    5  12,935 ± 0,042  us/op
>> CMMPerf.testGetMatrix             avgt    5   0,127 ± 0,007  us/op
>>
>> jdk - after the fix
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> CMMPerf.ThreadsMAX.testGetMatrix  avgt    5  0,561 ± 0,005  us/op
>> CMMPerf.testGetMatrix             avgt    5  0,092 ± 0,001  us/op
>>
>> 2. Part of the performance gain in jdk17 comes from other fixes. For example,
>> the performance of ICC_Profile.getInstance(ColorSpace.CS_sRGB) and ColorSpace.getInstance(ColorSpace.CS_sRGB):
>>
>> jdk 15.0.2
>> Benchmark                              Mode  Cnt  Score   Error  Units
>> CMMPerf.ThreadsMAX.testGetSRGBProfile  avgt    5  2,299 ± 0,032  us/op
>> CMMPerf.ThreadsMAX.testGetSRGBSpace    avgt    5  2,210 ± 0,051  us/op
>> CMMPerf.testGetSRGBProfile             avgt    5  0,019 ± 0,001  us/op
>> CMMPerf.testGetSRGBSpace               avgt    5  0,018 ± 0,001  us/op
>>
>> jdk - same before/after the fix
>> Benchmark                              Mode  Cnt  Score   Error  Units
>> CMMPerf.ThreadsMAX.testGetSRGBProfile  avgt    5  0,005 ± 0,001  us/op
>> CMMPerf.ThreadsMAX.testGetSRGBSpace    avgt    5  0,005 ± 0,001  us/op
>> CMMPerf.testGetSRGBProfile             avgt    5  0,005 ± 0,001  us/op
>> CMMPerf.testGetSRGBSpace               avgt    5  0,005 ± 0,001  us/op
>>
>> Note: "ThreadsMAX" is 32 threads.
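For readers who want to see the shape of notes 1 and 3 in the quoted description, here is a minimal sketch. It is not the code from the PR: the class names "ProfileSketch" and "NativeProfile" and the method "readTag" are hypothetical stand-ins, and the use of the read lock around the native read and the write lock around tag updates is only one possible arrangement of a "ConcurrentHashMap" with a "StampedLock". The accessor copies the volatile field into a local once, so the fast path costs a single volatile read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.StampedLock;

final class ProfileSketch {
    // Hypothetical stand-in for the native profile handle.
    static final class NativeProfile {
        final AtomicInteger nativeReads = new AtomicInteger();

        // Pretends to read a tag from native memory and counts the call.
        byte[] readTag(int signature) {
            nativeReads.incrementAndGet();
            return new byte[] {(byte) signature};
        }
    }

    private volatile NativeProfile cmmProfile;
    private final Map<Integer, byte[]> tagCache = new ConcurrentHashMap<>();
    private final StampedLock lock = new StampedLock();

    // Accessor replacing a separate "activate, then read the field" pair:
    // the fast path performs exactly one volatile read.
    NativeProfile cmmProfile() {
        NativeProfile p = cmmProfile;
        if (p == null) {
            synchronized (this) {
                p = cmmProfile;
                if (p == null) {
                    cmmProfile = p = new NativeProfile();
                }
            }
        }
        return p;
    }

    // Readers share the read lock; the map caches each tag after first use.
    byte[] getTagData(int signature) {
        NativeProfile p = cmmProfile();
        long stamp = lock.readLock();
        try {
            return tagCache.computeIfAbsent(signature, p::readTag).clone();
        } finally {
            lock.unlockRead(stamp);
        }
    }

    // Writers take the exclusive lock while replacing a tag.
    void setTagData(int signature, byte[] data) {
        long stamp = lock.writeLock();
        try {
            tagCache.put(signature, data.clone());
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public static void main(String[] args) {
        ProfileSketch s = new ProfileSketch();
        s.getTagData(7);
        s.getTagData(7);
        System.out.println("native reads: " + s.cmmProfile().nativeReads.get());
    }
}
```

In this arrangement many threads can call getTagData() concurrently, and only setTagData() excludes them.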
>
> src/java.desktop/share/native/liblcms/LCMS.c line 644:
>
>> 642: return cmmProfile;
>> 643: }
>> 644: return NULL;
>
> Why do we need to do this from native code (other than to ease access to a private method of a class in another package)?
> Would it give a noticeable performance boost if we implemented it on the Java side?
Yes, this is the only reason.
I have a todo to check which kind of access would be better: AWTAccessor/MethodHandle/reflection vs. JNI.
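To make two of those options concrete, here is a minimal sketch of reaching a private method through core reflection and through a MethodHandle. The class "Target" and its method "tagSize" are hypothetical and live in the same compilation unit, so this only illustrates the call shapes; it says nothing about cross-package or cross-module access in the JDK itself, nor about which option is faster.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

final class PrivateAccessSketch {
    // Hypothetical target with a private method we want to call.
    static final class Target {
        private static int tagSize(String tag) {
            return tag.length();
        }
    }

    // Option A: core reflection, looked up on every call for simplicity.
    static int viaReflection(String tag) {
        try {
            Method m = Target.class.getDeclaredMethod("tagSize", String.class);
            m.setAccessible(true);
            return (int) m.invoke(null, tag);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Option B: a MethodHandle resolved once and kept in a static final field.
    private static final MethodHandle TAG_SIZE;
    static {
        try {
            MethodHandles.Lookup lookup =
                    MethodHandles.privateLookupIn(Target.class, MethodHandles.lookup());
            TAG_SIZE = lookup.findStatic(Target.class, "tagSize",
                    MethodType.methodType(int.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int viaMethodHandle(String tag) {
        try {
            return (int) TAG_SIZE.invoke(tag);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaReflection("head"));
        System.out.println(viaMethodHandle("head"));
    }
}
```

Any real comparison with JNI would need a benchmark such as the JMH runs quoted above.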
-------------
PR: https://git.openjdk.java.net/jdk/pull/2957