RFR: 8238954: Improve performance of tiled snapshot rendering
Frederic Thevenet
github.com+7450507+fthevenet at openjdk.java.net
Tue Mar 10 11:16:52 UTC 2020
On Tue, 10 Mar 2020 10:29:38 GMT, Frederic Thevenet <github.com+7450507+fthevenet at openjdk.org> wrote:
>> ### 14-internal
>>
>> --------
>> | | 1024 |2048 |3072 |4096 |5120 |6144 |7168 |8192 |9216 |
>> |---|---|---|---|---|---|---|---|---|---|
>> | 1024 | 5.740508 | 9.337537 | 13.489849 | 17.611105 | 38.898909 | 48.165735 | 53.596876 | 49.449740 | 66.032570 |
>> | 2048 | 9.845097 | 17.799415 | 26.109529 | 34.607728 | 79.345622 | 94.082500 | 107.777644 | 100.901349 | 135.826890 |
>> | 3072 | 14.654498 | 26.183649 | 39.781191 | 51.871491 | 113.010307 | 143.613631 | 184.883820 | 167.076202 | 200.852633 |
>> | 4096 | 18.706278 | 36.115871 | 51.477296 | 68.457649 | 156.240888 | 186.159272 | 222.876505 | 237.387683 | 290.125942 |
>> | 5120 | 50.566276 | 106.465632 | 140.506406 | 161.687151 | 203.644875 | 237.260330 | 279.108632 | 311.002566 | 371.704115 |
>> | 6144 | 53.501341 | 106.726656 | 160.191733 | 216.969484 | 264.996201 | 287.375425 | 335.294473 | 365.035267 | 419.995978 |
>> | 7168 | 66.422026 | 110.882355 | 187.978455 | 239.014528 | 308.817056 | 335.838550 | 394.270828 | 445.987300 | 506.974069 |
>> | 8192 | 60.315442 | 108.770069 | 164.424088 | 205.330331 | 305.201833 | 343.846336 | 392.867668 | 454.540147 | 503.808112 |
>> | 9216 | 71.070811 | 132.708328 | 188.411172 | 256.130225 | 320.028449 | 400.748559 | 471.542252 | 595.355103 | 589.240851 |
>> 
>
> ### 15-internal:
>
> --------
> | | 1024 |2048 |3072 |4096 |5120 |6144 |7168 |8192 |9216 |
> |---|---|---|---|---|---|---|---|---|---|
> | 1024 | 5.381051 | 9.261115 | 14.033219 | 20.608201 | 26.159817 | 33.599632 | 36.669261 | 43.042338 | 46.086088 |
> | 2048 | 9.752862 | 17.698869 | 27.004541 | 38.437578 | 52.297443 | 60.757880 | 68.101838 | 80.162117 | 93.852856 |
> | 3072 | 15.564961 | 27.304138 | 40.255866 | 56.636476 | 80.472402 | 86.346635 | 105.154089 | 121.048263 | 130.458981 |
> | 4096 | 19.436113 | 35.556343 | 53.277865 | 71.623899 | 95.814932 | 122.543003 | 136.833771 | 160.199834 | 178.356125 |
> | 5120 | 27.246498 | 65.875784 | 73.171492 | 103.380029 | 126.486761 | 147.666102 | 165.833885 | 199.005331 | 220.659671 |
> | 6144 | 31.843301 | 62.101937 | 93.646729 | 125.531512 | 150.914608 | 175.553034 | 209.835003 | 241.114596 | 253.512648 |
> | 7168 | 40.507918 | 70.843435 | 101.075064 | 137.284040 | 165.808501 | 197.015259 | 254.286955 | 304.928104 | 299.992601 |
> | 8192 | 43.206941 | 80.290957 | 121.946965 | 157.016439 | 193.509481 | 243.514969 | 268.151933 | 359.562281 | 352.102850 |
> | 9216 | 49.529493 | 90.895186 | 149.422784 | 179.512616 | 217.260338 | 267.610592 | 309.706685 | 354.950852 | 383.275751 |
> 
I've uploaded 3 sets of results, from 3 different implementations:
1. **14-ea+9** is the implementation merged into openjfx14 following #68; it only uses tiling if the original
implementation fails; on Windows that would typically be when the snapshot dimensions are larger than 8192 pixels.
2. **14-internal** uses the same tiling implementation as the above, but starts tiling as soon as the snapshot
dimensions are larger than `PrismSettings.maxTextureSize`, i.e. typically 4096 pixels. NB: This implementation was
never merged into OpenJFX; results are only provided for comparison's sake.
3. **15-internal** uses the tiling implementation proposed in the PR. Compared to the ones above, it attempts to align
pixel formats to avoid the cost of converting from one format to another (e.g. ByteBGRA to IntARGB), and it tries to
divide the final snapshot into tiles of the same dimensions to avoid creating a new GPU surface for every tile.
**Please note that these results are for the best possible scenario with regard to the above optimizations; in the worst
case scenario (a user-provided image with a different pixel format and no way to divide the snapshot into equal-size
tiles), the performance is the same as that of implementation 2.**
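
To make the "tiles of the same dimensions" idea in item 3 more concrete, here is a minimal sketch of how a tile length could be picked for one dimension of a snapshot. It is not the code from the PR: the class `TileSizing`, the method `equalTileSize`, and the lower search bound of half the maximum tile size are all assumptions made for illustration only.

```java
// Hypothetical helper for illustration; not taken from the PR's source.
class TileSizing {
    /**
     * Picks a tile length for one dimension of a snapshot.
     * Prefers the largest length in [maxTileSize / 2, maxTileSize] that divides
     * `length` evenly, so that every tile along this dimension has the same size.
     * The lower bound of maxTileSize / 2 is an arbitrary assumption of this sketch.
     * Assumes maxTileSize is positive.
     */
    static int equalTileSize(int length, int maxTileSize) {
        if (length <= maxTileSize) {
            return length; // fits in a single texture, no tiling needed
        }
        for (int size = maxTileSize; size >= maxTileSize / 2; size--) {
            if (length % size == 0) {
                return size; // all tiles along this dimension are identical
            }
        }
        // Worst case: no even split was found, so the last tile will be smaller.
        return maxTileSize;
    }

    public static void main(String[] args) {
        int max = 4096; // typical clamped texture size mentioned above
        for (int length : new int[] {1024, 5120, 9216, 8191}) {
            System.out.println(length + " -> " + equalTileSize(length, max));
        }
    }
}
```

With a limit of 4096, a 9216-pixel dimension splits evenly into three tiles of 3072 pixels, whereas a dimension with no suitable divisor falls back to full-size tiles plus a smaller remainder tile, which is the worst case described above.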
My conclusions from the above results are twofold:
- Tiling has a performance cost, but as far as I can tell it remains the only practical way to work around
the underlying issue (i.e. taking snapshots larger than the supported texture size), and the optimizations proposed in
this PR arguably do mitigate that cost, even if only in some cases.
- The original implementation, which is preserved in 14-ea+9, ignores the texture size clamp of 4096, which means it
doesn't have to resort to tiling until the target snapshot size is >8192 on d3d or >16384 on es2.
If this is incorrect behaviour on the part of the original implementation (which I'm led to believe is the case), it
gives it an unfair advantage in this benchmark, i.e. it is faster because it does things it shouldn't do. If, however, it
turns out that it is actually safe to ignore clamping when taking snapshots, then it would make sense to do so in the
implementation proposed by this PR as well.
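
The difference between the two thresholds can be illustrated with a small sketch. The limits 4096, 8192 and 16384 are the ones quoted above; the class `TilingThreshold` and the method `needsTiling` are hypothetical names and do not correspond to actual OpenJFX code.

```java
// Hypothetical illustration of when tiling kicks in; not OpenJFX code.
class TilingThreshold {
    /** Returns true if either dimension exceeds the given texture size limit. */
    static boolean needsTiling(int width, int height, int maxTextureSize) {
        return width > maxTextureSize || height > maxTextureSize;
    }

    public static void main(String[] args) {
        int clamped = 4096; // limit honoured by 14-internal and 15-internal
        int d3d = 8192;     // limit at which 14-ea+9 falls back to tiling on d3d
        int es2 = 16384;    // corresponding limit on es2

        int width = 5120;
        int height = 1024;
        System.out.println("clamped: " + needsTiling(width, height, clamped));
        System.out.println("d3d:     " + needsTiling(width, height, d3d));
        System.out.println("es2:     " + needsTiling(width, height, es2));
    }
}
```

Under these assumptions, a 5120x1024 snapshot is tiled when the 4096 clamp is honoured but not when only the d3d or es2 limit applies, which is why 14-ea+9 avoids the tiling cost over most of the benchmarked range.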
-------------
PR: https://git.openjdk.java.net/jfx/pull/112