[foreign-memaccess+abi] RFR: Add support for high-level functions to copy to and from Java arrays [v4]
Uwe Schindler
uschindler at openjdk.java.net
Tue Jun 22 14:19:45 UTC 2021
On Tue, 22 Jun 2021 12:37:17 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> This patch includes some of the changes from Lee to support a set of static functions to allow bulk copy from Java arrays to segments (and vice versa) in a more succinct fashion (so that the user doesn't have to create a temporary heap segment to do the copy).
>>
>> I've added a new public method to `MemorySegment` which performs an *element-wise* bulk copy; it takes a source segment and a couple of element layouts: the source element layout and the destination element layout. The two layouts must have the same size, but can have different alignments (which will be checked against the corresponding segments) and byte orders. If the byte orders differ, a bulk copy with swap will be performed. As such, this method generalizes the previous `copyFrom` - as follows:
>>
>>
>> copyFrom(srcSegment) -> copyFrom(JAVA_BYTE, srcSegment, JAVA_BYTE)
>>
>> I've added support for argument type profiling for MemoryCopy static methods to avoid type pollution in cases where the same method is called with different memory segment types.
>>
>> I've done a pass over the javadoc, and made it more consistent with the rest of the API. I've also reworked the test a bit to use the data provider functionality of TestNG, since all the test cases were similar, except for the carrier type.
>>
>> There are other cosmetic changes as well, compared to original code from Lee, such as naming of static fields which is now capitalized. Everything else is the same.
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix wrong polarity of readOnly in copyTo methods
Hi,
I can confirm. The garbage collection issues are gone. The heap profile of Lucene looks identical to our old ByteBuffer-based implementation:
PERCENT HEAP SAMPLES STACK
16.78% 4781M org.apache.lucene.util.FixedBitSet#<init>()
10.17% 2897M org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
8.51% 2424M java.util.AbstractList#iterator()
7.67% 2184M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
4.37% 1245M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.46% 985M org.apache.lucene.queryparser.charstream.FastCharStream#refill()
3.11% 885M org.apache.lucene.util.ArrayUtil#growExact()
2.93% 834M java.util.ArrayList#grow()
2.37% 675M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
2.14% 609M org.apache.lucene.util.BytesRef#<init>()
1.76% 502M java.util.ArrayList#iterator()
1.73% 491M org.apache.lucene.util.fst.ByteSequenceOutputs#read()
1.52% 432M org.apache.lucene.util.PriorityQueue#<init>()
1.49% 425M jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.30% 369M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.27% 362M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.26% 359M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.20% 342M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.10% 312M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.03% 292M org.apache.lucene.store.MemorySegmentIndexInput#buildSlice()
0.99% 283M org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
0.98% 279M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.98% 279M org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.94% 267M org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
0.90% 257M java.util.AbstractList#listIterator()
0.88% 250M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.81% 231M jdk.internal.foreign.MappedMemorySegmentImpl#dup()
0.80% 229M org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77% 219M java.util.Arrays#copyOfRange()
0.69% 198M java.util.Arrays#asList()
ByteBuffer profile:
PERCENT HEAP SAMPLES STACK
16.44% 4774M org.apache.lucene.util.FixedBitSet#<init>()
9.88% 2870M org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
8.34% 2421M java.util.AbstractList#iterator()
7.42% 2155M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
4.40% 1278M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.30% 959M org.apache.lucene.util.ArrayUtil#growExact()
3.09% 896M java.util.ArrayList#grow()
2.89% 839M org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.26% 655M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.98% 573M org.apache.lucene.util.BytesRef#<init>()
1.91% 553M java.util.ArrayList#iterator()
1.69% 490M org.apache.lucene.util.fst.ByteSequenceOutputs#read()
1.54% 448M jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.53% 444M java.nio.DirectByteBufferR#duplicate()
1.52% 440M org.apache.lucene.util.PriorityQueue#<init>()
1.42% 411M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.28% 371M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.25% 362M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.23% 356M org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.20% 348M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.00% 289M org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
0.92% 268M java.util.AbstractList#listIterator()
0.91% 265M org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.91% 264M org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.89% 257M java.util.Arrays#copyOf()
0.82% 238M org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
0.82% 237M org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.79% 230M org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77% 224M java.nio.DirectByteBufferR#slice()
0.70% 202M java.util.Arrays#asList()
As you can see, we have a few segment slices/dups, but the same is true for MappedByteBuffers (this comes from the fact that Lucene sometimes opens views on compound files).
The performance is still a little bit slower for some query types (mainly facets), but I have the feeling this comes from the overhead when copying small arrays, as well as from scoping checks.
@mcimadamore: You mentioned that for small arrays, copy loops may fit better. Do you have any suggestions about what array sizes we are talking about? Lucene's `long[]` arrays are reliably of size <= 64 and its `float[]` arrays of size <= 1024 elements, so maybe it's a good idea to just use the default read loop and not specialize it to do a bulk copy.
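To make the question concrete, here is a minimal sketch of the kind of size-based dispatch being asked about. It is not Lucene's code and it does not use the MemorySegment API from this PR; it uses plain `java.nio.ByteBuffer` so that it runs on a stock JDK. The class name `SmallArrayCopy`, the method `readLongs` and the cut-over value `LOOP_THRESHOLD` are all hypothetical, and the value 16 is a placeholder that would have to be replaced by a measured one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SmallArrayCopy {
    // Hypothetical cut-over point: at or below it we use a plain loop.
    // The value is an assumption, not a measured recommendation.
    static final int LOOP_THRESHOLD = 16;

    /** Reads len longs from src, starting at byte offset pos, into dst[off .. off+len). */
    static void readLongs(ByteBuffer src, int pos, long[] dst, int off, int len) {
        if (len <= LOOP_THRESHOLD) {
            // Small array: element loop with absolute gets, no view object needed.
            for (int i = 0; i < len; i++) {
                dst[off + i] = src.getLong(pos + (i << 3));
            }
        } else {
            // Larger array: bulk copy through a long view of the buffer.
            // duplicate() resets the byte order to big-endian, so restore it.
            // Note that the duplicate and the view are short-lived allocations.
            ByteBuffer view = src.duplicate().order(src.order());
            view.position(pos);
            view.asLongBuffer().get(dst, off, len);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 64; i++) {
            buf.putLong(i * Long.BYTES, i);
        }
        long[] few = new long[8];    // takes the loop path
        long[] many = new long[64];  // takes the bulk path
        readLongs(buf, 0, few, 0, few.length);
        readLongs(buf, 0, many, 0, many.length);
        System.out.println(few[7] + " " + many[63]);
    }
}
```

Both paths leave the position of the source buffer untouched, so the two branches are interchangeable from the caller's point of view and the threshold can be tuned by benchmark alone.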
Here are Lucene's results:
Task   QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff   p-value
BrowseMonthTaxoFacets 1.17 (5.7%) 1.07 (6.4%) -8.6% ( -19% - 3%) 0.000
BrowseDayOfYearTaxoFacets 1.13 (7.1%) 1.04 (7.8%) -7.9% ( -21% - 7%) 0.001
BrowseDateTaxoFacets 1.13 (7.1%) 1.04 (8.0%) -7.7% ( -21% - 7%) 0.001
HighTermTitleBDVSort 54.84 (20.1%) 50.84 (15.8%) -7.3% ( -35% - 35%) 0.202
PKLookup 188.75 (2.2%) 181.26 (1.4%) -4.0% ( -7% - 0%) 0.000
AndHighHigh 61.16 (6.1%) 59.56 (5.6%) -2.6% ( -13% - 9%) 0.158
HighTermDayOfYearSort 63.43 (10.8%) 62.28 (14.2%) -1.8% ( -24% - 26%) 0.650
Fuzzy2 57.80 (5.1%) 56.81 (5.1%) -1.7% ( -11% - 8%) 0.288
HighTermMonthSort 66.31 (11.8%) 65.22 (17.0%) -1.7% ( -27% - 30%) 0.720
Respell 68.71 (2.4%) 67.71 (2.1%) -1.5% ( -5% - 3%) 0.041
HighPhrase 10.97 (4.8%) 10.83 (6.2%) -1.3% ( -11% - 10%) 0.468
AndHighMed 44.63 (4.4%) 44.09 (4.2%) -1.2% ( -9% - 7%) 0.370
TermDTSort 105.50 (17.0%) 104.58 (19.8%) -0.9% ( -32% - 43%) 0.880
OrHighMed 34.38 (3.5%) 34.14 (3.9%) -0.7% ( -7% - 6%) 0.556
OrHighLow 337.44 (3.1%) 335.46 (4.6%) -0.6% ( -8% - 7%) 0.634
MedPhrase 179.30 (2.7%) 178.40 (4.1%) -0.5% ( -7% - 6%) 0.646
AndHighLow 369.54 (2.9%) 367.78 (3.3%) -0.5% ( -6% - 5%) 0.628
Fuzzy1 52.74 (9.8%) 52.49 (9.2%) -0.5% ( -17% - 20%) 0.874
OrHighHigh 12.03 (3.8%) 11.97 (3.7%) -0.4% ( -7% - 7%) 0.717
IntNRQ 73.05 (1.5%) 72.78 (2.5%) -0.4% ( -4% - 3%) 0.569
MedSloppyPhrase 25.01 (1.9%) 24.98 (1.7%) -0.1% ( -3% - 3%) 0.856
Wildcard 76.65 (21.9%) 76.60 (21.5%) -0.1% ( -35% - 55%) 0.993
HighIntervalsOrdered 10.04 (4.3%) 10.03 (5.5%) -0.0% ( -9% - 10%) 0.979
LowSloppyPhrase 39.07 (2.3%) 39.08 (2.3%) 0.0% ( -4% - 4%) 0.970
BrowseDayOfYearSSDVFacets 4.00 (4.6%) 4.00 (5.6%) 0.1% ( -9% - 10%) 0.972
LowSpanNear 51.01 (2.0%) 51.09 (1.6%) 0.2% ( -3% - 3%) 0.784
LowPhrase 265.84 (2.0%) 266.25 (2.9%) 0.2% ( -4% - 5%) 0.845
MedTerm 1532.72 (4.0%) 1535.72 (4.5%) 0.2% ( -7% - 9%) 0.884
HighTerm 1382.84 (5.2%) 1388.91 (4.9%) 0.4% ( -9% - 11%) 0.784
OrHighNotMed 851.21 (3.9%) 856.51 (3.6%) 0.6% ( -6% - 8%) 0.599
OrNotHighLow 772.85 (3.2%) 778.22 (3.6%) 0.7% ( -5% - 7%) 0.522
BrowseMonthSSDVFacets 4.06 (6.3%) 4.09 (6.5%) 0.8% ( -11% - 14%) 0.692
OrHighNotLow 726.91 (4.7%) 732.78 (5.0%) 0.8% ( -8% - 10%) 0.596
MedSpanNear 101.32 (3.0%) 102.16 (2.7%) 0.8% ( -4% - 6%) 0.354
HighSpanNear 0.90 (3.0%) 0.91 (2.9%) 0.9% ( -4% - 6%) 0.364
HighSloppyPhrase 13.89 (4.6%) 14.02 (4.4%) 0.9% ( -7% - 10%) 0.509
OrNotHighMed 597.86 (3.0%) 604.63 (3.8%) 1.1% ( -5% - 8%) 0.298
LowTerm 1874.30 (3.2%) 1898.96 (4.5%) 1.3% ( -6% - 9%) 0.287
Prefix3 38.02 (8.6%) 38.53 (10.1%) 1.3% ( -15% - 21%) 0.655
OrNotHighHigh 719.81 (3.8%) 731.82 (5.3%) 1.7% ( -7% - 11%) 0.252
OrHighNotHigh 723.65 (4.0%) 736.28 (3.7%) 1.7% ( -5% - 9%) 0.153
-------------
PR: https://git.openjdk.java.net/panama-foreign/pull/555