[foreign-memaccess+abi] RFR: Add support for high-level functions to copy to and from Java arrays [v4]

Tue Jun 22 14:19:45 UTC 2021

On Tue, 22 Jun 2021 12:37:17 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:

>> This patch includes some of the changes from Lee to support a set of static functions that allow bulk copy from Java arrays to segments (and vice versa) in a more succinct fashion (so that the user doesn't have to create a temporary heap segment to do the copy).
>> 
>> I've added a new public method to `MemorySegment` which performs an *element-wise* bulk copy; it takes a source segment and a couple of element layouts: the source element layout and the destination element layout. The two layouts must have the same size, but can have different alignments (which will be checked against the corresponding segments) and byte orders. If the byte orders differ, a bulk copy with swap will be performed. As such, this method generalizes the previous `copyFrom` as follows:
>> 
>> 
>> copyFrom(srcSegment) -> copyFrom(JAVA_BYTE, srcSegment, JAVA_BYTE)
>> 
>> I've added support for argument type profiling for MemoryCopy static methods to avoid type pollution in cases where the same method is called with different memory segment types.
>> 
>> I've done a pass over the javadoc, and made it more consistent with the rest of the API. I've also reworked the test a bit to use the data provider functionality of TestNG, since all the test cases were similar, except for the carrier type.
>> 
>> There are other cosmetic changes as well, compared to original code from Lee, such as naming of static fields which is now capitalized. Everything else is the same.
>
> Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix wrong polarity of readOnly in copyTo methods

Hi,

I can confirm. The garbage collection issues are gone. The heap profile of Lucene looks identical to our old ByteBuffer-based implementation:

PERCENT       HEAP SAMPLES  STACK
16.78%        4781M         org.apache.lucene.util.FixedBitSet#<init>()
10.17%        2897M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
8.51%         2424M         java.util.AbstractList#iterator()
7.67%         2184M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
4.37%         1245M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.46%         985M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
3.11%         885M          org.apache.lucene.util.ArrayUtil#growExact()
2.93%         834M          java.util.ArrayList#grow()
2.37%         675M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
2.14%         609M          org.apache.lucene.util.BytesRef#<init>()
1.76%         502M          java.util.ArrayList#iterator()
1.73%         491M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
1.52%         432M          org.apache.lucene.util.PriorityQueue#<init>()
1.49%         425M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.30%         369M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.27%         362M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.26%         359M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.20%         342M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.10%         312M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
1.03%         292M          org.apache.lucene.store.MemorySegmentIndexInput#buildSlice()
0.99%         283M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
0.98%         279M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.98%         279M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.94%         267M          org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
0.90%         257M          java.util.AbstractList#listIterator()
0.88%         250M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.81%         231M          jdk.internal.foreign.MappedMemorySegmentImpl#dup()
0.80%         229M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77%         219M          java.util.Arrays#copyOfRange()
0.69%         198M          java.util.Arrays#asList()

ByteBuffer profile:

PERCENT       HEAP SAMPLES  STACK
16.44%        4774M         org.apache.lucene.util.FixedBitSet#<init>()
9.88%         2870M         org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
8.34%         2421M         java.util.AbstractList#iterator()
7.42%         2155M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
4.40%         1278M         org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame#<init>()
3.30%         959M          org.apache.lucene.util.ArrayUtil#growExact()
3.09%         896M          java.util.ArrayList#grow()
2.89%         839M          org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.26%         655M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#<init>()
1.98%         573M          org.apache.lucene.util.BytesRef#<init>()
1.91%         553M          java.util.ArrayList#iterator()
1.69%         490M          org.apache.lucene.util.fst.ByteSequenceOutputs#read()
1.54%         448M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.53%         444M          java.nio.DirectByteBufferR#duplicate()
1.52%         440M          org.apache.lucene.util.PriorityQueue#<init>()
1.42%         411M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#load()
1.28%         371M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#getFrame()
1.25%         362M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#newTermState()
1.23%         356M          org.apache.lucene.queryparser.charstream.FastCharStream#GetImage()
1.20%         348M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#<init>()
1.00%         289M          org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum#<init>()
0.92%         268M          java.util.AbstractList#listIterator()
0.91%         265M          org.apache.lucene.codecs.lucene90.blocktree.IntersectTermsEnumFrame#<init>()
0.91%         264M          org.apache.lucene.codecs.lucene90.ForUtil#<init>()
0.89%         257M          java.util.Arrays#copyOf()
0.82%         238M          org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
0.82%         237M          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#<init>()
0.79%         230M          org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsReader$BlockState#document()
0.77%         224M          java.nio.DirectByteBufferR#slice()
0.70%         202M          java.util.Arrays#asList()

As you can see, we have a few segment slices/dups, but the same is true for MappedByteBuffers (this comes from the fact that Lucene sometimes opens views on compound files).

The performance is still a little bit slower for some query types (mainly facets), but I have the feeling this comes from the overhead of copying small arrays, as well as from scoping checks.

@mcimadamore: You mentioned that for small arrays, copy loops may fit better. Do you have any suggestions as to what loop sizes we are talking about? Lucene's `long[]` arrays are always <= 64 elements and its `float[]` arrays <= 1024 elements, so maybe it's a good idea to just use the default read loop and not specialize it to do a bulk copy.
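
To make the question concrete, here is a minimal sketch of the two strategies I mean. It uses plain `ByteBuffer` rather than the incubating segment API so that it runs on any JDK; the class `SmallCopy`, the constant `LOOP_THRESHOLD` and its value of 16 are made up for illustration and are not taken from Lucene or from this patch. The right cut-over point, if there is one, would have to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class SmallCopy {
    // Hypothetical cut-over point (an assumption, not a measured value).
    static final int LOOP_THRESHOLD = 16;

    // Element-by-element read: no setup cost, but one checked access per element.
    static void readLongsLoop(ByteBuffer src, int byteOffset, long[] dst, int dstIndex, int count) {
        for (int i = 0; i < count; i++) {
            dst[dstIndex + i] = src.getLong(byteOffset + i * Long.BYTES);
        }
    }

    // Bulk read through a LongBuffer view: some setup (two temporary view
    // objects), then a single range-checked block copy.
    static void readLongsBulk(ByteBuffer src, int byteOffset, long[] dst, int dstIndex, int count) {
        // duplicate() resets the byte order to BIG_ENDIAN, so restore it explicitly.
        ByteBuffer window = src.duplicate().order(src.order());
        window.position(byteOffset);
        window.asLongBuffer().get(dst, dstIndex, count);
    }

    // Pick a strategy by element count.
    static void readLongs(ByteBuffer src, int byteOffset, long[] dst, int dstIndex, int count) {
        if (count <= LOOP_THRESHOLD) {
            readLongsLoop(src, byteOffset, dst, dstIndex, count);
        } else {
            readLongsBulk(src, byteOffset, dst, dstIndex, count);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 64; i++) {
            buf.putLong(i * Long.BYTES, i * 3L);
        }
        long[] viaLoop = new long[64];
        long[] viaBulk = new long[64];
        readLongsLoop(buf, 0, viaLoop, 0, 64);
        readLongsBulk(buf, 0, viaBulk, 0, 64);
        System.out.println(Arrays.equals(viaLoop, viaBulk)); // prints "true"
    }
}
```

Both paths produce the same array contents; the only open question is which one is cheaper at a given size, and whether a dispatch like `readLongs` is worth the extra branch at all.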

Here are Lucene's results:

                     Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
   BrowseMonthTaxoFacets        1.17      (5.7%)        1.07      (6.4%)   -8.6% ( -19% -    3%) 0.000
BrowseDayOfYearTaxoFacets        1.13      (7.1%)        1.04      (7.8%)   -7.9% ( -21% -    7%) 0.001
    BrowseDateTaxoFacets        1.13      (7.1%)        1.04      (8.0%)   -7.7% ( -21% -    7%) 0.001
    HighTermTitleBDVSort       54.84     (20.1%)       50.84     (15.8%)   -7.3% ( -35% -   35%) 0.202
                PKLookup      188.75      (2.2%)      181.26      (1.4%)   -4.0% (  -7% -    0%) 0.000
             AndHighHigh       61.16      (6.1%)       59.56      (5.6%)   -2.6% ( -13% -    9%) 0.158
   HighTermDayOfYearSort       63.43     (10.8%)       62.28     (14.2%)   -1.8% ( -24% -   26%) 0.650
                  Fuzzy2       57.80      (5.1%)       56.81      (5.1%)   -1.7% ( -11% -    8%) 0.288
       HighTermMonthSort       66.31     (11.8%)       65.22     (17.0%)   -1.7% ( -27% -   30%) 0.720
                 Respell       68.71      (2.4%)       67.71      (2.1%)   -1.5% (  -5% -    3%) 0.041
              HighPhrase       10.97      (4.8%)       10.83      (6.2%)   -1.3% ( -11% -   10%) 0.468
              AndHighMed       44.63      (4.4%)       44.09      (4.2%)   -1.2% (  -9% -    7%) 0.370
              TermDTSort      105.50     (17.0%)      104.58     (19.8%)   -0.9% ( -32% -   43%) 0.880
               OrHighMed       34.38      (3.5%)       34.14      (3.9%)   -0.7% (  -7% -    6%) 0.556
               OrHighLow      337.44      (3.1%)      335.46      (4.6%)   -0.6% (  -8% -    7%) 0.634
               MedPhrase      179.30      (2.7%)      178.40      (4.1%)   -0.5% (  -7% -    6%) 0.646
              AndHighLow      369.54      (2.9%)      367.78      (3.3%)   -0.5% (  -6% -    5%) 0.628
                  Fuzzy1       52.74      (9.8%)       52.49      (9.2%)   -0.5% ( -17% -   20%) 0.874
              OrHighHigh       12.03      (3.8%)       11.97      (3.7%)   -0.4% (  -7% -    7%) 0.717
                  IntNRQ       73.05      (1.5%)       72.78      (2.5%)   -0.4% (  -4% -    3%) 0.569
         MedSloppyPhrase       25.01      (1.9%)       24.98      (1.7%)   -0.1% (  -3% -    3%) 0.856
                Wildcard       76.65     (21.9%)       76.60     (21.5%)   -0.1% ( -35% -   55%) 0.993
    HighIntervalsOrdered       10.04      (4.3%)       10.03      (5.5%)   -0.0% (  -9% -   10%) 0.979
         LowSloppyPhrase       39.07      (2.3%)       39.08      (2.3%)    0.0% (  -4% -    4%) 0.970
BrowseDayOfYearSSDVFacets        4.00      (4.6%)        4.00      (5.6%)    0.1% (  -9% -   10%) 0.972
             LowSpanNear       51.01      (2.0%)       51.09      (1.6%)    0.2% (  -3% -    3%) 0.784
               LowPhrase      265.84      (2.0%)      266.25      (2.9%)    0.2% (  -4% -    5%) 0.845
                 MedTerm     1532.72      (4.0%)     1535.72      (4.5%)    0.2% (  -7% -    9%) 0.884
                HighTerm     1382.84      (5.2%)     1388.91      (4.9%)    0.4% (  -9% -   11%) 0.784
            OrHighNotMed      851.21      (3.9%)      856.51      (3.6%)    0.6% (  -6% -    8%) 0.599
            OrNotHighLow      772.85      (3.2%)      778.22      (3.6%)    0.7% (  -5% -    7%) 0.522
   BrowseMonthSSDVFacets        4.06      (6.3%)        4.09      (6.5%)    0.8% ( -11% -   14%) 0.692
            OrHighNotLow      726.91      (4.7%)      732.78      (5.0%)    0.8% (  -8% -   10%) 0.596
             MedSpanNear      101.32      (3.0%)      102.16      (2.7%)    0.8% (  -4% -    6%) 0.354
            HighSpanNear        0.90      (3.0%)        0.91      (2.9%)    0.9% (  -4% -    6%) 0.364
        HighSloppyPhrase       13.89      (4.6%)       14.02      (4.4%)    0.9% (  -7% -   10%) 0.509
            OrNotHighMed      597.86      (3.0%)      604.63      (3.8%)    1.1% (  -5% -    8%) 0.298
                 LowTerm     1874.30      (3.2%)     1898.96      (4.5%)    1.3% (  -6% -    9%) 0.287
                 Prefix3       38.02      (8.6%)       38.53     (10.1%)    1.3% ( -15% -   21%) 0.655
           OrNotHighHigh      719.81      (3.8%)      731.82      (5.3%)    1.7% (  -7% -   11%) 0.252
           OrHighNotHigh      723.65      (4.0%)      736.28      (3.7%)    1.7% (  -5% -    9%) 0.153

-------------

PR: https://git.openjdk.java.net/panama-foreign/pull/555