<!DOCTYPE html><html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body>

    <blockquote type="cite" cite="mid:CAL4QsguYkBshtgiD4v2MsnirbOsm+Lbaw6KSAc7pryopN6=z+w@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote gmail_quote_container">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p>The fast paths in StringSupport call an out-of-line

                stub that does a vectorized copy. At least in theory

                C2's auto-vectorizer should be able to do the exact same

                thing for a manual loop using charAt, but inline. i.e.

                it might even be faster, especially for small strings.

                That's why it would be good to try that approach and see

                how it compares.<br>

              </p>

            </div>

          </blockquote>

          <div>I can take a closer look at this. To check my

            understanding, would you expect it to be competitive for

            UTF-16, or also UTF-8?</div>

        </div>

      </div>

    </blockquote>

    Either should work, though the UTF-16 code for expanding to a char

    is more complex, so the vectorizer's pattern matching might fail

    there. The code for UTF-8 (well, really latin1) is much simpler

    (just a plain array load), so that one is the more likely of the

    two to vectorize.
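    <p>As a rough illustration of the kind of manual loop meant here (not
      code from the JDK or from this thread): the sketch below copies part
      of a string into a byte[] using charAt and narrows each char to a
      byte. The class and method names are made up, a byte[] stands in for
      the destination segment, and the non-latin1 check is folded into an
      OR-reduction so that the loop body stays free of branches. Whether
      C2 actually vectorizes this shape would have to be confirmed with
      benchmarks or by inspecting the generated code.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical helper, for illustration only.
final class Latin1CopySketch {

    // Copies s[offset, offset + length) into dst starting at dstOffset,
    // narrowing each char to a byte. Returns true if every copied char
    // fit in latin1 (<= 0xFF); on false, dst holds truncated values.
    static boolean copyLatin1(String s, int offset, int length,
                              byte[] dst, int dstOffset) {
        // Range checks are done once, outside the loop.
        Objects.checkFromIndexSize(offset, length, s.length());
        Objects.checkFromIndexSize(dstOffset, length, dst.length);

        int seen = 0;
        for (int i = 0; i < length; i++) {
            char c = s.charAt(offset + i);
            dst[dstOffset + i] = (byte) c;
            seen |= c; // accumulate instead of branching per element
        }
        return seen <= 0xFF;
    }

    public static void main(String[] args) {
        byte[] dst = new byte[5];
        boolean ok = copyLatin1("hello, world", 7, 5, dst, 0);
        System.out.println(ok + " " + new String(dst, StandardCharsets.ISO_8859_1));
    }
}
```

    <p>The check after the loop is a design choice of the sketch: an early
      exit on the first non-latin1 char would reintroduce data-dependent
      control flow into the loop body.</p>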

    <blockquote type="cite" cite="mid:CAL4QsguYkBshtgiD4v2MsnirbOsm+Lbaw6KSAc7pryopN6=z+w@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote gmail_quote_container">

          <div> For the UTF-8 case, would you expect something like what

            proto is currently doing here [1] to get vectorized?</div>

          <div><br>

          </div>

          <div>[1] <a href="https://github.com/protocolbuffers/protobuf/blob/0a727cfc6e0a6dbeb46716f2f6142b99b6a604e0/java/core/src/main/java/com/google/protobuf/Utf8.java#L939-L990" moz-do-not-send="true">https://github.com/protocolbuffers/protobuf/blob/0a727cfc6e0a6dbeb46716f2f6142b99b6a604e0/java/core/src/main/java/com/google/protobuf/Utf8.java#L939-L990</a></div>

        </div>

      </div>

    </blockquote>

    <p>This doesn't look like something that would vectorize. Typically,

      any non-loop-invariant control flow you have in a loop body will

      inhibit vectorization.</p>
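    <p>To make the distinction concrete, here is a small sketch with two
      loop shapes. It is not taken from the linked protobuf code; the
      class and method names are invented. The first loop exits as soon as
      it sees a non-ASCII byte, so its control flow depends on the data
      loaded in each iteration. The second does the same copy but defers
      the check to a single test after the loop. The expectation that the
      second shape is friendlier to the auto-vectorizer is an assumption
      that would need to be verified.</p>

```java
// Hypothetical example, for illustration only.
final class LoopShapeSketch {

    // Data-dependent early exit: the branch on 'b' is not loop-invariant.
    // Returns the number of leading ASCII bytes copied.
    static int copyAsciiPrefix(byte[] src, byte[] dst) {
        int n = Math.min(src.length, dst.length);
        int i = 0;
        while (i < n) {
            byte b = src[i];
            if (b < 0) {   // high bit set means non-ASCII
                break;
            }
            dst[i] = b;
            i++;
        }
        return i;
    }

    // Branch-free body: copy everything, OR the bytes together, and
    // test once at the end. Returns true if all copied bytes were ASCII.
    static boolean copyAndCheckAscii(byte[] src, byte[] dst) {
        int n = Math.min(src.length, dst.length);
        int acc = 0;
        for (int i = 0; i < n; i++) {
            byte b = src[i];
            dst[i] = b;
            acc |= b;      // a negative byte makes acc negative
        }
        return acc >= 0;
    }

    public static void main(String[] args) {
        byte[] src = {'a', 'b', (byte) 0xC3, (byte) 0xA9, 'c'};
        System.out.println(copyAsciiPrefix(src, new byte[src.length]));
        System.out.println(copyAndCheckAscii(src, new byte[src.length]));
    }
}
```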

    <blockquote type="cite" cite="mid:CAL4QsguYkBshtgiD4v2MsnirbOsm+Lbaw6KSAc7pryopN6=z+w@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote gmail_quote_container">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p> I was thinking primarily along the lines of adding a

                MemorySegment::copy overload that accepts Strings as a

                source (as opposed to e.g. an array), for copying from a

                string to a memory segment only. We should probably also

                add an overload to SegmentAllocator::allocateFrom that

                accepts an offset and a length (we already have two for

                full strings). These two overloads could fully support

                the substring use case without looking too out of

                place. </p>

            </div>

          </blockquote>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p>For reading a String, I think your proposal to augment

                MemorySegment::getString looks good, but I think we

                should leave setString alone in favor of adding a

                MS::copy overload (there's the asymmetry I was talking

                about before). </p>

            </div>

          </blockquote>

          <div>Thanks, I think I understand better now. Using copy for

            this seems a lot nicer than setStringWithoutNullTerminator.</div>

          <div> </div>

          For the allocateFrom part, do you think it would make sense to

          pass the offset/length all the way through

          bytesCompatible/copyToSegmentRaw? That could be decided with

          benchmarks, and also potentially done later with the same

          allocateFrom API shape if it ended up being worthwhile.</div>

      </div>

    </blockquote>

    I think it should work similarly to the overload we have with

    MemorySegment as a source: i.e. just call allocateNoInit, and then

    delegate to MemorySegment::copy.
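    <p>A rough sketch of that shape, under several assumptions: it needs
      a JDK where java.lang.foreign is final (22 or later), the helper
      names are hypothetical, and it only handles latin1 content. Since
      allocateNoInit is not something user code can call, the sketch uses
      the public SegmentAllocator::allocate instead, and a small
      hand-written copy method stands in for the proposed
      MemorySegment::copy overload that takes a String source, which does
      not exist yet.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical helpers, for illustration only.
final class AllocateFromSketch {

    // Allocates exactly 'length' bytes (no null terminator) and fills
    // them from s[offset, offset + length).
    static MemorySegment allocateFromLatin1(SegmentAllocator allocator,
                                            String s, int offset, int length) {
        Objects.checkFromIndexSize(offset, length, s.length());
        // Stand-in for allocateNoInit: the public allocate call.
        MemorySegment dst = allocator.allocate(length);
        copyLatin1(s, offset, dst, 0, length);
        return dst;
    }

    // Stand-in for a String-source MemorySegment::copy overload.
    // Assumes every char in the range fits in latin1.
    static void copyLatin1(String src, int srcOffset,
                           MemorySegment dst, long dstOffset, int length) {
        for (int i = 0; i < length; i++) {
            dst.set(ValueLayout.JAVA_BYTE, dstOffset + i,
                    (byte) src.charAt(srcOffset + i));
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = allocateFromLatin1(arena, "hello, world", 7, 5);
            byte[] bytes = seg.toArray(ValueLayout.JAVA_BYTE);
            System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
        }
    }
}
```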

    <blockquote type="cite" cite="mid:CAL4QsguYkBshtgiD4v2MsnirbOsm+Lbaw6KSAc7pryopN6=z+w@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote gmail_quote_container">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p>For completeness, I think we should also just add the

                MemorySegment::ofString(String, Charset) overload which

                tries to return a read-only view of the string, to match

                the existing ofArray methods. This seems generally just

                a good primitive to have.</p>

            </div>

          </blockquote>

          <div>That sounds good to me.</div>

          <div><br>

          </div>

          <div>Do you have thoughts on the best way to proceed here? Do

            you think it makes sense to do incrementally, or would you

            prefer to see all of these related changes happen together

            under a single issue?<br>

            <br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>I don't have a preference. Since you've already started a PR for

      enhancing getString, maybe you can focus on that for now, and

      we'll file follow-up issues for the others. Splitting things up

      might be nice since there's probably some benchmarking work

      involved for each. I think the copy and allocateFrom overloads can

      be done in one patch though.</p>

    <p>Jorn</p>

  </body>

</html>