<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Charles,</p>
I am analyzing the performance drop seen with the new <i>VectorizedStringEncoder</i>
on "<i>citm_catalog.json</i>", and I have documented the steps to
create a working setup [1].<br>
To collect profiles with custom warmup and measurement
iterations, I created the following Ruby micro-benchmark.
<p><b>File: encoder_benchmark.rb</b></p>
require "benchmark"<br>
require 'json'<br>
<br>
puts "Ruby Engine: #{RUBY_ENGINE}"<br>
puts "JSON::Parser: #{JSON::Parser}"<br>
<br>
benchmark_name="citm_catalog.json"<br>
ruby_obj =
JSON.load_file("/mnt/c/Github/workloads/VectorAPI/json/benchmark/data/citm_catalog.json")<br>
puts "== Encoding #{benchmark_name}"<br>
<br>
hash_accum = 0<br>
# A top-level local is not visible inside a def, so return the hash<br>
# and let the caller accumulate it.<br>
def benchmark_coder(benchmark_name, ruby_obj)<br>
coder = JSON::Coder.new<br>
json_str = coder.dump(ruby_obj)<br>
json_str.hash<br>
end<br>
<br>
warmup_execution_time = Benchmark.measure do<br>
10000.times { hash_accum ^= benchmark_coder(benchmark_name, ruby_obj) }<br>
end<br>
puts "Warmup execution time: #{warmup_execution_time.real}"<br>
<br>
execution_time = Benchmark.measure do<br>
20000.times { hash_accum ^= benchmark_coder(benchmark_name, ruby_obj) }<br>
end<br>
puts "Execution time: #{execution_time.real}"<br>
puts "Hash accumulator: #{hash_accum}"<br>
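<p>A possible variation on the benchmark above, sketched under assumptions: it splits the run into batches and reports the time of each one, so that warmup stabilization is visible. <code>SAMPLE_OBJ</code>, <code>encode</code>, and <code>batch_timings</code> are hypothetical names, the small in-memory payload is only a stand-in for <i>citm_catalog.json</i>, and the fallback to <code>JSON.generate</code> is an assumption for json versions that do not provide <code>JSON::Coder</code>.</p>
<pre>
```ruby
require "benchmark"
require "json"

# Hypothetical stand-in payload; the real benchmark loads citm_catalog.json.
SAMPLE_OBJ = {
  "events" => Array.new(50) do |i|
    { "id" => i, "name" => "Event \"#{i}\"\n", "tags" => ["a", "b"] }
  end
}

# Use JSON::Coder when the installed json gem provides it, else JSON.generate.
def encode(obj)
  if defined?(JSON::Coder)
    (@coder ||= JSON::Coder.new).dump(obj)
  else
    JSON.generate(obj)
  end
end

# Runs `batches` batches of `iterations` encodes each.
# Returns the wall-clock seconds per batch and an accumulated hash, which
# keeps the encoded result observably used.
def batch_timings(obj, batches:, iterations:)
  hash_accum = 0
  timings = Array.new(batches) do
    Benchmark.realtime do
      iterations.times { hash_accum ^= encode(obj).hash }
    end
  end
  [timings, hash_accum]
end

timings, hash_accum = batch_timings(SAMPLE_OBJ, batches: 5, iterations: 200)
timings.each_with_index { |t, i| puts format("batch %d: %.4fs", i, t) }
puts "hash accumulator: #{hash_accum}"
```
</pre>
<p>With the real workload, the batch and iteration counts would need to be far larger. The values here only keep the sketch quick to run.</p>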
<br>
My profiles and JIT code samples are available at [2].<br>
Some initial analysis:<br>
- With the optimization, around 27.9% of the time is spent in
StringEncoder.generate, of which 11.6% is spent in
VectorizedStringEncoder.encode, which in turn spends around 8% in
StringEncoder.append.<br>
<p> - In baseline JSON, 17.67% of the cycles are spent in
StringEncoder.generate. </p>
The performance drop is caused by the current algorithm rather than by the
Vector API's internal implementation: all the API calls are intrinsified,
with no boxing penalties.<br>
As a next step, I plan to spend time
optimizing the existing implementation and to develop standalone
JMH micro-benchmarks that compare
just the baseline string encoding and the vectorized string encoding, without the
glue logic, for a true apples-to-apples comparison.<br>
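<p>Until the JMH benchmarks exist, a rough Ruby-level approximation of isolating the string path could look like the sketch below. It is not the planned JMH comparison and still includes the array generator's glue: it simply collects every string from a document and times encoding only those. <code>collect_strings</code> and the sample document are hypothetical.</p>
<pre>
```ruby
require "benchmark"
require "json"

# Hypothetical helper: recursively collect every String (keys and values).
def collect_strings(obj, out = [])
  case obj
  when String then out.push(obj)
  when Array  then obj.each { |v| collect_strings(v, out) }
  when Hash   then obj.each { |k, v| collect_strings(k, out); collect_strings(v, out) }
  end
  out
end

# Hypothetical sample document with strings that need escaping.
sample = {
  "name"  => "Caf\u00e9 \"Nord\"",
  "seats" => [1, 2, 3],
  "areas" => [{ "label" => "A\tB" }]
}
strings = collect_strings(sample)

# Compare encoding the whole document with encoding only its strings.
full_time    = Benchmark.realtime { 1_000.times { JSON.generate(sample) } }
strings_time = Benchmark.realtime { 1_000.times { JSON.generate(strings) } }

puts "strings found: #{strings.length}"
puts format("full document: %.4fs, strings only: %.4fs", full_time, strings_time)
```
</pre>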
<br>
Best Regards,<br>
Jatin
<p>PS: Following Paul's suggestion, I am also working on optimizing <i>simdjson-java</i>
using constant-index slices [3] and two lookup tables [4].</p>
[1]
<a class="moz-txt-link-freetext"
href="https://github.com/jatin-bhateja/external_staging/blob/main/VectorizedAlgos/JRuby-json-data/jruby_vector_api_setup_steps.txt">https://github.com/jatin-bhateja/external_staging/blob/main/VectorizedAlgos/JRuby-json-data/jruby_vector_api_setup_steps.txt</a><br>
[2]
<a class="moz-txt-link-freetext"
href="https://github.com/jatin-bhateja/external_staging/tree/main/VectorizedAlgos/JRuby-json-data">https://github.com/jatin-bhateja/external_staging/tree/main/VectorizedAlgos/JRuby-json-data</a>
<div class="moz-cite-prefix">[3] <a class="moz-txt-link-freetext"
href="https://github.com/simdjson/simdjson-java/pull/68">https://github.com/simdjson/simdjson-java/pull/68</a></div>
<div class="moz-cite-prefix">[4] <a class="moz-txt-link-freetext"
href="https://github.com/simdjson/simdjson-java/issues/69">https://github.com/simdjson/simdjson-java/issues/69</a></div>
<p><br>
</p>
<div class="moz-cite-prefix">On 7/31/2025 8:33 PM, Charles Oliver
Nutter wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAE-f1xT=AuBJroY=E4JhgunXhu1Z-ixSCbvwmUmcq1GE_CfrEg@mail.gmail.com">
<div dir="auto">The developer experimenting with vectors has been
running 21, so I did suggest to him recently to try newer
releases or dev builds. I'm out of office right now but hoping
to spend some time in the next week running this through a
profiler to see if other missed optimizations are interfering
with the vectorized version of the code.
<div dir="auto"><br>
</div>
<div dir="auto">I also pointed out the other vector-based json
project to him that was suggested by Daniel. I'm hopeful we
can get more out of this than we have seen so far once I can
help profile and dig into optimized results a little bit more.</div>
<div dir="auto"><br>
</div>
<div dir="auto">There are many other places in JRuby where we
could use this, such as for handling text transcoding. There
may even be some Ruby language constructs that could be
vectorized by JRuby's compiler. I wish I had more hours in the
day to experiment with this!</div>
</div>
<br>
<div class="gmail_quote gmail_quote_container">
<div dir="ltr" class="gmail_attr">On Mon, Jul 28, 2025, 22:40
Paul Sandoz &lt;<a href="mailto:paul.sandoz@oracle.com"
moz-do-not-send="true" class="moz-txt-link-freetext">paul.sandoz@oracle.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="line-break:after-white-space">
Hi Daniel,
<div><br>
</div>
<div>Thanks for sharing. We have made progress optimizing
the rearrange/selectFrom operations for UTF-8-related use
cases. The improvements were integrated into JDK release
24 [0].</div>
<div>
<div>Further optimizations are in flight for slice
operations with constant inputs [1], which I believe can
simplify the referenced code and may further boost
performance, but we need to verify.</div>
</div>
<div><br>
</div>
<div>Charlie, what version of the JDK are you using?</div>
<div><br>
</div>
<div>Paul.</div>
<div><br>
</div>
<div>[0] <a href="https://openjdk.org/jeps/489"
target="_blank" rel="noreferrer" moz-do-not-send="true"
class="moz-txt-link-freetext">https://openjdk.org/jeps/489</a><br>
<div>[1] <a
href="https://github.com/openjdk/jdk/pull/24104"
target="_blank" rel="noreferrer"
moz-do-not-send="true" class="moz-txt-link-freetext">https://github.com/openjdk/jdk/pull/24104</a></div>
<div><br>
<blockquote type="cite">
<div>On Jul 16, 2025, at 10:46 AM, Daniel Lemire &lt;<a
href="mailto:daniel@lemire.me" target="_blank"
rel="noreferrer" moz-do-not-send="true"
class="moz-txt-link-freetext">daniel@lemire.me</a>&gt;
wrote:</div>
<br>
<div>
<div>
<div>Good day Charles,</div>
<div><br>
</div>
<div>The following link might be relevant:</div>
<div><br>
</div>
<div><a
href="https://github.com/simdjson/simdjson-java" target="_blank"
rel="noreferrer" moz-do-not-send="true"
class="moz-txt-link-freetext">https://github.com/simdjson/simdjson-java</a><br>
</div>
<div><br>
</div>
<div>- Daniel</div>
<div><br>
</div>
<blockquote type="cite"
id="m_5949170455508475308qt">
<div dir="ltr">
<div>After seeing similar work done for the C
version of the Ruby json standard library, I
suggested to the author that we could do the
same for JRuby using the Vector API. So he
went and did it!</div>
<div><br>
</div>
<div><a
href="https://github.com/ruby/json/pull/824" target="_blank"
rel="noreferrer" moz-do-not-send="true"
class="moz-txt-link-freetext">https://github.com/ruby/json/pull/824</a></div>
<div><br>
</div>
<div>The results are somewhat mixed:
some cases are faster and
others are slower. We would love to get
input from anyone on this list interested in
seeing another real-world use case for the
Vector API.</div>
<div><br>
</div>
<div>I'm hopeful we can pump up these numbers
with some additional tweaking in JRuby and
json.</div>
<div><br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">
<div><b>Charles Oliver Nutter</b></div>
<div><i>Architect and Technologist</i></div>
<div>Headius Enterprises</div>
<div><a href="https://www.headius.com/"
target="_blank" rel="noreferrer"
moz-do-not-send="true"
class="moz-txt-link-freetext">https://www.headius.com</a></div>
<div>
<div><a
href="mailto:headius@headius.com"
target="_blank" rel="noreferrer"
moz-do-not-send="true"
class="moz-txt-link-freetext">headius@headius.com</a></div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>