Vector API use for JRuby's json library
Jatin Bhateja
jatinbha.cloud at gmail.com
Fri Aug 29 18:09:15 UTC 2025
Hi Charles,
I am analyzing the performance drop with the new
VectorizedStringEncoder on "citm_catalog.json". I documented my
steps to create a working setup [1].
To collect profiles with custom warmup and measured iterations, I
created the following Ruby micro benchmark.
*File: encoder_benchmark.rb*
require "benchmark"
require "json"

puts "Ruby Engine: #{RUBY_ENGINE}"
puts "JSON::Parser: #{JSON::Parser}"

benchmark_name = "citm_catalog.json"
ruby_obj = JSON.load_file("/mnt/c/Github/workloads/VectorAPI/json/benchmark/data/citm_catalog.json")
puts "== Encoding #{benchmark_name}"

# Global sink: a top-level local is not visible inside a method body,
# so a global keeps the encoded result observable across calls.
$hash_accum = 0

def benchmark_coder(benchmark_name, ruby_obj)
  coder = JSON::Coder.new
  json_str = coder.dump(ruby_obj)
  $hash_accum ^= json_str.hash
end

# Warmup iterations, timed separately from the measured run.
warmup_execution_time = Benchmark.measure do
  10000.times { benchmark_coder(benchmark_name, ruby_obj) }
end
puts "Warmup execution time: #{warmup_execution_time.real}"

# Measured iterations.
execution_time = Benchmark.measure do
  20000.times { benchmark_coder(benchmark_name, ruby_obj) }
end
puts "Execution time: #{execution_time.real}"
My profiles and JIT code samples are available at the following link [2].
Some initial analysis:
- With optimization, around 27.9% of the time is spent in
  StringEncoder.generate, out of which 11.6% of the time is spent in
  VectorizedStringEncoder.encode, which internally spends around 8% of
  the time in StringEncoder.append.
- In baseline JSON, 17.67% of the cycles are spent in
  StringEncoder.generate.
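As a rough aid for reading the two StringEncoder.generate shares quoted above, the arithmetic can be sketched as follows. The variable names are made up for illustration, and each percentage is a share of its own run's profile, so the comparison says how much of the profile the method occupies, not how much absolute time it takes.

```ruby
# Profile shares quoted above, in percent of each run's own samples.
optimized_generate_pct = 27.9
baseline_generate_pct  = 17.67

# Absolute difference in percentage points and the relative growth of the share.
delta_points = (optimized_generate_pct - baseline_generate_pct).round(2)
relative     = (optimized_generate_pct / baseline_generate_pct).round(2)

puts "Difference: #{delta_points} percentage points"
puts "Relative share: #{relative}x"
```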
The performance drop is not caused by the Vector API's internal
implementation, since all the APIs are intrinsified without any boxing
penalties; it is due to the current algorithm. As a next step, I plan to
optimize the existing implementation and also develop standalone JMH
micro benchmarks comparing just the String encoding and
VectorizedStringEncoding without the glue logic, for a true
apples-to-apples comparison.
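A Ruby-level analogue of that isolation idea (not the planned JMH benchmark) might look like the sketch below. It assumes a hypothetical helper, collect_strings, that gathers every String from a nested structure, and a small inline sample standing in for the parsed citm_catalog.json data; only those strings are then encoded, so object traversal and non-string generation stay out of the timed loop.

```ruby
require "benchmark"
require "json"

# Hypothetical helper: gather every String (hash keys and values)
# found in a nested Array/Hash structure.
def collect_strings(obj, acc = [])
  case obj
  when String then acc << obj
  when Array  then obj.each { |v| collect_strings(v, acc) }
  when Hash   then obj.each { |k, v| collect_strings(k, acc); collect_strings(v, acc) }
  end
  acc
end

# Small inline sample; an assumption standing in for the real data set.
sample = {
  "events" => [{ "name" => "Op\u00e9ra \"Gala\"", "tags" => ["a\tb", "plain"] }],
  "count"  => 3
}
strings = collect_strings(sample)

# Time only the per-string encoding, without walking the object graph.
elapsed = Benchmark.realtime do
  1_000.times { strings.each { |s| JSON.generate(s) } }
end

puts "Strings collected: #{strings.size}"
puts "String-only encoding time: #{elapsed}"
```

Running the same script with and without the vectorized encoder enabled would give a comparison closer to the string path alone, although interpreter and JIT overhead are still included.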
Best Regards,
Jatin
PS: As per Paul's suggestion, I am also working on optimizing
simdjson-java using a constant-index slice [3] and two lookup
tables [4].
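For readers unfamiliar with the general two-lookup-table idea, here is a scalar sketch of nibble-based byte classification. It is an assumption about the general technique, not the code from [4]: the bit assignments, table contents, and method names are all illustrative. Each byte is split into a high and a low nibble, each nibble indexes a 16-entry table, and the AND of the two entries is nonzero only for bytes that JSON must escape (control characters, the double quote, and the backslash).

```ruby
# Illustrative classification bits (assumed, not taken from simdjson-java).
CONTROL   = 0b001  # bytes 0x00..0x1F
QUOTE     = 0b010  # byte 0x22
BACKSLASH = 0b100  # byte 0x5C

# Table indexed by the high nibble: which classes that nibble can belong to.
HI_TABLE = Array.new(16, 0)
HI_TABLE[0x0] = CONTROL
HI_TABLE[0x1] = CONTROL
HI_TABLE[0x2] = QUOTE
HI_TABLE[0x5] = BACKSLASH
HI_TABLE.freeze

# Table indexed by the low nibble: control bytes accept any low nibble,
# the quote needs 0x2 and the backslash needs 0xC.
LO_TABLE = Array.new(16) do |lo|
  bits = CONTROL
  bits |= QUOTE if lo == 0x2
  bits |= BACKSLASH if lo == 0xC
  bits
end.freeze

# A byte needs escaping when both nibbles agree on at least one class.
def needs_escape?(byte)
  (HI_TABLE[byte >> 4] & LO_TABLE[byte & 0x0F]) != 0
end

# Index of the first byte that needs escaping, or nil if there is none.
def first_escape_index(str)
  str.each_byte.with_index { |b, i| return i if needs_escape?(b) }
  nil
end

puts first_escape_index("plain text").inspect
puts first_escape_index("say \"hi\"").inspect
```

In a vectorized version the two table lookups would typically map to per-lane shuffle or rearrange operations over a whole vector of bytes; the scalar loop here only shows the classification logic.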
[1] https://github.com/jatin-bhateja/external_staging/blob/main/VectorizedAlgos/JRuby-json-data/jruby_vector_api_setup_steps.txt
[2] https://github.com/jatin-bhateja/external_staging/tree/main/VectorizedAlgos/JRuby-json-data
[3] https://github.com/simdjson/simdjson-java/pull/68
[4] https://github.com/simdjson/simdjson-java/issues/69
On 7/31/2025 8:33 PM, Charles Oliver Nutter wrote:
> The developer experimenting with vectors has been running 21, so I did
> suggest to him recently to try newer releases or dev builds. I'm out
> of office right now but hoping to spend some time in the next week
> running this through a profiler to see if other missed optimizations
> are interfering with the vectorized version of the code.
>
> I also pointed out the other vector-based json project to him that was
> suggested by Daniel. I'm hopeful we can get more out of this than we
> have seen so far once I can help profile and dig into optimized
> results a little bit more.
>
> There are many other places in JRuby where we could use this, such as
> for handling text transcoding. There may even be some Ruby language
> constructs that could be vectorized by JRuby's compiler. I wish I had
> more hours in the day to experiment with this!
>
> On Mon, Jul 28, 2025, 22:40 Paul Sandoz <paul.sandoz at oracle.com> wrote:
>
> Hi Daniel,
>
> Thanks for sharing. We have made progress optimizing the
> rearrange/selectFrom operations for UTF-8 related use cases. The
> improvements were integrated into JDK release 24 [0].
> Further optimizations are in flight for slice operations with
> constant inputs [1], which I believe can simplify the referenced
> code and may further boost performance, but we need to verify.
>
> Charlie, what version of the JDK are you using?
>
> Paul.
>
> [0] https://openjdk.org/jeps/489
> [1] https://github.com/openjdk/jdk/pull/24104
>
>> On Jul 16, 2025, at 10:46 AM, Daniel Lemire <daniel at lemire.me> wrote:
>>
>> Good day Charles,
>>
>> The following link might be relevant :
>>
>> https://github.com/simdjson/simdjson-java
>>
>> - Daniel
>>
>>> After seeing similar work done for the C version of the Ruby
>>> json standard library, I suggested to the author that we could
>>> do the same for JRuby using the Vector API. So he went and did it!
>>>
>>> https://github.com/ruby/json/pull/824
>>>
>>> The results are somewhat mixed; performance of some cases is
>>> faster and other cases is slower. We would love to get input
>>> from anyone on this list interested in seeing another real-world
>>> use case for the Vector API.
>>>
>>> I'm hopeful we can pump up these numbers with some additional
>>> tweaking in JRuby and json.
>>>
>>> *Charles Oliver Nutter*
>>> /Architect and Technologist/
>>> Headius Enterprises
>>> https://www.headius.com
>>> headius at headius.com
>>
>