Fwd: DALEQ: An Open-Source Tool for Assessing Java Binary Equivalence

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Aug 14 12:22:17 UTC 2025


While not directly applicable to the JDK, this is still interesting from 
a general reproducible build perspective.

/Magnus

-------- Forwarded Message --------
Subject: 	DALEQ: An Open-Source Tool for Assessing Java Binary Equivalence
Date: 	Fri, 8 Aug 2025 03:42:47 +0000
From: 	Jens Dietrich via rb-general 
<rb-general at lists.reproducible-builds.org>
Reply-To: 	General discussions about reproducible builds 
<rb-general at lists.reproducible-builds.org>
To: 	rb-general at lists.reproducible-builds.org 
<rb-general at lists.reproducible-builds.org>
CC: 	Jens Dietrich <jens.dietrich at vuw.ac.nz>



Introducing DALEQ: An Open-Source Tool for Assessing Java Binary Equivalence

We’re excited to announce the release of DALEQ — a new open-source tool 
for analyzing and comparing Java binaries. DALEQ is designed to help 
developers, security researchers, and build engineers assess whether two 
.jar files built from the same source code are semantically equivalent, 
even when they’re not bitwise identical. This is particularly useful for 
comparing jars from Maven Central with jars produced via reproducible 
builds or generated by services like Oracle’s build-from-source or 
Google’s Assured OSS.

Although tools like diff or hash-based checks can detect binary 
differences, they don’t explain why binaries differ or whether those 
differences matter. Bytecode-level differences can be 
caused by changes in compilers or build pipelines — not necessarily by 
compromised builds. DALEQ helps distinguish harmless variation from 
meaningful divergence.
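
To make the limitation of hash-based checks concrete, here is a minimal 
sketch in plain Java (standard library only). It is not part of DALEQ; 
the class name JarEntryDiff and its method names are made up for this 
illustration. It reports which .class entries of two jars differ 
bitwise, and that is all it can say: it gives no hint as to why they 
differ or whether the difference matters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Enumeration;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not DALEQ code: a purely bitwise comparison of jars.
class JarEntryDiff {

    // Read every .class entry of a jar into memory, keyed by entry name.
    static Map<String, byte[]> readClasses(String jarPath) throws IOException {
        Map<String, byte[]> classes = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    classes.put(entry.getName(), in.readAllBytes());
                }
            }
        }
        return classes;
    }

    // SHA-256 digest of a byte array.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Names of classes present in both jars whose digests differ.
    static Set<String> differingClasses(Map<String, byte[]> a, Map<String, byte[]> b) {
        Set<String> differing = new TreeSet<>();
        for (Map.Entry<String, byte[]> entry : a.entrySet()) {
            byte[] other = b.get(entry.getKey());
            if (other != null
                    && !MessageDigest.isEqual(sha256(entry.getValue()), sha256(other))) {
                differing.add(entry.getKey());
            }
        }
        return differing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarEntryDiff.java <first.jar> <second.jar>");
            return;
        }
        Set<String> differing =
                differingClasses(readClasses(args[0]), readClasses(args[1]));
        // The output is only a list of names; the cause of each difference is unknown.
        differing.forEach(System.out::println);
    }
}
```

Classes that exist in only one of the two jars are deliberately ignored 
here to keep the sketch short.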

How DALEQ Works

DALEQ focuses on Java bytecode comparison, though it can also analyze 
resources and metadata in jars. At its core, DALEQ uses a Datalog engine 
(Soufflé) — the same kind of logic-based analysis engine used in systems 
like CodeQL — to normalize and compare bytecode structures. Key features 
include:

- Bytecode normalization to reduce irrelevant build differences
- Semantic diffing that identifies and explains non-equivalent instructions
- Provenance tracking: for equivalent files, DALEQ shows how equivalence 
was derived via Datalog rules; for non-equivalent files, it provides 
bytecode-level diffs
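
The idea of normalizing before comparing can be sketched in a few lines 
of plain Java. This is a toy illustration only: DALEQ itself expresses 
its rules in Datalog, and neither the rule below nor the textual 
instruction format is taken from DALEQ. The sketch assumes two compiler 
versions that emit different call targets for an implicit null check 
(Object.getClass versus Objects.requireNonNull, each followed by pop) 
and different line-number entries. The class name ToyNormalizer, the 
NULLCHECK marker, and the "line N" notation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration, not DALEQ code: normalize instruction lists, then compare.
class ToyNormalizer {

    // Assumed (hypothetical) textual forms of the two null-check call targets.
    static final Set<String> NULL_CHECK_CALLS = Set.of(
            "invokevirtual java/lang/Object.getClass",
            "invokestatic java/util/Objects.requireNonNull");

    static List<String> normalize(List<String> instructions) {
        // Pass 1: drop debug-only line-number entries.
        List<String> code = new ArrayList<>();
        for (String insn : instructions) {
            if (!insn.startsWith("line ")) {
                code.add(insn);
            }
        }
        // Pass 2: rewrite "<null-check call>; pop" into a single marker.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < code.size(); i++) {
            boolean nextIsPop = i + 1 < code.size() && code.get(i + 1).equals("pop");
            if (nextIsPop && NULL_CHECK_CALLS.contains(code.get(i))) {
                out.add("NULLCHECK");
                i++; // skip the pop that belongs to the idiom
            } else {
                out.add(code.get(i));
            }
        }
        return out;
    }

    // Two instruction lists are treated as equivalent if their normal forms match.
    static boolean equivalent(List<String> a, List<String> b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        List<String> first = List.of("line 10", "aload_1",
                "invokevirtual java/lang/Object.getClass", "pop", "return");
        List<String> second = List.of("line 12", "aload_1",
                "invokestatic java/util/Objects.requireNonNull", "pop", "return");
        System.out.println(equivalent(first, second));
    }
}
```

A call whose result is actually used (not followed by pop) is left 
untouched, so a genuine change in behavior still shows up as a 
difference after normalization.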

DALEQ also verifies whether the underlying source code inputs are the 
same (or at least equivalent, tolerating some variations in comments and 
formatting) and includes integrations with existing tools like the 
standard javap disassembler. It supports extensibility through a plugin 
system.

Real-World Evaluation

DALEQ builds on our earlier research into levels of binary equivalence. 
We evaluated the tool using real-world .jar files from Oracle and 
Google, both of which independently rebuild Java packages from source. 
The results are encouraging: DALEQ was able to classify 85–90% of .class 
files that were not bitwise identical as still being semantically 
equivalent, with supporting provenance.

Learn More

You can try out DALEQ now on GitHub: https://github.com/binaryeq/daleq/
A detailed technical paper describing DALEQ and our evaluation: 
https://arxiv.org/abs/2508.01530
A technical paper describing the conceptual approach of levels of binary 
equivalence: https://arxiv.org/abs/2410.08427 (to be presented at 
ICSME’25 <https://conf.researchr.org/home/icsme-2025>)


Jens Dietrich (Associate Professor at Victoria University of Wellington)

Behnaz Hassanshahi (Principal Researcher and Tech Lead at Oracle, Oracle 
Labs Brisbane)
