Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK
Magnus Ihse Bursie
magnus.ihse.bursie at oracle.com
Thu Apr 21 18:58:58 UTC 2022
The JDK project depends on many different open source projects. Some of
them are linked to as libraries at runtime, but others have their source
code directly incorporated into our source tree, known as "3rd party code".
Unfortunately, the haphazard way this code is sprinkled throughout our
code base makes it very hard to tell at a glance if some code originated
with the JDK project, or is imported from elsewhere ("upstream"). Many
times, you need to be well acquainted with these parts of the code to
know whether a file is 3rd party code or not. If you do not know, you
will need to rely on heuristics such as looking at the path name,
checking for unusual copyright headers, or looking at the git history
for commits that indicate a refresh from upstream.
I propose we do something about this situation.
My suggestion is that we add a file, UPSTREAM.md, in the top directory
of each piece of imported 3rd party code. These files will follow a
pattern, with a set of formalized headers at the top, a blank line of
separation, and then free-form Markdown text containing, e.g., relevant
notes about the project, important information about the latest update,
or instructions or hints on how to update the source to a newer version.
Here are two examples of how this might look. (Note that the free-form
text here is just some offhand examples I invented. In real life I
assume they would be more detailed.)
Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
===
Name: Apache Santuario
Homepage: https://santuario.apache.org/
License: src/java.xml.crypto/share/legal/santuario.md
Version: 2.2.1
Upstream-release-URL: https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1

# Upgrade instructions
To upgrade the package, copy the source code from
`src/main/java/org/apache` in the upstream git repo into
`src/java.xml.crypto/share/classes/com/sun/org/apache`. Then update the
package namespace in place (the `-i` flag as used here assumes GNU sed)
by running `find src/java.xml.crypto/share/classes/com/sun/org/apache
-type f -name '*.java' | xargs sed -i -e
's/^package org\.apache/package com.sun.org.apache/'`.
===
Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
===
Name: Harfbuzz
Homepage: https://harfbuzz.github.io/
License: src/java.desktop/share/legal/harfbuzz.md
Version: 2.8.0
Upstream-release-URL: https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0

# How to update
To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and `.h`
files from `src` into `src/java.desktop/share/native/libharfbuzz`. Check
whether the upstream build scripts have changed since the last version,
and update our makefiles accordingly.
===
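To make the proposed layout concrete, here is a minimal Python sketch of
how a tool might split such a file into its formal headers and its
free-form notes. It rests on assumptions the proposal does not settle:
that the first blank line ends the header block, that a header line has
the shape "Key: value", and that a line in the header block without that
shape continues the previous value (as a wrapped URL would). The names
`parse_upstream` and `UpstreamInfo` are invented for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumed header syntax: "Key: value", where Key is letters, digits and hyphens.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):(?:\s+(.*))?$")


@dataclass
class UpstreamInfo:
    headers: dict = field(default_factory=dict)
    notes: str = ""


def parse_upstream(text: str) -> UpstreamInfo:
    """Split UPSTREAM.md text into formal headers and free-form Markdown notes."""
    head, _, body = text.partition("\n\n")  # the first blank line ends the headers
    info = UpstreamInfo(notes=body.strip())
    last_key = None
    for line in head.splitlines():
        line = line.strip()
        match = HEADER_RE.match(line)
        if match:
            last_key = match.group(1)
            info.headers[last_key] = match.group(2) or ""
        elif last_key is not None and line:
            # Assumed rule: a line that is not "Key: value" continues the previous value.
            info.headers[last_key] = (info.headers[last_key] + " " + line).strip()
    return info


sample = """Name: Harfbuzz
Homepage: https://harfbuzz.github.io/
License: src/java.desktop/share/legal/harfbuzz.md
Version: 2.8.0
Upstream-release-URL:
https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0

# How to update
Copy the sources and check the makefiles.
"""

info = parse_upstream(sample)
print(info.headers["Version"])
print(info.headers["Upstream-release-URL"])
```

Keeping the headers in a plain dictionary means new header names could
be added later without changing the parser; whether unknown headers
should be accepted or rejected is a policy question left open here.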
These files will serve many purposes:
1) They will be a strong signal to developers coming to an unfamiliar
part of the code base that the files here originated upstream.
2) It will be possible for tooling to understand that code in these
directories might not live up to normal JDK standards. It would e.g. be
possible for the build system to automatically disable
warnings-as-errors for such code, or for upcoming tools that support
code quality efforts such as blessed modifier order or spell checks to
skip those parts of the code.
3) It will be possible to get an at-a-glance overview of what versions
of 3rd party code are included in a build of the JDK, for all included
projects -- not just as of right now, but at any point in history (since
these files get updated when upstream code is updated in the JDK). The
build system could, for instance, collect such information and provide
it with the built JDK, just as it now collects the licenses from the
src/$MODULE/legal directories.
4) The git history for these files will clearly show when the code was
last refreshed from upstream, and by whom.
5) And finally, the free-text part gives a well-defined place to store
important information about how to upgrade, common mistakes, etc. --
knowledge that today is sometimes written down in README files, but
most often resides only in the head of the developer who last did a
refresh.
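As a rough illustration of points 2 and 3, the following Python sketch
walks a source tree, treats every directory holding an UPSTREAM.md as
imported code, and gathers the name and version headers into an
inventory. This is not how the JDK build system works today; the
function names (`read_headers`, `find_upstream_dirs`, `is_upstream`) are
hypothetical, and the header reader is deliberately simplified: it only
understands single-line "Key: value" headers before the first blank line.

```python
import tempfile
from pathlib import Path


def read_headers(path: Path) -> dict:
    """Read single-line 'Key: value' headers up to the first blank line."""
    headers = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            break  # a blank line separates headers from the free-form notes
        key, sep, value = line.partition(": ")
        if sep:
            headers[key.strip()] = value.strip()
    return headers


def find_upstream_dirs(root: Path) -> dict:
    """Map each directory (relative to root) holding an UPSTREAM.md to its headers."""
    found = {}
    for marker in sorted(root.rglob("UPSTREAM.md")):
        found[marker.parent.relative_to(root).as_posix()] = read_headers(marker)
    return found


def is_upstream(source_file: Path, root: Path, upstream_dirs: dict) -> bool:
    """True if source_file lives in, or below, a directory marked as upstream."""
    relative = source_file.relative_to(root)
    return any(parent.as_posix() in upstream_dirs for parent in relative.parents)


# Demonstration on a throwaway tree that mimics the second example above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lib = root / "src/java.desktop/share/native/libharfbuzz"
    lib.mkdir(parents=True)
    (lib / "UPSTREAM.md").write_text(
        "Name: Harfbuzz\nVersion: 2.8.0\n\n# How to update\n", encoding="utf-8"
    )
    inventory = find_upstream_dirs(root)
    for directory, headers in inventory.items():
        print(directory, headers.get("Name"), headers.get("Version"))
    flagged = is_upstream(lib / "hb-font.cc", root, inventory)
    print(flagged)
```

A build system or code quality tool could use a check like
`is_upstream` to decide whether to relax warnings-as-errors or skip a
file, and could ship the collected inventory alongside the license files.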
Thoughts?
/Magnus