Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Apr 21 18:58:58 UTC 2022


The JDK project depends on many different open source projects. Some of 
them are linked to as libraries at runtime, but others have their source 
code directly incorporated into our source tree, known as "3rd party code".

Unfortunately, the haphazard way this code is sprinkled throughout our 
code base makes it very hard to tell at a glance if some code originated 
with the JDK project, or is imported from elsewhere ("upstream"). Many 
times, you need to be well acquainted with these parts of the code to 
know whether a file is 3rd party code or not. If you do not know, you 
will need to rely on heuristics such as looking at the path name, 
checking for unusual copyright headers, or looking at the git history 
for commits that indicate a refresh from upstream.

I propose we do something about this situation.

My suggestion is that we add a file, UPSTREAM.md, in the top directory 
of the imported 3rd party code. These files will follow a pattern, with 
a set of formalized headers on the top, a blank line of separation, and 
then a free-form markdown text, with e.g. relevant notes about the 
project, important information about the latest update, or instructions 
or hints on how to update the source to a newer version.

Here are two examples of how this might look. (Note that the free-form 
text here is just some offhand examples I invented. In real life I 
assume they would be more detailed.)

Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
===
Name: Apache Santuario
Homepage: https://santuario.apache.org/
License: src/java.xml.crypto/share/legal/santuario.md
Version: 2.2.1
Upstream-release-URL: https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1

# Upgrade instructions

To upgrade the package, copy the source code from 
`src/main/java/org/apache` in the upstream git repo into 
`src/java.xml.crypto/share/classes/com/sun/org/apache`. Then update the 
package namespace by running `find 
src/java.xml.crypto/share/classes/com/sun/org/apache -type f -name 
'*.java' | xargs sed -i -e 
's/^package org\.apache/package com.sun.org.apache/'`.
===

Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
===
Name: Harfbuzz
Homepage: https://harfbuzz.github.io/
License: src/java.desktop/share/legal/harfbuzz.md
Version: 2.8.0
Upstream-release-URL: https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0

# How to update

To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and `.h` 
files from `src` into `src/java.desktop/share/native/libharfbuzz`. Check 
if the build scripts in upstream have changed since the last version, and 
update our makefiles accordingly.
===
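
As a rough illustration of how tooling might read such a file, here is a minimal Python sketch. It assumes each header occupies a single `Key: value` line and that the first blank line separates the headers from the free-form markdown. The function name `parse_upstream` is hypothetical and not part of any existing JDK tool.

```python
from typing import Dict, Tuple


def parse_upstream(text: str) -> Tuple[Dict[str, str], str]:
    """Split UPSTREAM.md content into (headers, free-form markdown body)."""
    # Everything before the first blank line is the header block.
    head, _, body = text.partition("\n\n")
    headers: Dict[str, str] = {}
    for line in head.splitlines():
        # Split at the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[key.strip()] = value.strip()
    return headers, body.lstrip("\n")


example = """\
Name: Harfbuzz
Homepage: https://harfbuzz.github.io/
License: src/java.desktop/share/legal/harfbuzz.md
Version: 2.8.0
Upstream-release-URL: https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0

# How to update

Copy all `.cc`, `.hh` and `.h` files from `src`.
"""

headers, notes = parse_upstream(example)
print(headers["Name"], headers["Version"])
```

Keeping the headers as plain `Key: value` lines means a tool needs no markdown parser at all to get at the metadata; the free-form part is passed through untouched.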


These files will serve many purposes:

1) They will be a strong signal to developers coming to an unfamiliar 
part of the code base that the files here originated upstream.

2) It will be possible for tooling to understand that code in these 
directories might not live up to normal JDK standards. It would e.g. be 
possible for the build system to automatically disable 
warnings-as-errors for such code, or for upcoming tools that support 
code quality efforts such as blessed modifier order or spell checks to 
skip those parts of the code.

3) It will be possible to get an at-a-glance overview of what versions 
of 3rd party code are included in a build of the JDK, for all included 
projects -- not just as of right now, but at any point in history (since 
these files get updated when upstream code is updated in the JDK). The 
build system could, for instance, collect such information and provide 
it with the built JDK, just as it now collects the licenses from the 
src/$MODULE/legal directories.

4) The git history for these files will clearly show when the code was 
last refreshed from upstream, and by whom.

5) And finally, the free-text part gives a well-defined place to store 
important information about how to upgrade, common mistakes, etc -- 
knowledge that right now is sometimes written down in README files, but 
most often resides only in the head of the developer who last did a refresh.
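
Points 2 and 3 above could be served by a small scan of the source tree. The following Python sketch is one possible shape for that, not an existing build-system feature: it finds every UPSTREAM.md, reads its headers, and answers whether a given file lives under an imported directory. The helper names (`read_headers`, `collect_upstreams`, `upstream_dir_for`) and the file paths in the demo are purely illustrative.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def read_headers(path: Path) -> Dict[str, str]:
    """Read the 'Key: value' lines that precede the first blank line."""
    headers: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            break
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return headers


def collect_upstreams(root: Path) -> Dict[Path, Dict[str, str]]:
    """Map each directory holding an UPSTREAM.md (relative to root) to its headers."""
    return {
        marker.parent.relative_to(root): read_headers(marker)
        for marker in sorted(root.rglob("UPSTREAM.md"))
    }


def upstream_dir_for(file: Path, upstreams: Dict[Path, Dict[str, str]]) -> Optional[Path]:
    """Return the imported directory containing `file`, or None if there is none."""
    for parent in file.parents:
        if parent in upstreams:
            return parent
    return None


# Demo on a throwaway tree that mimics the Harfbuzz example.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lib = root / "src/java.desktop/share/native/libharfbuzz"
    lib.mkdir(parents=True)
    (lib / "UPSTREAM.md").write_text(
        "Name: Harfbuzz\nVersion: 2.8.0\n\n# How to update\n", encoding="utf-8"
    )
    upstreams = collect_upstreams(root)
    for directory, headers in upstreams.items():
        print(f"{headers['Name']} {headers['Version']}  ({directory.as_posix()})")
    imported = upstream_dir_for(
        Path("src/java.desktop/share/native/libharfbuzz/hb-font.cc"), upstreams
    )
    native = upstream_dir_for(
        Path("src/java.base/share/classes/java/lang/String.java"), upstreams
    )
```

A build system or a code quality tool could use the result of `upstream_dir_for` to decide whether to relax its checks for a file, and the collected headers give the per-build version overview.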

Thoughts?

/Magnus

