Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Apr 21 20:40:11 UTC 2022


On 2022-04-21 21:35, Kevin Rushforth wrote:
> I like the idea as long as we avoid duplication. Each third-party 
> library already has a required xxxx.md file in src/<module>/share/legal. 
There is some overlap between UPSTREAM.md and the files in 
$MODULE/share/legal, but it is quite thin. The legal files only state 
*that* we have some third party code in the product, and its license. 
As the name implies, they exist to satisfy a legal requirement. They do 
not say anything about where in the code base the 3rd party sources are 
located, or where the code was downloaded from. By adding more technical 
information in UPSTREAM.md, and pointing to the corresponding 
$MODULE/share/legal file, we can create a mapping between license and 
technical code information. That way the only "overlap" is that we need 
to have a "License: src/$MODULE/share/legal/$FILE.md" field in each 
UPSTREAM.md, and each such legal file will need to have a corresponding 
UPSTREAM.md file that lists it. (This correspondence can be 
automatically verified at build time.)
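Here is a minimal Python sketch of such a build-time check. It assumes that the header is a run of `Key: value` lines ending at the first blank line, and that all legal files live in `src/*/share/legal/`. The function names are hypothetical and not part of the existing build.

```python
from pathlib import Path

def header_fields(text):
    """Parse the 'Key: value' lines above the first blank line of an UPSTREAM.md."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # the blank separator line ends the header block
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def check_license_mapping(upstream_texts, legal_files):
    """Compare License: fields against the set of legal files.

    upstream_texts maps an UPSTREAM.md path to its contents; legal_files is a
    set of repo-relative paths such as "src/java.desktop/share/legal/harfbuzz.md".
    Returns (legal files no UPSTREAM.md claims, License: values with no such file).
    """
    claimed = {header_fields(text).get("License") for text in upstream_texts.values()}
    claimed.discard(None)  # an UPSTREAM.md without a License: field claims nothing
    unclaimed = sorted(legal_files - claimed)
    dangling = sorted(claimed - legal_files)
    return unclaimed, dangling

def scan_repo(repo_root):
    """Gather inputs from a checkout (assumes legal files only in src/*/share/legal/)."""
    root = Path(repo_root)
    upstream = {str(p): p.read_text() for p in root.glob("src/**/UPSTREAM.md")}
    legal = {p.relative_to(root).as_posix() for p in root.glob("src/*/share/legal/*.md")}
    return upstream, legal
```

A build step could call `check_license_mapping(*scan_repo(repo_root))` and fail if either returned list is non-empty.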

> Likewise, some modules have "UPDATING" instructions in the component 
> itself (which, I think, is where it belongs).

The UPDATING files are a great idea; however, they are only used in a 
couple of libraries. :( To clarify, my suggestion is that UPSTREAM.md 
should in effect subsume the UPDATING files. The contents of the 
existing UPDATING files are exactly what I think should go in the 
free-form part of UPSTREAM.md. And UPSTREAM.md should be placed just 
like the current UPDATING files, in the directory where the 
third party sources are included.
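Because the file sits at the top of the imported tree, tooling can decide whether a given source file is 3rd party code by walking up its parent directories. Below is a minimal sketch using a hypothetical `find_upstream_md` helper. To keep it easy to test, the helper takes a set of known repo-relative paths instead of touching the file system.

```python
from pathlib import PurePosixPath

def find_upstream_md(source_file, existing_files):
    """Return the UPSTREAM.md governing source_file, or None for JDK-native code.

    Walks from the file's directory toward the repo root and returns the first
    UPSTREAM.md found. existing_files is a set of repo-relative POSIX paths,
    standing in for a real file-system lookup.
    """
    for directory in PurePosixPath(source_file).parents:
        candidate = (directory / "UPSTREAM.md").as_posix()
        if candidate in existing_files:
            return candidate
    return None
```

A build rule could, for example, use a non-None result to turn off warnings-as-errors for that file, as suggested in the original proposal.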

So another way to express this idea is that I'm suggesting we expand on 
the concept of UPDATING by giving it a standardized name, include it in 
all imported code, and add some human- and computer-readable fields at 
the top that define important characteristics of that code.
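As a sketch of what "computer-readable" could mean in practice, here is a hypothetical `parse_upstream_md` helper. It assumes that every field name used in the two examples of the original proposal is mandatory (that required set is an assumption, not a settled format) and that each header line has the shape `Key: value`.

```python
import re

# Assumed mandatory fields, taken from the two examples in the proposal.
REQUIRED_FIELDS = ("Name", "Homepage", "License", "Version", "Upstream-release-URL")
# "Key: value" with a hyphenated key and a non-empty value on the same line.
HEADER_LINE = re.compile(r"([A-Za-z][A-Za-z0-9-]*):\s+(\S.*)")

def parse_upstream_md(text):
    """Split an UPSTREAM.md into (header fields, free-form body, list of problems)."""
    lines = text.splitlines()
    fields, problems = {}, []
    for i, line in enumerate(lines):
        if not line.strip():
            body = "\n".join(lines[i + 1:])  # everything after the blank line
            break
        match = HEADER_LINE.fullmatch(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
        else:
            problems.append(f"malformed header line: {line!r}")
    else:
        body = ""  # no blank line at all: the whole file is header
    problems += [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in fields]
    return fields, body, problems
```

A value wrapped onto a second line, as mail clients tend to do with long URLs, is reported as malformed rather than silently misparsed.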

> If each entry in this aggregate UPSTREAM.md file were limited to the 
> name of the component (not its version), 

I think it would be a missed opportunity not to include the version number.

Sure, an UPSTREAM.md file without version number would serve purposes 1, 
2, 4 and 5 in my list, but it would leave out 3. And while I have not 
seen any P1 bugs requiring the JDK to list which versions of 3rd party 
source we ship, I think it would indeed be useful to produce such a list. 
I am sure it will help distributors, and end users, to answer 
questions like "does JDK version so-and-so include code with a fix for 
bug XXX?" or similar. With this information formally specified, we can 
easily generate a document, as part of the build, like this:

===
Apache Santuario: 2.2.1
Harfbuzz: 2.8.0
....
===

which lists all our included projects and their corresponding version 
numbers.
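Once the fields exist, such a list is cheap to produce. Below is a minimal Python sketch with a hypothetical `version_report` helper. It assumes the same `Key: value` header format, and missing fields are shown as placeholders rather than aborting.

```python
def version_report(upstream_texts):
    """Build 'Name: Version' lines, one per UPSTREAM.md, sorted case-insensitively."""
    entries = []
    for text in upstream_texts:
        fields = {}
        for line in text.splitlines():
            if not line.strip():
                break  # stop at the blank line that ends the header
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value.strip()
        # Placeholders keep a broken file visible in the report instead of hiding it.
        entries.append((fields.get("Name", "<unnamed>"), fields.get("Version", "<unknown>")))
    entries.sort(key=lambda entry: entry[0].lower())
    return "\n".join(f"{name}: {version}" for name, version in entries)
```

In a build, the input could be the contents of every `UPSTREAM.md` found under `src/`, e.g. via `pathlib.Path("src").rglob("UPSTREAM.md")`.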

/Magnus

> the location of the md file, the location of the UPDATING instructions 
> (if any), and the location of the source code (preferably as a dir or 
> list of dirs), that seems workable.
>
> -- Kevin
>
>
> On 4/21/2022 12:13 PM, daniel.daugherty at oracle.com wrote:
>> > Thoughts?
>>
>> I like this idea. It will also benefit code archaeologists and 
>> spelunkers.
>>
>> Dan
>>
>>
>> On 4/21/22 2:58 PM, Magnus Ihse Bursie wrote:
>>> The JDK project depends on many different open source projects. Some 
>>> of them are linked to as libraries at runtime, but others have their 
>>> source code directly incorporated into our source tree, known as 
>>> "3rd party code".
>>>
>>> Unfortunately, the haphazard way this code is sprinkled throughout 
>>> our code base makes it very hard to tell at a glance if some code 
>>> originated with the JDK project, or is imported from elsewhere 
>>> ("upstream"). Many times, you need to be well acquainted with these 
>>> parts of the code to know whether a file is 3rd party code or not. 
>>> If you do not know, you will need to rely on heuristics such as 
>>> looking at the path name, checking for unusual copyright headers, or 
>>> looking at the git history for commits that indicate a refresh from 
>>> upstream.
>>>
>>> I propose we do something about this situation.
>>>
>>> My suggestion is that we add a file, UPSTREAM.md, in the top 
>>> directory of the imported 3rd party code. These files will follow a 
>>> pattern, with a set of formalized headers at the top, a blank line 
>>> of separation, and then a free-form markdown text, with e.g. 
>>> relevant notes about the project, important information about the 
>>> latest update, or instructions or hints on how to update the source 
>>> to a newer version.
>>>
>>> Here are two examples of how this might look. (Note that the 
>>> free-form text here consists of some offhand examples I invented. In 
>>> real life I assume they would be more detailed.)
>>>
>>> Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
>>> ===
>>> Name: Apache Santuario
>>> Homepage: https://santuario.apache.org/
>>> License: src/java.xml.crypto/share/legal/santuario.md
>>> Version: 2.2.1
>>> Upstream-release-URL: https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1
>>>
>>> # Upgrade instructions
>>>
>>> To upgrade the package, copy the source code from 
>>> `src/main/java/org/apache` in the upstream git repo into 
>>> `src/java.xml.crypto/share/classes/com/sun/org/apache`. Then update 
>>> the package namespace by running `find 
>>> src/java.xml.crypto/share/classes/com/sun/org/apache -name '*.java' 
>>> | xargs sed -i -e 's/^package org\.apache/package com.sun.org.apache/'`.
>>> ===
>>>
>>> Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
>>> ===
>>> Name: Harfbuzz
>>> Homepage: https://harfbuzz.github.io/
>>> License: src/java.desktop/share/legal/harfbuzz.md
>>> Version: 2.8.0
>>> Upstream-release-URL: https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0
>>>
>>> # How to update
>>>
>>> To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and 
>>> `.h` files from `src` into 
>>> `src/java.desktop/share/native/libharfbuzz`. Check whether the 
>>> upstream build scripts have changed since the last version, and update 
>>> our makefiles accordingly.
>>> ===
>>>
>>>
>>> These files will serve many purposes:
>>>
>>> 1) They will be a strong signal to developers coming to an 
>>> unfamiliar part of the code base that the files here originated 
>>> upstream.
>>>
>>> 2) It will be possible for tooling to understand that code in these 
>>> directories might not live up to normal JDK standards. It would e.g. 
>>> be possible for the build system to automatically disable 
>>> warnings-as-errors for such code, or for upcoming tools that support 
>>> code quality efforts such as blessed modifier order or spell checks 
>>> to skip those parts of the code.
>>>
>>> 3) It will be possible to get an at-a-glance overview of what 
>>> versions of 3rd party code are included in a build of the JDK, for 
>>> all included projects -- not just as of right now, but at any point 
>>> in history (since these files get updated when upstream code is 
>>> updated in the JDK). The build system could, for instance, collect 
>>> such information and provide it with the built JDK, just as it now 
>>> collects the licenses from the src/$MODULE/legal directories.
>>>
>>> 4) The git history for these files will clearly show when the code 
>>> was last refreshed from upstream, and by whom.
>>>
>>> 5) And finally, the free-text part gives a well-defined place to 
>>> store important information about how to upgrade, common mistakes, 
>>> etc. -- knowledge that today is sometimes written down in README 
>>> files, but most often just resides in the head of the developer who 
>>> last did a refresh.
>>>
>>> Thoughts?
>>>
>>> /Magnus
>>
>



More information about the jdk-dev mailing list