Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Apr 21 23:24:46 UTC 2022


On 2022-04-22 00:12, Kevin Rushforth wrote:
>
>
> On 4/21/2022 2:05 PM, Magnus Ihse Bursie wrote:
>> On 2022-04-21 22:19, Philip Race wrote:
>>>
>>> A "marker" file indicating that something is 3rd party code that may
>>> be updated from time to time seems fine, but upgrading 3rd party
>>> libraries is already a pain, so I'm not sure how prescriptive I'd
>>> want to be about required content beyond the simple basics.
>>>
>>> In the client area we've started to add files called UPDATING.txt
>>> where we put the information related to tasks when updating. Whilst
>>> some libraries might want to put that in an UPSTREAM.md, I'd want to
>>> have the option to just have one line saying "See UPDATING.txt for ..."
>>
>> As I said to Kevin, I'm basically viewing UPSTREAM.md as an evolution 
>> of UPDATING.txt, so in effect you would update UPSTREAM.md instead of 
>> UPDATING.txt, but in exactly the same way (and with almost the very 
>> same content!). Of course, you can put "See UPDATING.txt for ..." in
>> UPSTREAM.md, but then you'd get two files where one would do.
>
> Combining the instructions to update the various third-party libraries 
> into a single monolithic file seems like the wrong approach to me. I 
> think it makes much more sense to have the instructions live in the
> module in question with the third-party code being updated.

I think we're just talking past each other here. I am not suggesting 
that we have a *single* UPSTREAM.md file. I am suggesting that we have 
one UPSTREAM.md file per third party library, placed exactly as you say 
with the third party code.
>
>>> I'm not sure we really need to include the current version in there.
>>> Then we'd perhaps be able to avoid updating this file every time.
>> If we want to keep track of the URL where we downloaded the actual 
>> source release from, we will need to update the file anyway.
>>
>> If the cost of updating the file is too high, we can do with a more 
>> "static" file that just serves as a marker for external code. That 
>> would indeed solve the problems I was running into, that triggered my 
>> thinking about this. And it is probably possible in most cases to 
>> trace which version was included by finding the most recently changed files
>> in the directory, and looking up the corresponding issue on JBS.
>>
>> But I still can't help thinking it would be good to have it stored in 
>> the source code repo what version we actually included. I think the 
>> cost of maintaining this would be low (compared to the other work 
>> required when upgrading, updating two lines in a text file is not 
>> really a big thing), and it would mean that the version information 
>> will be "co-located" with the source code. You can check out any 
>> commit whatsoever, and find out which versions of external source code
>> were included.
>>
>> As I said to Kevin, I think it would be a missed opportunity not to 
>> track versions systematically.
>
> But we do track the version systematically -- in the xxx.md file for 
> each third-party software. Updating that legal/xxx.md file is a 
> requirement which doesn't go away if you store it in a second 
> location. It just leads to duplication.

Well, it's only semi-systematic, if you ask me. Here are some
excerpts:

java.base/share/legal/icu.md:## International Components for Unicode (ICU4J) v70.1
java.base/share/legal/public_suffix.md:## Mozilla Public Suffix List
java.base/share/legal/unicode.md:## The Unicode Standard, Unicode Character Database, Version 14.0.0

But sure, I get your point that this is already stored here. Let's drop 
that part of my proposal. (And maybe we can try to be more rigorous in 
the future on how we describe project name and version in the legal .md 
files.)

/Magnus

>
> -- Kevin
>
>>> BTW the true "upstream location" is more usually a site to download
>>> foo-1.2.3.tar.gz, not some repo tag.
>> I agree, a curl-able link to the source tarball is probably better.
>>
>>> We even have some open source 3rd party code for which you won't 
>>> find a repo anywhere.
>>>
>>> And I don't think it fair to call the locations of the upstream 
>>> libraries "haphazard".
>> That was not really directed at you. :-) The client native libraries 
>> are very well organized, thank you very much!
>>
>>> They are in the places they need to be, in many cases partly 
>>> determined by the build team,
>>> within the necessities of the modular JDK.
>>>
>>> I'm curious what "possible for the build system to automatically 
>>> disable warnings-as-errors for such code"
>>> means in practice.
>>
>> I have no prototype code to show you, but it would not be too hard to
>> look for such a file, and to treat all files residing in the directory
>> containing an UPSTREAM.md file, or in any directory below it,
>> differently. For instance, we could disable warnings-as-errors, or
>> disable a broader set of warnings.
>>
>> For client native libraries in particular, it means that we could set 
>> a high bar for warnings on code we write ourselves, but add 
>> exceptions that disable warnings just for imported code, even if we
>> mix "own" code with imported code in the same library. And we would be
>> able to separate these files into two sets (imported and "our")
>> automatically.
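A minimal sketch of that lookup, in Java, assuming the build tooling is handed a list of source paths and a source root (the class and method names are hypothetical, not existing JDK build tools): a file counts as imported if its own directory, or any directory above it, contains an UPSTREAM.md.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the JDK build: splits source files into
// imported (third-party) and JDK-authored code using UPSTREAM.md markers.
final class UpstreamClassifier {
    static final String MARKER = "UPSTREAM.md";

    // Every directory under root that directly contains an UPSTREAM.md file.
    static Set<Path> findUpstreamRoots(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals(MARKER))
                    .map(Path::getParent)
                    .collect(Collectors.toSet());
        }
    }

    // A file is imported if it lives in, or anywhere below, a marker directory.
    static boolean isImported(Path file, Set<Path> upstreamRoots) {
        for (Path dir = file.getParent(); dir != null; dir = dir.getParent()) {
            if (upstreamRoots.contains(dir)) {
                return true;
            }
        }
        return false;
    }

    // Partitions sources: key true = imported code, key false = "our" code.
    static Map<Boolean, List<Path>> partition(List<Path> sources, Set<Path> upstreamRoots) {
        return sources.stream()
                .collect(Collectors.partitioningBy(p -> isImported(p, upstreamRoots)));
    }
}
```

Paths are compared as given, so the marker directories and the source files must be expressed the same way (e.g. both relative to the repo root), as in `partition(sources, findUpstreamRoots(Path.of("src")))`.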
>>
>>> Note that there are some cases where JDK "glue" code is co-mingled 
>>> in the same directory,
>>> so you'd have to refactor that if this were applied universally and 
>>> always. 
>>
>> Yeah, I know. Many client libraries have glue code like that. But 
>> most of them are already refactored to have imported code in a 
>> separate directory. I can help with refactoring the remaining ones.
>>
>>> And perhaps we'd prefer to know about those warnings rather than 
>>> just have them re-accumulate ..
>> If we can separate this automatically, we can choose warning levels
>> for "our" code and imported code separately. So we could have
>> something like an "enable-warnings-for-imported-code" option, which
>> could be either on or off by default. Or whatever. I think we have
>> plenty of opportunities, as long as there is a programmatic way to
>> distinguish imported source code.
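To make the idea of separate warning levels concrete, here is a hypothetical policy sketch. It is not how the JDK makefiles currently pick flags; the flags are standard GCC/Clang options, and the boolean parameter stands in for the "enable-warnings-for-imported-code" option mentioned above.

```java
import java.util.List;

// Hypothetical warning policy, not the JDK's actual makefile logic.
// The flags are standard GCC/Clang options.
final class WarningPolicy {
    static List<String> cflagsFor(boolean imported, boolean enableWarningsForImportedCode) {
        if (!imported) {
            // High bar for code we write ourselves: warnings break the build.
            return List.of("-Wall", "-Wextra", "-Werror");
        }
        if (enableWarningsForImportedCode) {
            // Surface warnings in imported code, but never fail the build on them.
            return List.of("-Wall");
        }
        // Default: keep imported code quiet.
        return List.of("-w");
    }
}
```

Whether the option defaults to on or off is then a one-line decision at the call site.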
>>
>> /Magnus
>>
>>>
>>> -phil.
>>>
>>> On 4/21/22 11:58 AM, Magnus Ihse Bursie wrote:
>>>> The JDK project depends on many different open source projects. 
>>>> Some of them are linked to as libraries at runtime, but others have 
>>>> their source code directly incorporated into our source tree, known 
>>>> as "3rd party code".
>>>>
>>>> Unfortunately, the haphazard way this code is sprinkled throughout 
>>>> our code base makes it very hard to tell at a glance if some code 
>>>> originated with the JDK project, or is imported from elsewhere 
>>>> ("upstream"). Many times, you need to be well acquainted with these 
>>>> parts of the code to know whether a file is 3rd party code or not. 
>>>> If you do not know, you will need to rely on heuristics such as 
>>>> looking at the path name, checking for unusual copyright headers, 
>>>> or looking at the git history for commits that indicate a refresh 
>>>> from upstream.
>>>>
>>>> I propose we do something about this situation.
>>>>
>>>> My suggestion is that we add a file, UPSTREAM.md, in the top
>>>> directory of the imported 3rd party code. These files will follow a
>>>> fixed pattern: a set of formalized headers at the top, a blank line
>>>> as separator, and then free-form markdown text with, e.g., relevant
>>>> notes about the project, important information about the latest
>>>> update, or instructions or hints on how to update the source to a
>>>> newer version.
>>>>
>>>> Here are two examples of how this might look. (Note that the
>>>> free-form text here consists of offhand examples I invented. In
>>>> real life I assume they would be more detailed.)
>>>>
>>>> Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
>>>> ===
>>>> Name: Apache Santuario
>>>> Homepage: https://santuario.apache.org/
>>>> License: src/java.xml.crypto/share/legal/santuario.md
>>>> Version: 2.2.1
>>>> Upstream-release-URL: https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1
>>>>
>>>> # Upgrade instructions
>>>>
>>>> To upgrade the package, copy the source code from
>>>> `src/main/java/org/apache` in the upstream git repo into
>>>> `src/java.xml.crypto/share/classes/com/sun/org/apache`. Then update
>>>> the package name space (both `package` and `import` statements) by
>>>> running `find src/java.xml.crypto/share/classes/com/sun/org/apache
>>>> -name '*.java' | xargs sed -i
>>>> -e 's/^package org\.apache/package com.sun.org.apache/'
>>>> -e 's/^import org\.apache/import com.sun.org.apache/'`.
>>>> ===
>>>>
>>>> Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
>>>> ===
>>>> Name: Harfbuzz
>>>> Homepage: https://harfbuzz.github.io/
>>>> License: src/java.desktop/share/legal/harfbuzz.md
>>>> Version: 2.8.0
>>>> Upstream-release-URL: https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0
>>>>
>>>> # How to update
>>>>
>>>> To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and 
>>>> `.h` files from `src` into 
>>>> `src/java.desktop/share/native/libharfbuzz`. Check if the build
>>>> scripts upstream have changed since the last version, and update
>>>> our makefiles accordingly.
>>>> ===
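A minimal parser sketch for this layout, assuming each header fits on a single `Key: value` line and that the first blank line ends the header block. The class name is hypothetical; it only illustrates that the format is easy for tooling to read.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the proposed UPSTREAM.md layout:
// "Key: value" header lines, one blank line, then free-form markdown.
final class UpstreamFile {
    final Map<String, String> headers;
    final String body;

    private UpstreamFile(Map<String, String> headers, String body) {
        this.headers = headers;
        this.body = body;
    }

    static UpstreamFile parse(String text) {
        List<String> lines = text.lines().toList();
        Map<String, String> headers = new LinkedHashMap<>();
        int i = 0;
        // Header block: everything up to the first blank line.
        for (; i < lines.size() && !lines.get(i).isBlank(); i++) {
            String line = lines.get(i);
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed header line: " + line);
            }
            // Split on the first colon only, so URLs in values stay intact.
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        // Skip the separating blank line; the rest is free-form markdown.
        String body = i + 1 < lines.size()
                ? String.join("\n", lines.subList(i + 1, lines.size()))
                : "";
        return new UpstreamFile(headers, body);
    }
}
```

A header line without a colon is rejected rather than silently dropped, so a malformed file is caught early.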
>>>>
>>>>
>>>> These files will serve many purposes:
>>>>
>>>> 1) They will be a strong signal to developers coming to an 
>>>> unfamiliar part of the code base that the files here originated 
>>>> upstream.
>>>>
>>>> 2) It will be possible for tooling to understand that code in these 
>>>> directories might not live up to normal JDK standards. It would 
>>>> e.g. be possible for the build system to automatically disable 
>>>> warnings-as-errors for such code, or for upcoming tools that 
>>>> support code quality efforts such as blessed modifier order or 
>>>> spell checks to skip those parts of the code.
>>>>
>>>> 3) It will be possible to get an at-a-glance overview of what 
>>>> versions of 3rd party code are included in a build of the JDK, for 
>>>> all included projects -- not just as of right now, but at any point 
>>>> in history (since these files get updated when upstream code is
>>>> updated in the JDK). The build system could, for instance, collect
>>>> such information and provide it with the built JDK, just as it now 
>>>> collects the licenses from the src/$MODULE/legal directories.
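For instance, a hypothetical inventory step could walk the source tree, read the `Name` and `Version` headers from each UPSTREAM.md, and emit one line per imported project. This is a sketch, not an existing JDK build tool; the class and method names are made up, and missing headers are reported as "unknown".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical inventory of imported projects, built from UPSTREAM.md files.
final class UpstreamInventory {
    // Reads one "Key: value" header; the header block ends at the first blank line.
    static String header(String text, String key) {
        for (String line : text.lines().toList()) {
            if (line.isBlank()) {
                break;
            }
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equals(key)) {
                return line.substring(colon + 1).trim();
            }
        }
        return "unknown";
    }

    // One inventory line, e.g. "Harfbuzz 2.8.0 (libharfbuzz)".
    static String summaryLine(Path dir, String text) {
        return header(text, "Name") + " " + header(text, "Version") + " (" + dir + ")";
    }

    // Walks the source tree and lists every imported project with its version.
    static List<String> inventory(Path srcRoot) throws IOException {
        List<Path> markers;
        try (Stream<Path> paths = Files.walk(srcRoot)) {
            markers = paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals("UPSTREAM.md"))
                    .sorted()
                    .toList();
        }
        List<String> lines = new ArrayList<>();
        for (Path marker : markers) {
            lines.add(summaryLine(marker.getParent(), Files.readString(marker)));
        }
        return lines;
    }
}
```

Since the input is just files in the repo, running this on any old commit gives the inventory as of that commit.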
>>>>
>>>> 4) The git history for these files will clearly show when the code
>>>> was last refreshed from upstream, and by whom.
>>>>
>>>> 5) And finally, the free-text part gives a well-defined place to
>>>> store important information about how to upgrade, common mistakes,
>>>> etc. Right now, that knowledge is sometimes written down in README
>>>> files, but most often it just resides in the head of the developer
>>>> who last did a refresh.
>>>>
>>>> Thoughts?
>>>>
>>>> /Magnus
>>>
>>
>


