Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK

Magnus Ihse Bursie magnus.ihse.bursie at oracle.com
Thu Apr 21 21:05:52 UTC 2022


On 2022-04-21 22:19, Philip Race wrote:
>
> A "marker" file indicating something is 3rd party that may be updated 
> from time to time seems fine
> but upgrading 3rd party libraries is already a pain so I'm not sure 
> how prescriptive I'd want to
> be about required content beyond simple basics.
>
> In the client area we've started to add files called UPDATING.txt 
> where we put the information
> related to tasks when updating. Whilst some library might want to put 
> that in an UPSTREAM.md
> I'd want to have the option to just have one line saying "See 
> UPDATING.txt for ..."

As I said to Kevin, I'm basically viewing UPSTREAM.md as an evolution of 
UPDATING.txt, so in effect you would update UPSTREAM.md instead of 
UPDATING.txt, but in exactly the same way (and with almost the very same 
content!). Of course, you can put "See UPDATING.txt for ..." in 
UPSTREAM.md, but then you'd get two files where one would do.

> I'm not sure we really need to include the current version in there.
> Then we'd perhaps be able to avoid updating this file every time.
If we want to keep track of the URL where we downloaded the actual 
source release from, we will need to update the file anyway.

If the cost of updating the file is too high, we can do with a more 
"static" file that just serves as a marker for external code. That would 
indeed solve the problems I was running into, that triggered my thinking 
about this. And it is probably possible in most cases to trace what 
version was included by finding the most recently changed files in the 
directory, and looking up the corresponding issue on JBS.

But I still can't help thinking it would be good to have it stored in 
the source code repo what version we actually included. I think the cost 
of maintaining this would be low (compared to the other work required 
when upgrading, updating two lines in a text file is not really a big 
thing), and it would mean that the version information will be 
"co-located" with the source code. You can check out any commit 
whatsoever, and find out what versions of external source code were 
included.
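
To make the "co-located" version information concrete, here is a minimal Python sketch that reads the proposed header block and collects name and version for every UPSTREAM.md below a directory. It is only an illustration under stated assumptions: the function names are hypothetical, each header is assumed to fit on one line as "Key: value", and the header names are the ones from the examples in the original proposal quoted below.

```python
from pathlib import Path


def parse_upstream_headers(text):
    """Parse the 'Key: value' lines that precede the first blank line."""
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line separates the headers from the free-form text
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip()] = value.strip()
    return headers


def collect_upstream_versions(root):
    """Map each directory holding an UPSTREAM.md to its (Name, Version)."""
    report = {}
    for marker in sorted(Path(root).rglob("UPSTREAM.md")):
        headers = parse_upstream_headers(marker.read_text(encoding="utf-8"))
        rel_dir = marker.parent.relative_to(root).as_posix()
        report[rel_dir] = (headers.get("Name"), headers.get("Version"))
    return report
```

Running `collect_upstream_versions` on a checkout of any commit would then list what was imported at that point in history, without a JBS lookup.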

As I said to Kevin, I think it would be a missed opportunity not to 
track versions systematically.

> BTW the true "upstream location" is more usually a site to download 
> foo-1.2.3.tar.gz .. not some  repo tag.
I agree, a curl-able link to the source tarball is probably better.

> We even have some open source 3rd party code for which you won't find 
> a repo anywhere.
>
> And I don't think it fair to call the locations of the upstream 
> libraries "haphazard".
That was not really directed at you. :-) The client native libraries are 
very well organized, thank you very much!

> They are in the places they need to be, in many cases partly 
> determined by the build team,
> within the necessities of the modular JDK.
>
> I'm curious what "possible for the build system to automatically 
> disable warnings-as-errors for such code"
> means in practice.

I have no prototype code to show you, but it would not be too hard to 
look for such a file, and to treat all files residing in directories 
below an UPSTREAM.md file differently. For instance, it could disable 
warnings-as-errors for them, or disable a broader set of warnings.

For client native libraries in particular, it means that we could set a 
high bar for warnings on code we write ourselves, but add exceptions 
that disable warnings just for imported code. Even if we mix "own" code 
with imported, in the same lib. And we would be able to separate these 
files into two sets (imported and "our"), automatically.
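
Nothing like this exists in the build system today; purely as an illustration of the idea, here is a minimal Python sketch of the separation. It assumes that a file counts as imported if its own directory or any ancestor directory (up to the source root) contains an UPSTREAM.md. The function names and the list of file suffixes are hypothetical.

```python
from pathlib import Path

MARKER = "UPSTREAM.md"


def is_imported(path, root):
    """True if the file's directory, or an ancestor up to root, has a marker."""
    root = Path(root)
    directory = Path(path).parent
    while True:
        if (directory / MARKER).is_file():
            return True
        # Stop at the source root (or at the filesystem root, as a safeguard).
        if directory == root or directory == directory.parent:
            return False
        directory = directory.parent


def partition_sources(root, suffixes=(".c", ".cc", ".cpp", ".h", ".hh")):
    """Split native source files below root into (imported, own) lists."""
    imported, own = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root).as_posix()
            (imported if is_imported(path, root) else own).append(rel)
    return imported, own
```

With glue code in a library directory and the imported code in a subdirectory that holds the marker, the glue files end up in the "own" list and everything below the marker ends up in the imported list.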

> Note that there are some cases where JDK "glue" code is co-mingled in 
> the same directory,
> so you'd have to refactor that if this were applied universally and 
> always. 

Yeah, I know. Many client libraries have glue code like that. But most 
of them are already refactored to have imported code in a separate 
directory. I can help with refactoring the remaining ones.

> And perhaps we'd prefer to know about those warnings rather than just 
> have them re-accumulate ..
If we can separate this automatically, we can choose warning levels for 
"our" code and imported code separately. So we could have like 
"enable-warnings-for-imported-code", which can be on -- or off -- by 
default. Or whatever. I think we have plenty of opportunity, as long as 
there is a programmatic way to distinguish imported source code.
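
As a rough sketch of what such a switch could mean, assuming GCC/Clang-style flags: the Python function below picks warning flags per source file. The function name, the option name and the chosen flag sets are hypothetical and are not taken from the JDK makefiles.

```python
def warning_flags(imported, enable_warnings_for_imported_code=False):
    """Return illustrative GCC/Clang-style warning flags for one source file."""
    if not imported:
        # High bar for code we write ourselves: warnings are errors.
        return ["-Wall", "-Wextra", "-Werror"]
    if enable_warnings_for_imported_code:
        # Report warnings in imported code, but never fail the build on them.
        return ["-Wall"]
    # Default for imported code in this sketch: suppress all warnings.
    return ["-w"]
```

Flipping the default of `enable_warnings_for_imported_code` would make warnings in imported code visible again without turning them into build failures.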

/Magnus

>
> -phil.
>
> On 4/21/22 11:58 AM, Magnus Ihse Bursie wrote:
>> The JDK project depends on many different open source projects. Some 
>> of them are linked to as libraries at runtime, but others have their 
>> source code directly incorporated into our source tree, known as "3rd 
>> party code".
>>
>> Unfortunately, the haphazard way this code is sprinkled throughout 
>> our code base makes it very hard to tell at a glance if some code 
>> originated with the JDK project, or is imported from elsewhere 
>> ("upstream"). Many times, you need to be well acquainted with these 
>> parts of the code to know whether a file is 3rd party code or not. If 
>> you do not know, you will need to rely on heuristics such as looking 
>> at the path name, checking for unusual copyright headers, or looking 
>> at the git history for commits that indicate a refresh from upstream.
>>
>> I propose we do something about this situation.
>>
>> My suggestion is that we add a file, UPSTREAM.md, in the top 
>> directory of the imported 3rd party code. These files will follow a 
>> pattern, with a set of formalized headers on the top, a blank line of 
>> separation, and then a free-form markdown text, with e.g. relevant 
>> notes about the project, important information about the latest 
>> update, or instructions or hints on how to update the source to a 
>> newer version.
>>
>> Here are two examples of how this might look. (Note that the 
>> free-form text here is just some offhand examples I invented. In real 
>> life I assume they would be more detailed.)
>>
>> Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
>> ===
>> Name: Apache Santuario
>> Homepage: https://santuario.apache.org/
>> License: src/java.xml.crypto/share/legal/santuario.md
>> Version: 2.2.1
>> Upstream-release-URL: 
>> https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1
>>
>> # Upgrade instructions
>>
>> To upgrade the package, copy the source code from 
>> `src/main/java/org/apache` in the upstream git repo into 
>> `src/java.xml.crypto/share/classes/com/sun/org/apache`. Then update 
>> the package name space by running `find 
>> src/java.xml.crypto/share/classes/com/sun/org/apache -name '*.java' | 
>> xargs sed -i -e 's/^package org\.apache/package com.sun.org.apache/'` 
>> (GNU sed syntax for in-place editing).
>> ===
>>
>> Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
>> ===
>> Name: Harfbuzz
>> Homepage: https://harfbuzz.github.io/
>> License: src/java.desktop/share/legal/harfbuzz.md
>> Version: 2.8.0
>> Upstream-release-URL: 
>> https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0
>>
>> # How to update
>>
>> To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and 
>> `.h` files from `src` into 
>> `src/java.desktop/share/native/libharfbuzz`. Check if the build 
>> scripts in upstream have changed since the last version, and update 
>> our makefiles accordingly.
>> ===
>>
>>
>> These files will serve many purposes:
>>
>> 1) They will be a strong signal to developers coming to an unfamiliar 
>> part of the code base that the files here originated upstream.
>>
>> 2) It will be possible for tooling to understand that code in these 
>> directories might not live up to normal JDK standards. It would e.g. 
>> be possible for the build system to automatically disable 
>> warnings-as-errors for such code, or for upcoming tools that support 
>> code quality efforts such as blessed modifier order or spell checks 
>> to skip those parts of the code.
>>
>> 3) It will be possible to get an at-a-glance overview of what 
>> versions of 3rd party code are included in a build of the JDK, for 
>> all included projects -- not just as of right now, but at any point 
>> in history (since these files get updated when upstream code is 
>> updated in the JDK). The build system could, for instance, collect 
>> such information and provide it with the built JDK, just as it now 
>> collects the licenses from the src/$MODULE/legal directories.
>>
>> 4) The git history for these files will clearly show when the code 
>> was last refreshed from upstream, and by whom.
>>
>> 5) And finally, the free-text part gives a well-defined place to 
>> store important information about how to upgrade, common mistakes, 
>> etc -- knowledge that right now sometimes is put down into README 
>> files, but most often just resides in the head of the developer who 
>> last did a refresh.
>>
>> Thoughts?
>>
>> /Magnus
>


