Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK
Kevin Rushforth
kevin.rushforth at oracle.com
Thu Apr 21 22:12:16 UTC 2022
On 4/21/2022 2:05 PM, Magnus Ihse Bursie wrote:
> On 2022-04-21 22:19, Philip Race wrote:
>>
>> A "marker" file indicating something is 3rd party that may be updated
>> from time to time seems fine
>> but upgrading 3rd party libraries is already a pain so I'm not sure
>> how prescriptive I'd want to
>> be about required content beyond simple basics.
>>
>> In the client area we've started to add files called UPDATING.txt
>> where we put the information
>> related to tasks when updating. Whilst some library might want to put
>> that in an UPSTREAM.md
>> I'd want to have the option to just have one line saying "See
>> UPDATING.txt for ..."
>
> As I said to Kevin, I'm basically viewing UPSTREAM.md as an evolution
> of UPDATING.txt, so in effect you would update UPSTREAM.md instead of
> UPDATING.txt, but in exactly the same way (and with almost the very
> same content!). Of course, you can put "See UPDATING.txt for ..." in
> UPSTREAM.md, but then you'd get two files where one would do.
Combining the instructions to update the various third-party libraries
into a single monolithic file seems like the wrong approach to me. I
think it makes much more sense to have the instructions live in the
module in question with the third-party code being updated.
>> I'm not sure we really need to include the current version in there.
>> Then we'd perhaps be able to avoid updating this file every time.
> If we want to keep track of the URL where we downloaded the actual
> source release from, we will need to update the file anyway.
>
> If the cost of updating the file is too high, we can do with a more
> "static" file that just serves as a marker for external code. That
> would indeed solve the problems I was running into, that triggered my
> thinking about this. And it is probably possible in most cases to
> trace which version was included by finding the latest changed files
> in the directory, and looking up the corresponding issue on JBS.
>
> But I still can't help thinking it would be good to have it stored in
> the source code repo what version we actually included. I think the
> cost of maintaining this would be low (compared to the other work
> required when upgrading, updating two lines in a text file is not
> really a big thing), and it would mean that the version information
> will be "co-located" with the source code. You can check out any
> commit whatsoever, and find out what versions of external source code
> were included.
>
> As I said to Kevin, I think it would be a missed opportunity not to
> track versions systematically.
But we do track the version systematically -- in the legal/xxx.md file
for each third-party library. Updating that legal/xxx.md file is a
requirement which doesn't go away if you store the version in a second
location. Storing it twice just leads to duplication.
-- Kevin
>> BTW the true "upstream location" is more usually a site to download
>> foo-1.2.3.tar.gz .. not some repo tag.
> I agree, a curl:able link to the source tar ball is probably better.
>
>> We even have some open source 3rd party code for which you won't find
>> a repo anywhere.
>>
>> And I don't think it fair to call the locations of the upstream
>> libraries "haphazard".
> That was not really directed at you. :-) The client native libraries
> are very well organized, thank you very much!
>
>> They are in the places they need to be, in many cases partly
>> determined by the build team,
>> within the necessities of the modular JDK.
>>
>> I'm curious what "possible for the build system to automatically
>> disable warnings-as-errors for such code"
>> means in practice.
>
> I have no prototype code to show you, but it would not be too hard to
> look for such a file, and to treat all files residing in directories
> below an UPSTREAM.md file differently. For instance, disable
> warnings-as-errors. Or disabling a broader set of warnings.
>
> For client native libraries in particular, it means that we could set
> a high bar for warnings on code we write ourselves, but add exceptions
> that disable warnings just for imported code. Even if we mix "own"
> code with imported, in the same lib. And we would be able to separate
> these files into two sets (imported and "our"), automatically.
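A minimal sketch of how that separation might be done, assuming the
marker file is named UPSTREAM.md and applies to everything below the
directory that contains it. The function names and the flag lists are
hypothetical and only illustrate the idea; they are not taken from the
JDK build system.

```python
from pathlib import Path

MARKER = "UPSTREAM.md"  # marker file name from the proposal


def is_imported(source_file, root):
    """Return True if source_file sits at or below a directory holding MARKER.

    The search walks upwards from the file's directory and stops at root.
    """
    root = Path(root).resolve()
    directory = Path(source_file).resolve().parent
    while True:
        if (directory / MARKER).is_file():
            return True
        # Stop at the source root (or the filesystem root as a safety net).
        if directory == root or directory == directory.parent:
            return False
        directory = directory.parent


def split_sources(files, root):
    """Partition files into (imported, own) lists, preserving order."""
    imported, own = [], []
    for f in files:
        (imported if is_imported(f, root) else own).append(f)
    return imported, own


def warning_flags(source_file, root):
    """Illustrative policy only: strict for our own code, lenient for imported."""
    if is_imported(source_file, root):
        return ["-Wall"]
    return ["-Wall", "-Werror"]
```

With a layout where glue code and imported code share a library but
live in different directories, split_sources() yields the two sets
without any per-library configuration.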
>
>> Note that there are some cases where JDK "glue" code is co-mingled in
>> the same directory,
>> so you'd have to refactor that if this were applied universally and
>> always.
>
> Yeah, I know. Many client libraries have glue code like that. But most
> of them are already refactored to have imported code in a separate
> directory. I can help with refactoring the remaining.
>
>> And perhaps we'd prefer to know about those warnings rather than just
>> have them re-accumulate ..
> If we can separate this automatically, we can choose warning levels for
> "our" code and imported code separately. So we could have like
> "enable-warnings-for-imported-code", which can be on -- or off -- by
> default. Or whatever. I think we have plenty of opportunity, as long
> as there is a programmatic way to distinguish imported source code.
>
> /Magnus
>
>>
>> -phil.
>>
>> On 4/21/22 11:58 AM, Magnus Ihse Bursie wrote:
>>> The JDK project depends on many different open source projects. Some
>>> of them are linked to as libraries at runtime, but others have their
>>> source code directly incorporated into our source tree, known as
>>> "3rd party code".
>>>
>>> Unfortunately, the haphazard way this code is sprinkled throughout
>>> our code base makes it very hard to tell at a glance if some code
>>> originated with the JDK project, or is imported from elsewhere
>>> ("upstream"). Many times, you need to be well acquainted with these
>>> parts of the code to know whether a file is 3rd party code or not.
>>> If you do not know, you will need to rely on heuristics such as
>>> looking at the path name, checking for unusual copyright headers, or
>>> looking at the git history for commits that indicate a refresh from
>>> upstream.
>>>
>>> I propose we do something about this situation.
>>>
>>> My suggestion is that we add a file, UPSTREAM.md, in the top
>>> directory of the imported 3rd party code. These files will follow a
>>> pattern, with a set of formalized headers on the top, a blank line
>>> of separation, and then a free-form markdown text, with e.g.
>>> relevant notes about the project, important information about the
>>> latest update, or instructions or hints on how to update the source
>>> to a newer version.
>>>
>>> Here are two examples on how this might look. (Note that the
>>> free-form text here is just some offhand examples I invented. In
>>> real life I assume they would be more detailed.)
>>>
>>> Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
>>> ===
>>> Name: Apache Santuario
>>> Homepage: https://santuario.apache.org/
>>> License: src/java.xml.crypto/share/legal/santuario.md
>>> Version: 2.2.1
>>> Upstream-release-URL:
>>> https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1
>>>
>>> # Upgrade instructions
>>>
>>> To upgrade the package, copy the source code from
>>> `src/main/java/org/apache` in the upstream git repo into
>>> `src/java.xml.crypto/share/classes/com/sun/org/apache`. Then update
>>> the package name space by running `find
>>> src/java.xml.crypto/share/classes/com/sun/org/apache -type f | xargs
>>> sed -i -e 's/^package org\.apache/package com.sun.org.apache/'`.
>>> ===
>>>
>>> Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
>>> ===
>>> Name: Harfbuzz
>>> Homepage: https://harfbuzz.github.io/
>>> License: src/java.desktop/share/legal/harfbuzz.md
>>> Version: 2.8.0
>>> Upstream-release-URL:
>>> https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0
>>>
>>> # How to update
>>>
>>> To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and
>>> `.h` files from `src` into
>>> `src/java.desktop/share/native/libharfbuzz`. Check if the build
>>> scripts in upstream have changed since the last version, and update
>>> our makefiles accordingly.
>>> ===
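The layout in the two examples (header lines, a blank line, then
free-form markdown) is simple enough for tooling to read. The
following is a rough sketch of such a reader, not a specification. It
assumes that each header is a "Key: value" line and that a value may
continue on the next line, as Upstream-release-URL does in the
examples as they appear in this mail. The function name
parse_upstream is made up for illustration.

```python
import re

# "Key: value" or a bare "Key:" whose value follows on the next line.
# A URL such as "https://..." does not match, since its colon is not
# followed by whitespace or end of line.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):(?:[ \t]+(.*))?$")


def parse_upstream(text):
    """Split UPSTREAM.md text into (headers dict, free-form markdown body)."""
    head, _, body = text.partition("\n\n")
    headers = {}
    last_key = None
    for line in head.splitlines():
        if not line.strip():
            continue
        match = HEADER_RE.match(line)
        if match:
            last_key = match.group(1)
            headers[last_key] = (match.group(2) or "").strip()
        elif last_key is not None:
            # Continuation line: append to the previous header's value.
            headers[last_key] = (headers[last_key] + " " + line.strip()).strip()
        else:
            raise ValueError(f"malformed header line: {line!r}")
    return headers, body.strip()
```

Applied to the Harfbuzz example, this would return a dict whose
"Version" entry is "2.8.0", plus the "# How to update" text as the
body.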
>>>
>>>
>>> These files will serve many purposes:
>>>
>>> 1) They will be a strong signal to developers coming to an
>>> unfamiliar part of the code base that the files here originated
>>> upstream.
>>>
>>> 2) It will be possible for tooling to understand that code in these
>>> directories might not live up to normal JDK standards. It would e.g.
>>> be possible for the build system to automatically disable
>>> warnings-as-errors for such code, or for upcoming tools that support
>>> code quality efforts such as blessed modifier order or spell checks
>>> to skip those parts of the code.
>>>
>>> 3) It will be possible to get an at-a-glance overview of what
>>> versions of 3rd party code are included in a build of the JDK, for
>>> all included projects -- not just as of right now, but at any point
>>> in history (since these files get updated when upstream code is
>>> updated in the JDK). The build system could, for instance, collect
>>> such information and provide it with the built JDK, just as it now
>>> collects the licenses from the src/$MODULE/legal directories.
>>>
>>> 4) The git history for these files will clearly show when the code
>>> was last refreshed from upstream, and by whom.
>>>
>>> 5) And finally, the free-text part gives a well-defined place to
>>> store important information about how to upgrade, common mistakes,
>>> etc -- knowledge that right now sometimes is put down into README
>>> files, but most often just resides in the head of the developer who
>>> last did a refresh.
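The at-a-glance overview in point 3 could be produced by a short scan
of the source tree. The sketch below is one possible approach and
assumes that the Name and Version headers each fit on a single
"Key: value" line before the first blank line. The helper names
read_headers and upstream_inventory are hypothetical.

```python
from pathlib import Path


def read_headers(path):
    """Read simple "Key: value" headers up to the first blank line."""
    headers = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        key, sep, value = line.partition(": ")
        if sep:
            headers[key] = value.strip()
    return headers


def upstream_inventory(root):
    """Return sorted (name, version, directory) tuples for every UPSTREAM.md."""
    root = Path(root)
    rows = []
    for marker in sorted(root.rglob("UPSTREAM.md")):
        headers = read_headers(marker)
        rows.append((
            headers.get("Name", "?"),
            headers.get("Version", "?"),
            marker.parent.relative_to(root).as_posix(),
        ))
    return rows
```

Running the same scan on an older checkout would give the versions
that were in the tree at that commit.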
>>>
>>> Thoughts?
>>>
>>> /Magnus
>>
>