Proposal: UPSTREAM.md -- better tracking of upstream code in the JDK
Kevin Rushforth
kevin.rushforth at oracle.com
Thu Apr 21 23:40:52 UTC 2022
On 4/21/2022 4:24 PM, Magnus Ihse Bursie wrote:
> On 2022-04-22 00:12, Kevin Rushforth wrote:
>>
>>
>> On 4/21/2022 2:05 PM, Magnus Ihse Bursie wrote:
>>> On 2022-04-21 22:19, Philip Race wrote:
>>>>
>>>> A "marker" file indicating something is 3rd party that may be
>>>> updated from time to time seems fine
>>>> but upgrading 3rd party libraries is already a pain so I'm not sure
>>>> how prescriptive I'd want to
>>>> be about required content beyond simple basics.
>>>>
>>>> In the client area we've started to add files called UPDATING.txt
>>>> where we put the information
>>>> related to tasks when updating. Whilst some library might want to
>>>> put that in an UPSTREAM.md
>>>> I'd want to have the option to just have one line saying "See
>>>> UPDATING.txt for ..."
>>>
>>> As I said to Kevin, I'm basically viewing UPSTREAM.md as an
>>> evolution of UPDATING.txt, so in effect you would update UPSTREAM.md
>>> instead of UPDATING.txt, but in exactly the same way (and with
>>> almost the very same content!). Of course, you can put "See
>>> UPDATING.txt for ..." in UPSTREAM.md, but then you'd get two files
>>> where one would do.
>>
>> Combining the instructions to update the various third-party
>> libraries into a single monolithic file seems like the wrong approach
>> to me. I think it makes much more sense to have the instructions live
>> in the module in question with the third-party code being updated.
>
> I think we're just talking past each other here. I am not suggesting
> that we have a *single* UPSTREAM.md file. I am suggesting that we have
> one UPSTREAM.md file per third party library, placed exactly as you
> say with the third party code.
Yes, I definitely thought you were talking about a single file to
aggregate them all. Sorry for the misunderstanding!
>>>> I'm not sure we really need to include the current version in there.
>>>> Then we'd perhaps be able to avoid updating this file every time.
>>> If we want to keep track of the URL where we downloaded the actual
>>> source release from, we will need to update the file anyway.
>>>
>>> If the cost of updating the file is too high, we can do with a more
>>> "static" file that just serves as a marker for external code. That
>>> would indeed solve the problems I was running into, that triggered
>>> my thinking about this. And it is probably possible in most cases to
>>> trace what version was included by finding the latest changed
>>> files in the directory, and looking up the corresponding issue on JBS.
>>>
>>> But I still can't help thinking it would be good to have it stored
>>> in the source code repo what version we actually included. I think
>>> the cost of maintaining this would be low (compared to the other
>>> work required when upgrading, updating two lines in a text file is
>>> not really a big thing), and it would mean that the version
>>> information will be "co-located" with the source code. You can check
>>> out any commit whatsoever, and find out what versions of external
>>> source code were included.
>>>
>>> As I said to Kevin, I think it would be a missed opportunity not to
>>> track versions systematically.
>>
>> But we do track the version systematically -- in the xxx.md file for
>> each third-party software. Updating that legal/xxx.md file is a
>> requirement which doesn't go away if you store it in a second
>> location. It just leads to duplication.
>
> Well, it's kind of semi-systematic, if you ask me. Here are some
> excerpts:
>
> java.base/share/legal/icu.md:## International Components for Unicode (ICU4J) v70.1
> java.base/share/legal/public_suffix.md:## Mozilla Public Suffix List
> java.base/share/legal/unicode.md:## The Unicode Standard, Unicode Character Database, Version 14.0.0
>
> But sure, I get your point that this is already stored here. Let's
> drop that part of my proposal. (And maybe we can try to be more
> rigorous in the future on how we describe project name and version in
> the legal .md files.)
OK.
-- Kevin
>
> /Magnus
>
>>
>> -- Kevin
>>
>>>> BTW the true "upstream location" is more usually a site to download
>>>> foo-1.2.3.tar.gz .. not some repo tag.
>>> I agree, a curl:able link to the source tar ball is probably better.
>>>
>>>> We even have some open source 3rd party code for which you won't
>>>> find a repo anywhere.
>>>>
>>>> And I don't think it fair to call the locations of the upstream
>>>> libraries "haphazard".
>>> That was not really directed at you. :-) The client native libraries
>>> are very well organized, thank you very much!
>>>
>>>> They are in the places they need to be, in many cases partly
>>>> determined by the build team,
>>>> within the necessities of the modular JDK.
>>>>
>>>> I'm curious what "possible for the build system to automatically
>>>> disable warnings-as-errors for such code"
>>>> means in practice.
>>>
>>> I have no prototype code to show you, but it would not be too hard
>>> to look for such a file, and to treat all files residing in
>>> directories below an UPSTREAM.md file differently, for instance by
>>> disabling warnings-as-errors or a broader set of warnings.
>>>
>>> For client native libraries in particular, it means that we could
>>> set a high bar for warnings on code we write ourselves, but add
>>> exceptions that disable warnings just for imported code. Even if we
>>> mix "own" code with imported, in the same lib. And we would be able
>>> to separate these files into two sets (imported and "our"),
>>> automatically.
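
The automatic separation described above could be sketched roughly as follows. This is only an illustration in Python, not prototype build-system code: the function name `split_sources` is hypothetical, and it assumes that a directory counts as imported when it, or any ancestor below the scanned root, contains an UPSTREAM.md file.

```python
import os

MARKER = "UPSTREAM.md"  # marker file name taken from the proposal


def split_sources(root):
    """Return (imported, own): sorted lists of file paths under root.

    Assumption: a file is "imported" if its directory, or any ancestor
    directory up to root, contains the marker file.
    """
    root = os.path.normpath(root)  # avoid trailing-separator mismatches
    imported, own = [], []
    marked_dirs = set()
    # os.walk is top-down by default, so parents are seen before children.
    for dirpath, _dirnames, filenames in os.walk(root):
        if MARKER in filenames or os.path.dirname(dirpath) in marked_dirs:
            marked_dirs.add(dirpath)
        target = imported if dirpath in marked_dirs else own
        for name in filenames:
            if name != MARKER:  # the marker itself is not source code
                target.append(os.path.join(dirpath, name))
    return sorted(imported), sorted(own)
```

A build step could then apply one set of warning flags to the first list and a stricter set to the second. Glue code that shares a directory with imported code would still land in the imported set, which is why the refactoring discussed below would be needed.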
>>>
>>>> Note that there are some cases where JDK "glue" code is co-mingled
>>>> in the same directory,
>>>> so you'd have to refactor that if this were applied universally and
>>>> always.
>>>
>>> Yeah, I know. Many client libraries have glue code like that. But
>>> most of them are already refactored to have imported code in a
>>> separate directory. I can help with refactoring the remaining ones.
>>>
>>>> And perhaps we'd prefer to know about those warnings rather than
>>>> just have them re-accumulate ..
>>> If we can separate this automatically, we can choose warning levels
>>> for "our" code and imported code separately. So we could have like
>>> "enable-warnings-for-imported-code", which can be on -- or off -- by
>>> default. Or whatever. I think we have plenty of opportunity, as long
>>> as there is a programmatic way to distinguish imported source code.
>>>
>>> /Magnus
>>>
>>>>
>>>> -phil.
>>>>
>>>> On 4/21/22 11:58 AM, Magnus Ihse Bursie wrote:
>>>>> The JDK project depends on many different open source projects.
>>>>> Some of them are linked to as libraries at runtime, but others
>>>>> have their source code directly incorporated into our source tree,
>>>>> known as "3rd party code".
>>>>>
>>>>> Unfortunately, the haphazard way this code is sprinkled throughout
>>>>> our code base makes it very hard to tell at a glance if some code
>>>>> originated with the JDK project, or is imported from elsewhere
>>>>> ("upstream"). Many times, you need to be well acquainted with
>>>>> these parts of the code to know whether a file is 3rd party code
>>>>> or not. If you do not know, you will need to rely on heuristics
>>>>> such as looking at the path name, checking for unusual copyright
>>>>> headers, or looking at the git history for commits that indicate a
>>>>> refresh from upstream.
>>>>>
>>>>> I propose we do something about this situation.
>>>>>
>>>>> My suggestion is that we add a file, UPSTREAM.md, in the top
>>>>> directory of the imported 3rd party code. These files will follow
>>>>> a pattern, with a set of formalized headers at the top, a blank
>>>>> line of separation, and then a free-form markdown text, with e.g.
>>>>> relevant notes about the project, important information about the
>>>>> latest update, or instructions or hints on how to update the
>>>>> source to a newer version.
>>>>>
>>>>> Here are two examples of how this might look. (Note that the
>>>>> free-form text here is just some offhand examples I invented. In
>>>>> real life I assume they would be more detailed.)
>>>>>
>>>>> Example 1: src/java.xml.crypto/share/classes/com/sun/UPSTREAM.md:
>>>>> ===
>>>>> Name: Apache Santuario
>>>>> Homepage: https://santuario.apache.org/
>>>>> License: src/java.xml.crypto/share/legal/santuario.md
>>>>> Version: 2.2.1
>>>>> Upstream-release-URL: https://github.com/apache/santuario-xml-security-java/releases/tag/xmlsec-2.2.1
>>>>>
>>>>> # Upgrade instructions
>>>>>
>>>>> To upgrade the package, copy the source code from
>>>>> `src/main/java/org/apache` in the upstream git repo into
>>>>> `src/java.xml.crypto/share/classes/com/sun/org/apache`. Then
>>>>> update the package namespace by running `find
>>>>> src/java.xml.crypto/share/classes/com/sun/org/apache -type f |
>>>>> xargs sed -i -e 's/^package org\.apache/package com.sun.org.apache/'`.
>>>>> ===
>>>>>
>>>>> Example 2: src/java.desktop/share/native/libharfbuzz/UPSTREAM.md:
>>>>> ===
>>>>> Name: Harfbuzz
>>>>> Homepage: https://harfbuzz.github.io/
>>>>> License: src/java.desktop/share/legal/harfbuzz.md
>>>>> Version: 2.8.0
>>>>> Upstream-release-URL: https://github.com/harfbuzz/harfbuzz/releases/tag/2.8.0
>>>>>
>>>>> # How to update
>>>>>
>>>>> To update to a new version of Harfbuzz, copy all `.cc`, `.hh` and
>>>>> `.h` files from `src` into
>>>>> `src/java.desktop/share/native/libharfbuzz`. Check if the build
>>>>> scripts in upstream have changed since the last version, and update
>>>>> our makefiles accordingly.
>>>>> ===
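
To make the proposed layout concrete (headers, a blank line, then free-form markdown), here is a minimal parsing sketch in Python. The function name `parse_upstream` is hypothetical, and the sketch assumes each header is a single `Key: value` line; the thread does not specify the format beyond the two examples above.

```python
def parse_upstream(text):
    """Split UPSTREAM.md text into (headers, notes).

    Assumption: headers are single 'Key: value' lines up to the first
    blank line; everything after that blank line is free-form markdown
    and is returned unchanged.
    """
    head, _, notes = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        # Split on the first colon only, so URLs in the value survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a header line: {line!r}")
        headers[key.strip()] = value.strip()
    return headers, notes
```

Splitting on the first colon keeps values such as the homepage URL intact, and the free-form part is never interpreted, so maintainers can write whatever update notes they need.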
>>>>>
>>>>>
>>>>> These files will serve many purposes:
>>>>>
>>>>> 1) They will be a strong signal to developers coming to an
>>>>> unfamiliar part of the code base that the files here originated
>>>>> upstream.
>>>>>
>>>>> 2) It will be possible for tooling to understand that code in
>>>>> these directories might not live up to normal JDK standards. It
>>>>> would e.g. be possible for the build system to automatically
>>>>> disable warnings-as-errors for such code, or for upcoming tools
>>>>> that support code quality efforts such as blessed modifier order
>>>>> or spell checks to skip those parts of the code.
>>>>>
>>>>> 3) It will be possible to get an at-a-glance overview of what
>>>>> versions of 3rd party code are included in a build of the JDK, for
>>>>> all included projects -- not just as of right now, but at any
>>>>> point in history (since these files get updated when upstream
>>>>> code is updated in the JDK). The build system could, for instance,
>>>>> collect such information and provide it with the built JDK, just
>>>>> as it now collects the licenses from the src/$MODULE/legal
>>>>> directories.
>>>>>
>>>>> 4) The git history for these files will clearly show when the code
>>>>> was last refreshed from upstream, and by whom.
>>>>>
>>>>> 5) And finally, the free-text part gives a well-defined place to
>>>>> store important information about how to upgrade, common mistakes,
>>>>> etc -- knowledge that right now sometimes is put down into README
>>>>> files, but most often just resides in the head of the developer
>>>>> who last did a refresh.
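
The at-a-glance overview from point 3 could be collected along these lines. This is a rough Python sketch rather than actual build logic: the function name `upstream_overview` is hypothetical, and it assumes the `Name` and `Version` headers from the examples above, each on a single line. A missing header is reported as "?".

```python
import os


def upstream_overview(src_root):
    """Map each directory holding an UPSTREAM.md to its (Name, Version)."""
    overview = {}
    for dirpath, _dirnames, filenames in os.walk(src_root):
        if "UPSTREAM.md" not in filenames:
            continue
        fields = {}
        path = os.path.join(dirpath, "UPSTREAM.md")
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    break  # the first blank line ends the header block
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        rel = os.path.relpath(dirpath, src_root)
        overview[rel] = (fields.get("Name", "?"), fields.get("Version", "?"))
    return overview
```

Running this against any checked-out commit would give the historical view described in point 3, since the result depends only on files in the source tree.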
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> /Magnus
>>>>
>>>
>>
>