Project Panama is moving to GitHub

Thu Jan 30 03:13:14 UTC 2020

On Jan 27, 2020, at 3:57 AM, Jorn Vernee <jorn.vernee at oracle.com> wrote:
> 
> On 25/01/2020 04:24, John Rose wrote:
>> On Jan 24, 2020, at 3:10 AM, Maurizio Cimadamore <maurizio.cimadamore at oracle.com> wrote:
>>> 
>>> "A Project must be sponsored by one or more Groups. A Project may have web content, one or more file repositories, and one or more mailing lists."
>>> 
>>> So I think, process-wise, we probably have cover.
>> 
>> Yes, we do have permission to make more repos in
>> any given project, although we don’t make use of
>> that ability much, except at the inception of a project.
>> 
>> I also think that the git-based infra will scale well
>> to a multi-repo use case.  In the OpenJDK we have
>> lots of fork-like copies of the JDK, each with a
>> cluster of localized changes.  I assume the GitHub
>> back end is able to recognize that there are many
>> objects in common between the various repos.
>> 
>> (Not getting into an hg-vs.-git evaluation here. It’s
>> likely that hg also has some story for partial sharing
>> across local repos.  But AFAIK OpenJDK infra. doesn’t
>> use such a thing.)
>> 
>> So we can reconsider our choice to use branches, since
>> on GitHub we can spin up forks about as easily as branches.
>> 
>> I think our use of branches has worn well on the hg infra,
>> and would continue to wear well on GitHub.  So I don’t
>> see much pressure to move to separate repos, other than:
>> 
>> 1. Separate repos allow separate mirroring decisions for
>> Vector and Foreign.
>> 2. We could make the Vector and Foreign work less visible
>> to people that don’t want to see one project or the other.
> GitHub, the website, is oriented around having 1 main branch per repo. This is the 'master' branch by default, but it can be changed to another branch. It is also the branch whose README is shown on the repo's web page, basically the front page of the repository. With 2 repos, we can have 2 front pages.
> 
> Furthermore, having 2 repos allows us to keep the Pull Requests for the 2 parts of the project separated, which would at least save some distraction when working on either part of the project.
> 
> These are not stellar advantages of using separate repos over separate branches. Though, I currently don't see any downsides to having multiple repos over multiple branches either. So why not pick the slightly better of the two options? The main advantage here is being able to better shape how things look on GitHub. If you wanted, you could still combine the separate repos on GitHub into a single local repo using 2 remotes (git is stronger here than hg).
> 
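
A minimal sketch of the "single local repo using 2 remotes" idea mentioned above. The remote names and URLs below are hypothetical placeholders, not the actual Panama repositories; the git subcommands used (`init`, `remote add`, `fetch --all`) are standard.

```python
import subprocess
from typing import Dict, List


def combine_commands(remotes: Dict[str, str]) -> List[List[str]]:
    """Build the git commands that set up one local repo tracking several remotes."""
    cmds = [["git", "init"]]
    for name, url in remotes.items():
        # Each separate GitHub repo becomes one named remote of the local repo.
        cmds.append(["git", "remote", "add", name, url])
    # Fetch the branches of every configured remote into the local object store.
    cmds.append(["git", "fetch", "--all"])
    return cmds


def run_all(cmds: List[List[str]], cwd: str) -> None:
    """Run the commands in directory `cwd`; requires git to be installed."""
    for cmd in cmds:
        subprocess.run(cmd, cwd=cwd, check=True)


# Hypothetical URLs, for illustration only.
REMOTES = {
    "foreign": "https://github.com/example/panama-foreign.git",
    "vector": "https://github.com/example/panama-vector.git",
}

if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in combine_commands(REMOTES):
        print(" ".join(cmd))
```

After the fetch, the branches of each remote are visible locally under `foreign/...` and `vector/...`, so both lines of work can be inspected from one checkout.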

OK, these are additional reasons to prefer separate repos.
Seems like that’s a better fit for GitHub.  Thanks!

>> FTR, neither of those strikes me as a very strong reason
>> to use multiple repos.  Googling around on the theme
>> of “git fork vs. branch” suggests that forking is most
>> useful at a trust boundary, or divergence of artifacts,
>> but the whole OpenJDK is really just one trust domain,
>> one team, and one main result (the Java RI).
> "Fork" is not really the right term to use here. A fork on GitHub is really just a copy of a repository on GitHub, created when clicking the 'fork' button in the top-right [1] (anybody can do this), that can be edited by the person who owns the fork, usually with the intent of making a pull request from the fork to the main repository later.
> 
Yes, I suppose I should have said “copy” instead of “fork”.
My point is independent of which workflow we are talking
about:  I observe that “fork” is cheap (by all appearances),
and therefore other kinds of copies (such as keeping similar
copies of the JDK) are also probably cheap.  I suppose, though
I can’t easily google up evidence, that the backend of GitHub
cleverly amortizes the storage cost of a Git object across all
repos that use that object, even when those repos are not
part of the same organization (as in a fork).  Thus it is likely
to amortize them in our case, which is simpler.
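
To illustrate the amortization idea above, here is a toy model of a content-addressed object store shared by several repos. It is only a sketch of the general technique: the class names are invented for illustration, and nothing here is a claim about how the GitHub backend is actually implemented. The object id is computed the way Git computes a blob id (SHA-1 over a `blob <size>\0` header plus the content).

```python
import hashlib
from typing import Dict, Set


class ObjectStore:
    """Hypothetical shared backend: one stored copy per distinct object."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Git-style blob id: SHA-1 of a "blob <size>\0" header followed by the content.
        header = b"blob " + str(len(data)).encode() + b"\0"
        oid = hashlib.sha1(header + data).hexdigest()
        # Identical content maps to the same id, so it is stored only once.
        self.objects.setdefault(oid, data)
        return oid


class Repo:
    """Hypothetical repo: just a set of object ids pointing into the shared store."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.refs: Set[str] = set()

    def add(self, data: bytes) -> str:
        oid = self.store.put(data)
        self.refs.add(oid)
        return oid

    def copy(self) -> "Repo":
        # A copy duplicates only the ids (pointers), not the object contents.
        clone = Repo(self.store)
        clone.refs = set(self.refs)
        return clone


store = ObjectStore()
jdk = Repo(store)
for i in range(1000):
    jdk.add(f"class File{i} {{}}".encode())

# Two fork-like copies, each with one localized change.
foreign = jdk.copy()
vector = jdk.copy()
foreign.add(b"foreign change")
vector.add(b"vector change")

total_refs = len(jdk.refs) + len(foreign.refs) + len(vector.refs)
print(total_refs, len(store.objects))
```

In this model the three repos together reference far more objects than the store actually holds, because the common objects are stored once and only the localized changes add new entries.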
> For instance, to contribute to Panama on GitHub, each of us would have to make our own fork of the Panama repo, push changes to the fork (we wouldn't have direct write access to the main repo; there's the trust boundary), and then make a pull request based on a diff between a branch in our fork and a branch in the main repository.
> 

Yes; the workflow assumes a cheap copy operation, probably
the copy of a pointer rather than of a bunch of metadata.
To me it’s evidence that GitHub scales well to hosting many
similar repos.  Our current infra doesn’t do this, hence it tilts
towards the use of branches, which (in Hg) can amortize
the cost of the common data across multiple branches, but
only in a single repo.

— John