Project Panama is moving to GitHub

Thu Jan 30 11:36:06 UTC 2020

There's also another hidden reason as to why splitting could be a 
tolerable move; in the past, having branches in the same repo also meant 
that, say, if you wanted to create yet another branch which contained 
vector + foreign, that was doable with our current infrastructure for 
handling dependent branches. This has been used in a number of cases 
(although, sadly, the Panama foreign+vector branch has been broken for 
longer than I wish to remember) - in Amber the experiments were a bit 
more successful.

Now, it would appear that moving things into different repositories would 
add obstacles to creating new "super" branches which contain features A 
+ B + C. In reality, the infrastructure that Skara will deliver is well 
capable of handling things like that - as the new automerge feature will 
likely allow for expressing dependencies even _outside_ the repository 
in which the code lives (e.g. branch A in repo R1 can depend on branch B 
in repo R2).

So, with this, I think the decision whether to host many branches 
in a single repo vs. multiple repos is, mostly, a cosmetic one. I think 
it makes sense to group logically related branches in the same repo 
(e.g. foreign-memaccess, foreign-abi, foreign-jextract - as the names of 
the branches suggest) - but I think it also makes sense to keep things 
that are logically distinct (such as vector and FFI support) in 
different repos, so that different audiences can visit different GitHub 
repos, with different READMEs, etc.

Of course there will be a point in time where, e.g., we'd want to add 
support for vector types in the foreign branch - but it seems like the 
machinery we have (thanks to Skara) is powerful enough to, say, create a 
foreign-vector branch which depends on both foreign-abi and 
vector/vectorIntrinsics.

Maurizio

On 30/01/2020 03:13, John Rose wrote:
> On Jan 27, 2020, at 3:57 AM, Jorn Vernee <jorn.vernee at oracle.com> wrote:
>>
>> On 25/01/2020 04:24, John Rose wrote:
>>> On Jan 24, 2020, at 3:10 AM, Maurizio Cimadamore 
>>> <maurizio.cimadamore at oracle.com> wrote:
>>>>
>>>> "A Project must be sponsored by one or more Groups. A Project may 
>>>> have web content, one or more file repositories, and one or more 
>>>> mailing lists."
>>>>
>>>> So I think, process-wise, we probably have cover.
>>>
>>> Yes, we do have permission to make more repos in
>>> any given project, although we don’t make use of
>>> that ability much, except at the inception of a project.
>>>
>>> I also think that the git-based infra will scale well
>>> to a multi-repo use case.  In the OpenJDK we have
>>> lots of fork-like copies of the JDK, each with a
>>> cluster of localized changes.  I assume the GitHub
>>> back end is able to recognize that there are many
>>> objects in common between the various repos.
>>>
>>> (Not getting into an hg-vs.-git evaluation here. It’s
>>> likely that hg also has some story for partial sharing
>>> across local repos.  But AFAIK OpenJDK infra. doesn’t
>>> use such a thing.)
>>>
>>> So we can reconsider our choice to use branches, since
>>> on GitHub we can spin up forks about as easily as branches.
>>>
>>> I think our use of branches has worn well on the hg infra,
>>> and would continue to wear well on GitHub.  So I don’t
>>> see much pressure to move to separate repos, other than:
>>>
>>> 1. Separate repos allow separate mirroring decisions for
>>> Vector and Foreign.
>>> 2. We could make the Vector and Foreign work less visible
>>> to people that don’t want to see one project or the other.
>>
>> GitHub, the website, is oriented at having 1 main branch per repo. 
>> This is the 'master' branch by default, but can also be changed to 
>> another branch. This is also the branch whose README is shown on the 
>> repo's web page, basically the front-page of the repository. With 2 
>> repos, we can have 2 front pages.
>>
>> Furthermore, having 2 repos allows us to keep the Pull Requests for 
>> the 2 parts of the project separate, which would at least save some 
>> distraction when working on either part of the project.
>>
>> These are not stellar advantages of using separate repos over 
>> separate branches. Though, I currently don't see any downsides to 
>> having multiple repos over multiple branches either. So why not pick 
>> the slightly better of the two options? The main advantage here is 
>> being able to better shape how things look on GitHub. If you wanted, 
>> you could still combine the separate repos on GitHub into a single 
>> local repo using 2 remotes (git is stronger here than hg).
>>
>
> OK, these are additional reasons to prefer separate repos.
> Seems like that’s a better fit for GitHub.  Thanks!
>
>>> FTR, neither of those strikes me as a very strong reason
>>> to use multiple repos.  Googling around on the theme
>>> of “git fork vs. branch” suggests that forking is most
>>> useful at a trust boundary, or divergence of artifacts,
>>> but the whole OpenJDK is really just one trust domain,
>>> one team, and one main result (the Java RI).
>>
>> "Fork" is not really the right term to use here. A fork on GitHub is 
>> really just a copy of a repository on GitHub, created when clicking 
>> the 'fork' button in the top-right [1] (/anybody/ can do this), 
>> that can be edited by the person who owns the fork, usually with the 
>> intent of making a pull request from the fork to the main repository 
>> later.
>>
> Yes, I suppose I should have said “copy” instead of “fork”.
> My point is independent of which workflow we are talking
> about:  I observe that “fork” is cheap (by all appearances),
> and therefore other kinds of copies (such as keeping similar
> copies of the JDK) are also probably cheap.  I suppose, though
> I can’t easily google up evidence, that the backend of GitHub
> cleverly amortizes the storage cost of a Git object across all
> repos that use that object, even when those repos are not
> part of the same organization (as in a fork).  Thus it is likely
> to amortize them in our case, which is simpler.
>>
>> For instance, to contribute to Panama on GitHub, each of us would 
>> have to make their own fork of the Panama repo, push changes to the 
>> fork (we wouldn't have direct write-access to the main repo. There's 
>> the trust boundary), and then make a pull request based on a diff 
>> between a branch in our fork and a branch in the main repository.
>>
> Yes; the workflow assumes a cheap copy operation, probably
> the copy of a pointer rather than of a bunch of metadata.
> To me it’s evidence that GitHub scales well to hosting many
> similar repos.  Our current infra doesn’t do this, hence it tilts
> towards the use of branches, which (in Hg) can amortize
> the cost of the common data across multiple branches, but
> only in a single repo.
>
> — John