RFR: 8343546: GHA: Cache required dependencies in master-branch workflow

Fri Jul 4 14:35:38 UTC 2025

On Fri, 4 Jul 2025 13:21:40 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> In our current GHA workflows, we only run workflows on branches in personal forks. GHA isolation rules say that workflow caches from parent branches can be used by descendant branches. For our branches, the usual parent is `master`. Since we do not run workflows on `master`, every new branch starts with logically empty caches. Only the next trigger on the same branch would use the caches saved from the first workflow run.
> 
> This means we put additional load on shared infrastructure by pulling JDKs, building jtreg (and pulling its dependencies), bootstrapping sysroots, etc. All these steps also fail intermittently. It also means everyone carries lots of caches around, segregated by branch and repo (look into your https://github.com/your-github-name/jdk/actions/caches, for example), relying only on cache cleanups when the total starts to hit 10 GB. With hundreds of contributors, this easily wastes terabytes of cloud storage space.
> 
> We can make all this more efficient and reliable if we manage to run a master-branch workflow that bootstraps all required dependencies and caches them. These dependencies can then be used by PR branches, as the `master` branch is their effective parent.
> 
> This PR introduces the notion of a "dry run", which does everything _except_ the actual builds and tests. It therefore verifies that all dependencies are set up properly for JDK configure to pass. This is useful in itself for future GHA debugging of dependencies. The workflow can now be dispatched with an additional "dry run" parameter.
> 
> What makes master-branch caching possible is the second part of the PR, which hooks up dry runs to master/stabilization branch pushes. The dry-run workflow would then run every time you update your personal fork's master/stabilization branch. That dry run would likely finish very quickly if all caches are already in place. If they are not, it would populate the caches in the master/stabilization branch of your personal fork.
> 
> The expected net result is that actual PRs branched off the personal fork's master would be able to use the caches from that master workflow run. (If you want to try this experiment in current GHA, trigger the existing workflow on the `master` branch in your fork. It would do roughly the same, but with all builds/tests.)
> 
> A sample "dry-run" can be seen here: https://github.com/shipilev/jdk/actions/runs/16074619302. The most heavy-weight part is MSYS2 unpacking in Windows builds, and t...

That seems like a reasonable interpretation, yes. At least if you want to utilize this optimization. OTOH, GitHub makes that really easy nowadays with a simple "sync with upstream" button (or whatever it's called).
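
To make the cache scoping from the quoted description concrete, here is a minimal sketch of the lookup order. It is a simplified model, not GitHub's implementation: it assumes a run looks in its own branch's cache scope first and then in the default branch's scope, and the names `restore_cache`, `CacheStore`, and the branch and key strings are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, simplified model: one cache entry per (branch scope, cache key).
CacheStore = Dict[Tuple[str, str], str]


def restore_cache(store: CacheStore, branch: str, key: str,
                  default_branch: str = "master") -> Optional[str]:
    """Look in the current branch's scope first, then in the default branch's."""
    for scope in (branch, default_branch):
        hit = store.get((scope, key))
        if hit is not None:
            return hit
    return None  # cache miss: the dependency has to be bootstrapped again


store: CacheStore = {}

# No workflow has run on the fork's master yet, so a fresh branch misses.
before = restore_cache(store, "my-feature-branch", "jtreg")

# A dry run on the fork's master populates the cache in the master scope.
store[("master", "jtreg")] = "jtreg from master dry run"

# The same fresh branch now finds the entry via the default-branch fallback.
after = restore_cache(store, "my-feature-branch", "jtreg")
```

In this model the fallback only helps once the fork's `master` has actually run the workflow, which is why keeping the fork's master in sync matters for this optimization.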

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26134#issuecomment-3036519454