MethodHandles.Lookup and modules

Fri Dec 18 02:01:08 UTC 2015

On Dec 17, 2015, at 12:44 PM, Alex Buckley <alex.buckley at oracle.com> wrote:
> 
> … please consider the following design:
> 
> - PUBLIC lookup mode means:
> 
>  Any 'public' type of any package exported unconditionally by the package's module.

Accessing those is the uniquely special power of PL=publicLookup, which cavalierly
ignores the READS graph, and therefore does not need to be computed relative to anything.

For clarity I would like to give this thing a special name, UNCONDITIONAL.

UNCONDITIONAL(M1) = { T, if IS_PUBLIC(T), for T in PACKAGE_TYPES(P), if IS_UNCONDITIONALLY_EXPORTED(P), for P in MODULE_PACKAGES(M1) }

UNCONDITIONAL() = UNION { UNCONDITIONAL(M1), for M1 in ALL_MODULES }

> - QUALIFIED lookup mode means:
> 
>  Any 'public' type of any package exported in qualified fashion by the package's module to the lookup class's module.

QUALIFIED(M1) = { T, if IS_PUBLIC(T), for T in PACKAGE_TYPES(P), if IS_CONDITIONALLY_EXPORTED(P, M1), for P in MODULE_PACKAGES(M2), for M2 in ALL_MODULES }

L_QUALIFIED(L) = QUALIFIED(TYPE_MODULE(L.LC))

N.B. This leaves the following related expression as an orphan (cannot be recomposed from the others):

- READABLE(M1) lookup mode means:

Any 'public' type of any package exported in qualified or unqualified fashion by the package's module to the lookup class's module.

READABLE(M1) = QUALIFIED(M1) + UNQUALIFIED(M1)
  where UNQUALIFIED(M1) = UNION { UNCONDITIONAL(M2), for M1 READS M2 in READS_GRAPH }

Note that UNQUALIFIED(M1) is a subset of UNCONDITIONAL() for all M1.

These classifications are nicely disjoint, so the eventual nesting
behavior arises by disjoint union.

If PUBLIC/UNCONDITIONAL is the wrong ground-level default (my point today),
then READABLE(M1) is more useful than QUALIFIED(M1).

> - MODULE lookup mode means:
> 
>  Any 'public' type of any package in the lookup class's module.

MODULE(M1) = { T, if IS_PUBLIC(T), for T in PACKAGE_TYPES(P), for P in MODULE_PACKAGES(M1) }

MODULE(M1) is disjoint from READABLE(M1) and QUALIFIED(M1), but not from PUBLIC/UNCONDITIONAL.

For lookups we can derive things like:

L_UNCONDITIONAL(L) = UNCONDITIONAL()
L_READABLE(L) = READABLE(TYPE_MODULE(L.LC))
L_MODULE(L) = MODULE(TYPE_MODULE(L.LC))
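
To make the set algebra concrete, here is a minimal sketch over a toy module model. Everything in it is an assumption for illustration: the class ToyModuleModel, the Pkg and Mod holders, and the three sample modules (base, lib, app) are hypothetical and not part of java.lang.invoke or any module API. QUALIFIED(M1) is computed over all modules M2, which is one reading of the definition above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; none of these types belong to java.lang.invoke or a module API.
final class ToyModuleModel {
    static final class Pkg {
        final Set<String> publicTypes;   // names of the 'public' types in the package
        final boolean exportedToAll;     // exported unconditionally
        final Set<String> exportedTo;    // modules named in a qualified export
        Pkg(Set<String> publicTypes, boolean exportedToAll, Set<String> exportedTo) {
            this.publicTypes = publicTypes;
            this.exportedToAll = exportedToAll;
            this.exportedTo = exportedTo;
        }
    }

    static final class Mod {
        final String name;
        final List<Pkg> packages;
        final Set<String> reads;         // outgoing edges in the READS graph
        Mod(String name, List<Pkg> packages, Set<String> reads) {
            this.name = name;
            this.packages = packages;
            this.reads = reads;
        }
    }

    private final Map<String, Mod> modules = new LinkedHashMap<>();

    ToyModuleModel add(Mod m) { modules.put(m.name, m); return this; }

    // UNCONDITIONAL(M1)
    Set<String> unconditional(String m1) {
        Set<String> out = new TreeSet<>();
        for (Pkg p : modules.get(m1).packages)
            if (p.exportedToAll) out.addAll(p.publicTypes);
        return out;
    }

    // UNCONDITIONAL() = union over ALL_MODULES
    Set<String> unconditionalAll() {
        Set<String> out = new TreeSet<>();
        for (String m : modules.keySet()) out.addAll(unconditional(m));
        return out;
    }

    // QUALIFIED(M1): public types of packages that some module exports to M1 by name
    Set<String> qualified(String m1) {
        Set<String> out = new TreeSet<>();
        for (Mod m2 : modules.values())
            for (Pkg p : m2.packages)
                if (p.exportedTo.contains(m1)) out.addAll(p.publicTypes);
        return out;
    }

    // UNQUALIFIED(M1) = union of UNCONDITIONAL(M2) for every edge M1 READS M2
    Set<String> unqualified(String m1) {
        Set<String> out = new TreeSet<>();
        for (String m2 : modules.get(m1).reads) out.addAll(unconditional(m2));
        return out;
    }

    // READABLE(M1) = QUALIFIED(M1) + UNQUALIFIED(M1)
    Set<String> readable(String m1) {
        Set<String> out = new TreeSet<>(qualified(m1));
        out.addAll(unqualified(m1));
        return out;
    }

    // MODULE(M1): every public type in M1, exported or not
    Set<String> module(String m1) {
        Set<String> out = new TreeSet<>();
        for (Pkg p : modules.get(m1).packages) out.addAll(p.publicTypes);
        return out;
    }

    // Hypothetical sample: app READS base; nobody reads lib.
    static ToyModuleModel sample() {
        return new ToyModuleModel()
            .add(new Mod("base", List.of(
                    new Pkg(Set.of("base.api.A"), true, Set.of()),
                    new Pkg(Set.of("base.impl.Hidden"), false, Set.of("app"))), Set.of()))
            .add(new Mod("lib", List.of(
                    new Pkg(Set.of("lib.api.L"), true, Set.of())), Set.of()))
            .add(new Mod("app", List.of(
                    new Pkg(Set.of("app.main.Main"), false, Set.of())), Set.of("base")));
    }
}
```

In the sample, lib.api.L is in UNCONDITIONAL() but not in READABLE(app), because app does not read lib. That gap is exactly the extra power of PL.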

> (Sidebar: QUALIFIED is split from MODULE primarily to be explicit about access rights and secondarily to support more precise slicing of access rights in a future MethodHandles.Lookup API. Example: give me a lookup object to access the types in this module that offer a contract, i.e. are declared 'public' without regard to exported-ness. Example: give me a lookup object to access the types outside this module which are exported to it by its friends.)

This is a useful building block, but needs to be associated with UNQUALIFIED(M1) or UNCONDITIONAL().

> - Start with an arbitrary class in an arbitrary module calling MethodHandles.Lookup.lookup() to get a "full power" lookup object L. L's lookup modes are PUBLIC + QUALIFIED + MODULE + PROTECTED + PACKAGE + PRIVATE.

Or, in terms of my previous message, it could omit PUBLIC/UNCONDITIONAL and be READABLE + MODULE + PROTECTED + PACKAGE + PRIVATE.

This means that PL is not a subset of all other non-trivial lookups.
It also means that the read-graph blindness stays unique to PL.

> - The arbitrary class obtains a Class object representing class A, then calls L.in(A):
> 
>  -- If A is in a different module than L's lookup class, then the resulting lookup object has lookup mode PUBLIC.

If we started with READABLE instead of PUBLIC/UNCONDITIONAL, the resulting lookup
L.in(A) would be degenerate (call it NOACCESS mode).  This surprise would be the cost
of recognizing and enforcing the uniqueness of PL.  At the present moment, I think
this is the best way to go.

PL.in(A) would continue to be UNCONDITIONAL (as long as A is in UNCONDITIONAL()).

(There's no union or non-empty intersection of L and PL, if they do incommensurate things.)

For PL I think this simple rule suffices:
>  -- If L is PUBLIC, then L.in(A) is also PUBLIC, unless A is inaccessible to L, in which case L.in(A) has no access.

In fact, a PUBLIC/UNCONDITIONAL lookup is always the result of publicLookup()
itself or a derivative of publicLookup() via a chain of L.in(A).

BTW, there are three global rules that interact with these rules:

-- If A is not accessible to L's lookup class, using L's lookup modes, then L.in(A) has no access.

-- In all cases, the lookup modes of L.in(A) are a subset of (or equal to) the lookup modes of L.

-- In all cases (assuming no concurrent change in schemata) the set of names accessible to L.in(A) is a subset of (or equal to) the set of names accessible to L.

The third rule is a very broad requirement with all sorts of detailed implications.
All the other rules are designed to add up to this main rule.
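
The second rule (lookup modes only shrink) is easy to spot-check against the existing API, since lookupModes() is a bit mask. The sketch below is only a probe, not a proof: the class name ModeSubsetCheck and the choice of target classes are my assumptions, and the third rule (accessible names only shrink) is not tested here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;

final class ModeSubsetCheck {
    // True if every mode bit of 'child' is also set in 'parent'.
    static boolean modesAreSubset(Lookup parent, Lookup child) {
        return (child.lookupModes() & ~parent.lookupModes()) == 0;
    }

    // Teleports a full-power lookup and a public lookup, checking the subset rule per hop.
    static boolean checkAll() {
        Lookup full = MethodHandles.lookup();   // lookup class is ModeSubsetCheck
        Class<?>[] targets = { ModeSubsetCheck.class, String.class, java.util.List.class };
        for (Class<?> a : targets) {
            Lookup hop = full.in(a);
            if (!modesAreSubset(full, hop)) return false;
            // A second hop may narrow further but must never widen.
            if (!modesAreSubset(hop, hop.in(Object.class))) return false;
        }
        Lookup pl = MethodHandles.publicLookup();
        return modesAreSubset(pl, pl.in(String.class));
    }

    public static void main(String[] args) {
        System.out.println("modes only shrink: " + checkAll());
    }
}
```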

>  -- If A is in the same module as L's lookup class, but a different package, then the resulting lookup object has lookup mode PUBLIC + QUALIFIED + MODULE + PROTECTED.

Or, the resulting lookup object has a lookup mode no greater than
READABLE + MODULE /* + PROTECTED */.

> (#include some stuff about actually accessing protected members outside A's package.)

Actually, PROTECTED drops away first.  (The doc for Lookup.in talks about this.)
Trying to keep track of previous protected access across teleports is too hard, like
keeping track of which modules have been previously visited.  Better to just drop
access modes quickly; it keeps things simple.
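
Here is a small sketch of that dropping behavior using the existing Lookup.in. The classes TeleportDemo and TeleportNeighbor are hypothetical. The exact set of mode bits may differ between JDK releases, so the sketch only inspects PUBLIC, PROTECTED, PACKAGE, and PRIVATE, and it assumes both classes are compiled together into the same package.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;

// Hypothetical second top-level class in the same package as TeleportDemo.
final class TeleportNeighbor {}

final class TeleportDemo {
    static boolean has(Lookup l, int mode) {
        return (l.lookupModes() & mode) != 0;
    }

    // Teleport to a different top-level class in the same package.
    static Lookup toNeighbor() {
        return MethodHandles.lookup().in(TeleportNeighbor.class);
    }

    // Teleport to a class in a different package.
    static Lookup toOtherPackage() {
        return MethodHandles.lookup().in(String.class);
    }

    public static void main(String[] args) {
        Lookup n = toNeighbor();
        System.out.println("neighbor keeps PACKAGE:   " + has(n, Lookup.PACKAGE));
        System.out.println("neighbor keeps PROTECTED: " + has(n, Lookup.PROTECTED));
        System.out.println("neighbor keeps PRIVATE:   " + has(n, Lookup.PRIVATE));
        System.out.println("other package keeps PACKAGE: " + has(toOtherPackage(), Lookup.PACKAGE));
    }
}
```

The point is that nothing remembers where the lookup came from: once a hop discards PROTECTED or PRIVATE, no later hop restores it.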

> 
>  -- If A is in the same module as L's lookup class, and in the same package, but A is a different class than L's lookup class, then the resulting lookup object has lookup modes PUBLIC + QUALIFIED + MODULE + PROTECTED + PACKAGE.

… the resulting lookup object has a lookup mode no greater than
READABLE + MODULE + PACKAGE

>  -- If A is the same class as L's lookup class, then the resulting lookup object has lookup modes PUBLIC + MODULE + PROTECTED + PACKAGE + PRIVATE.

(Where did QUALIFIED go?  Assuming a typo here.)

This is a degenerate case.  Teleporting to your own class always produces an equivalent lookup (in fact, the same object).

-- If A is the same class as L's lookup class, then the resulting lookup object has the same lookup modes as L.

Here's a missing case:

-- If A is nested in the same package member as L's lookup class, then the resulting lookup object has lookup modes no greater than READABLE + MODULE + PACKAGE + PRIVATE.

(This is where PROTECTED drops away.)
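
Pulling the L.in(A) cases together, the proposed behavior amounts to intersecting L's modes with a cap chosen by how A relates to the lookup class. The following is a toy model of that table, not a patch against Lookup: the Mode and Relation enums and the class ProposedTeleport are hypothetical names. It assumes the variant where lookup() hands out READABLE rather than PUBLIC/UNCONDITIONAL, and where UNCONDITIONAL survives any hop as long as A is accessible.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the proposed rules; Mode and Relation are illustrative names, not real API.
final class ProposedTeleport {
    enum Mode { UNCONDITIONAL, READABLE, MODULE, PROTECTED, PACKAGE, PRIVATE }

    // How the target class A relates to L's lookup class.
    enum Relation { SAME_CLASS, NESTMATE, SAME_PACKAGE, SAME_MODULE, OTHER_MODULE }

    // What lookup() would hand out if it omitted PUBLIC/UNCONDITIONAL.
    static Set<Mode> fullPower() {
        return EnumSet.of(Mode.READABLE, Mode.MODULE, Mode.PROTECTED, Mode.PACKAGE, Mode.PRIVATE);
    }

    // Upper bound on the modes of L.in(A) for each relation.
    static Set<Mode> cap(Relation r) {
        Set<Mode> cap;
        switch (r) {
            case SAME_CLASS:   cap = EnumSet.allOf(Mode.class); break;
            case NESTMATE:     cap = EnumSet.of(Mode.READABLE, Mode.MODULE, Mode.PACKAGE, Mode.PRIVATE); break;
            case SAME_PACKAGE: cap = EnumSet.of(Mode.READABLE, Mode.MODULE, Mode.PACKAGE); break;
            case SAME_MODULE:  cap = EnumSet.of(Mode.READABLE, Mode.MODULE); break;
            default:           cap = EnumSet.noneOf(Mode.class); break;   // OTHER_MODULE: NOACCESS
        }
        // Read-graph blindness does not depend on where the lookup class sits.
        cap.add(Mode.UNCONDITIONAL);
        return cap;
    }

    // Modes of L.in(A); an inaccessible A always yields NOACCESS (the empty set).
    static Set<Mode> in(Set<Mode> modes, Relation r, boolean accessibleToL) {
        Set<Mode> out = EnumSet.noneOf(Mode.class);
        if (accessibleToL) {
            out.addAll(modes);
            out.retainAll(cap(r));   // modes of L.in(A) are a subset of the modes of L
        }
        return out;
    }

    public static void main(String[] args) {
        for (Relation r : Relation.values())
            System.out.println(r + " -> " + in(fullPower(), r, true));
    }
}
```

Because the result is always an intersection with L's own modes, the "modes are a subset" rule holds by construction.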

> - L.in(A) succeeds (returns a lookup object) regardless of whether its caller is in a module that reads A's module.

That would be true if L contains PUBLIC/UNCONDITIONAL, but not if it only contains the other modes.
The reason is that extra check, that L must be able to access A, else L.in(A) is NOACCESS.

> Only when find* is called on a lookup object is there a check that the caller-of-find*'s module reads the module containing the lookup class embodied by the lookup object. It's easy for the caller-of-find* to pass the check by calling addReads(...) just before calling find*.

Umm, the API specifies that it performs the LC access check early (by .in),
and drops the L.in(A) to NOACCESS if A is not reachable by L.

(FTR, the access check on the referenced class REFC—first operand to
find*—happens in Lookup.checkAccess which calls VerifyAccess.isMemberAccessible
which calls VerifyAccess.isClassAccessible.  The LC may be identical
to the REFC, if the code is doing self access within the LC.
In general the classes differ.  The Lookup API requires a chain of
accessibility to both LC and REFC.)
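
For readers less familiar with the LC/REFC terminology, a minimal sketch: the LC is whichever class called lookup(), and the REFC is the class passed as the first operand to find*. The class RefcDemo and its methods are hypothetical names chosen for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class RefcDemo {
    // LC is RefcDemo (the caller of lookup()); REFC is String (first operand to findVirtual).
    static int lengthViaHandle(String s) {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            return (int) mh.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    private static int secret() { return 42; }

    // Self access: LC and REFC are the same class, so a private member resolves.
    static int selfAccess() {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findStatic(RefcDemo.class, "secret", MethodType.methodType(int.class));
            return (int) mh.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello"));   // prints 5
        System.out.println(selfAccess());               // prints 42
    }
}
```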

> (Sidebar: Separately, the module containing the lookup class embodied by the lookup object had better have readability to other modules in order for find* to look up [ctors, methods, and fields of] classes in those other modules.)

Since bytecodes look up members using string descriptors,
the only readability is for the referenced class (REFC),
and not for any of the types mentioned in other parts
of the member name (field type, method arguments, etc.).

So it's not clear what readability checks would be natural
during a find*, other than on REFC.  Am I missing something?

> 
>> Should there be a way to build a lookup, for two modules M1/M2, which
>> reads those names of M2 which M1 can read, except no internals
>> of M1?  I wonder if such a thing would be useful?  Probably not.

By that I mean this small slice of types flowing along one edge M1 READS M2:

IMPORTED(M1 READS M2) = QUALIFIED_EXPORTS_VIA(M1 READS M2) + UNQUALIFIED_EXPORTS_VIA(M1 READS M2)

The combined set QUALIFIED(M1) + UNQUALIFIED(M1) is a superset of any IMPORTED(M1 READS M2).

IMO this slice is too small to be useful.

>> But it would be useful to have a lookup in a module M1 which can
>> read the exports of *every* M2 that M1 can see, except no M1 internals.
>> (This includes the unconditionally exported public names of M1.)
>> This would be a Lookup with an LC in M1 and flags of PUBLIC only.

That is READABLE(M1) above.

> The difference between these two paragraphs is hard to discern. The first paragraph seems to fix M1 and M2 while the second paragraph fixes M1 and varies M2, but there's also a switch from "M1 can read" to "M1 can see".

My bad.  In "every M2 that M1 can see", s/see/read/.

> Modules read modules, classes see classes, types access types. Can you restate?

Better now?

What do you think about segregating publicLookup instead of folding
its behavior into every other non-empty lookup?  The cost of this
is introducing some oddities with the previously-defined mode PUBLIC,
plus two more modes (MODULE, READABLE).

Given a three-way distinction between UNCONDITIONAL(),
MODULE(M1), and READABLE(M1), there are four places to
put the pre-existing access mode PUBLIC.

0. (UNCONDITIONAL, READABLE, MODULE)

1. (PUBLIC /*= UNCONDITIONAL*/, READABLE, MODULE) 

2. (UNCONDITIONAL, PUBLIC /*= READABLE*/, MODULE)

3. (UNCONDITIONAL, READABLE, PUBLIC /*= MODULE*/)

In case 0, we just retire the name Lookup.PUBLIC as being hopelessly ambiguous in the new world.

In case 1, we keep Lookup.PUBLIC but use it only for PL=publicLookup.  Other guys never get PUBLIC.

In case 2, we use Lookup.PUBLIC to refer to the most-public (weakest) access level that respects the READS graph.

In case 3, we use Lookup.PUBLIC to mean just "stuff inside my module", and reserve the new names for extra-modular relations.

Then we also have the option to refuse to distinguish UNCONDITIONAL() from
UNQUALIFIED(M1), which is what you proposed, Alex.  That would be a fifth choice:

4. (PUBLIC /*= UNCONDITIONAL*/, QUALIFIED, MODULE)

In case 4, everybody starts off as a superset of PL, which means a superset of all bytecode behavior.

Those seem to be our choices.  I think 4 is not a disaster, though it feels dirty, since it is overly powerful.

You'd think that case 1 would be the only other choice, given the name of "publicLookup",
but in fact the spec. carefully avoids saying what are the access modes of a PL.
So we could compatibly replace PUBLIC (in that place) by a new mode UNCONDITIONAL.

I think I would prefer case 2.  The user model is that PUBLIC is the weakest (non-empty) access
mode available to bytecode behaviors.  As such it respects the LC's position in the module
graph, and excludes module-private, package-private, and class-private.  UNCONDITIONAL
is the special thing provided by publicLookup, which ignores the module graph.  Then
PACKAGE opens up the LC's package, MODULE opens up the LC's module, and PRIVATE
opens up the LC itself (plus its nestmates).  Feels pretty good, especially since MODULE
and PACKAGE continue to have a parallel sense of restriction.

What do you think?

— John