> - absolute file offsets as you mentioned vs. something else ?

Those are simplest.  BTW, 64-bit offsets are overkill, given the 16-  
and 32-bit limitations elsewhere in the format.

Slightly more compressible (and invariant under low-level encoding
changes) would be token serial numbers.  The consumer of the index
information is likely to have a lexer available, so it can map a
serial number back to a position by re-lexing.
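
A rough sketch of the token-serial-number idea, in Python.  The toy
lexer, the function names (lex, build_index, resolve), and the sample
text are all made up for illustration; none of them come from the
actual format.  It only shows that the serial numbers stay put when
the byte-level encoding changes, while byte offsets do not.

```python
import re

# Hypothetical toy lexer: identifiers, integers, or any single
# non-space character.  A real consumer would use its own lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lex(text):
    """Yield (serial, char_offset, token_text) for each token."""
    for serial, m in enumerate(TOKEN_RE.finditer(text)):
        yield serial, m.start(), m.group()


def byte_offset(text, char_offset, encoding):
    """Byte offset of a character position under the given encoding."""
    return len(text[:char_offset].encode(encoding))


def build_index(text, names):
    """Producer side: record serial numbers of the tokens of interest."""
    return [serial for serial, _, tok in lex(text) if tok in names]


def resolve(text, index, encoding):
    """Consumer side: re-lex and map serial numbers to byte offsets."""
    wanted = set(index)
    return {serial: byte_offset(text, off, encoding)
            for serial, off, _ in lex(text) if serial in wanted}


# Assumed sample input; the leading non-ASCII character makes the
# byte offsets depend on the encoding.
text = "é = foo(bar, 42)"
index = build_index(text, {"foo", "bar"})
utf8_offsets = resolve(text, index, "utf-8")
utf16_offsets = resolve(text, index, "utf-16-le")
```

The same index serves both encodings; only the resolved byte offsets
differ.  The cost is that the consumer has to lex up to the token it
wants, which is why absolute offsets remain the simplest choice.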

-- John
