RFR: 8247532: Records deserialization is slow

Sun Jun 21 12:49:15 UTC 2020

Hi Chris,

Good work on SerialByteStreamBuilder. A very useful tool.

What I noticed in the tests is that you usually follow the following 
pattern:

- create a record instance r

- serialize it to bytes and deserialize it back to instance deser1

- create a stream with SerialByteStreamBuilder and deserialize it to 
instance deser2

- compare deser1 with deser2

Now if there was a bug in deserialization logic it would be possible for 
both deser1 and deser2 to be equal but still wrong. Adding another 
assert to compare r with deser1 in each test could catch such bug.

In the following test:

  245     public void testArrays() throws Exception {
  246         out.println("\n---");
  247         {
  248             record IntArray(int[] ints, long[] longs) implements 
Serializable { }
  249             IntArray r = new IntArray(new int[] { 5, 4, 3,2, 1 }, 
new long[] { 9L });
  250             IntArray deser1 = deserialize(serialize(r));
  251
  252             byte[] builderBytes = SerialByteStreamBuilder
  253                     .newBuilder(IntArray.class.getName())
  254                     .addField("ints", String[].class, new int[] { 
5, 4, 3,2, 1 })
  255                     .addField("longs", String[].class, new long[] 
{ 9L })
  256                     .build();
  257
  258             IntArray deser2 = deserialize(builderBytes);
  259             assertEquals(deser2.ints(), deser1.ints());
  260             assertEquals(deser2.longs(), deser1.longs());
  261         }

...in lines 254, 255, you specify wrong types of fields (making the 
method generic with a type parameter could prevent such mistakes). I 
wonder why the test passes. In both variants (with or without the patch) 
we have this check in java.io.ObjectStreamClass.RecordSupport:

             for (int i = 0; i < fields.length; i++) {
                 ObjectStreamField f = fields[i];
                 String fName = f.getName();
                 if (!fName.equals(pName))
                     continue;

                 Class<?> fType = f.getField().getType();

                 if (!pType.isAssignableFrom(fType))

                     throw new InternalError(fName + " unassignable, pType:" + pType + ", fType:" + fType);

... which is actually checking types of reflected 
java.lang.reflact.Field(s) with types of reflected 
java.lang.reflect.RecordComponent(s) that match in name which always 
match when record type is correctly constructed. And later when values 
are deserialized and assigned to constructor parameters, we have the 
right types of values in this "unusual" stream. In other words, changing 
the field type from one reference type to another is always a compatible 
change as far as stream format is concerned, deserialization then 
succeeds if the value in the stream is of correct type for the local 
type of field. Theoretically the change can be such that the reference 
types are unrelated, because such types have one common value: null. In 
this example the stream was unusual. The types were unrelated, but the 
stream had the correct type of value != null.

Anyway, maybe the test should also contain a case where the type of 
field does change compatibly (for example, Integer field serialized, 
deserialized into Number field), and one or more where it changes 
incompatibly. For the incompatible changes, I think some exceptions 
would be emitted by MethodHandle adapters that try to adapt incompatible 
reference types, other incompatible primitive field type changes would 
be caught by java.io.ObjectStreamClass.matchFields() method...

I think that the following method in SerialByteStreamBuilder is also 
incorrect:

  363         private void writeObjectFieldDesc(DataOutputStream out) 
throws IOException {
  364             for (Map.Entry<NameAndType,Object> entry : 
objectFields.entrySet()) {
  365                 Class<?> cl = entry.getKey().type();
  366                 assert !cl.isPrimitive();
  367                 // obj_typecode
  368                 if (cl.isArray()) {
  369                     out.writeByte('[');
  370                 } else {
  371                     out.writeByte('L');
  372                 }
  373                 writeUTF(out, entry.getKey().name());
  374                 out.writeByte(TC_STRING);
  375                 writeUTF(out, "L" + cl.getName().replace('.', '/') 
+ ";");
  376
  377             }
  378         }

Line 375 should probably be:

             writeUTF(out,
                      (cl.isArray() ? cl.getName() : "L" + cl.getName() 
+ ";")
                          .replace('.', '/'));

So I took the liberty and incorporated your work into my patch (both 
will be authors of the patch) and addressed above concerns. The modified 
test does not pass on JDK14. In testIncompatibleRefFieldTypeChange JDK 
14 throws ClassCastException (because it is thrown from the binding of 
arguments to method handle) while with the patched JDK16 it throws 
InvalidObjectException, because ClassCastException is thrown from 
invokeExact, which is then wrapped with InvalidObjectException and I 
think is more correct:

http://cr.openjdk.java.net/~plevart/jdk-dev/RecordsDeserialization/webrev.04/

So this patch actually fixes a little bug too.

Regards, Peter

6/17/20 4:13 PM, Chris Hegarty wrote:

> Peter,
>
>> On 17 Jun 2020, at 09:23, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>>
>>> On 17 Jun 2020, at 06:44, Peter Levart <peter.levart at gmail.com> wrote:
>>>>> ..
>>>> If there is a way for a test to compile several versions of the same (record) class and then produce byte streams with it, then we could simulate various class-evolution cases (field missing, surplus field, change of field type, etc...) without crafting the bytestream by hand. WDYT?
>>> I have a better idea. The test could contain several different classes with different names that would be used to generated variations of serialized stream. Such streams could then be "patched" so they would contain the common target class name and then used to (attempt to) deserialize the target class. I'll try to devise such test…
>>>
>> I think that could work. I’ll mock something up too, just for comparison purposes.
>
> Here is an initial version at a test that can be expanded to cover more of the stream-field shapes of serializable records.
>
> I quickly backed away from generating the byte streams using OOS alone, since it enforces strict ordering of the fields in the stream, beyond that of primitives followed by non-primitives. And I want to be able to do things like, for example, record Point(int x, int y) - stream of x:1 y:2, or y:2 x:1 - which will verify the nominality aspect of the caching implementation. So I opted for a basic byte-stream-builder approach.
>
> https://cr.openjdk.java.net/~chegar/serialFieldTest_webrev/test/jdk/java/io/Serializable/records/DifferentStreamFieldsTest.java.html
>
> I also would like a builder of serial byte streams anyway, as it will be useful for things beyond this change. Maybe it could even be expanded upon and used as a test library, at some future point. The above implementation can probably be easily broken if pushed hard with larger graphs of objects, but it should be good enough for what we need here.
>
> -Chris.
>