RFR: 8319598: SMFParser misinterprets interrupted running status

Jan Trukenmüller truj at midica.org
Tue Nov 7 20:32:49 UTC 2023


Am 07.11.23 um 20:37 schrieb Jan Trukenmüller:
> The MIDI file parser misinterprets events without status byte when they appear directly after a Meta of SysEx event.
> 
> For my bugfix I had to decide between two possible solutions:
> - Strict solution: Throw an InvalidMidiDataException
> - Tolerant solution: Use the status of the last channel event as running status
> 
> I used the tolerant solution.
> I will send an email to the list client-libs-dev with my reasons.
> 
> -------------
> 
> Commit messages:
>   - 8319598: Fixed incorrect interpretation of interrupted running status in SMFParser
> 
> Changes: https://git.openjdk.org/jdk/pull/16546/files
>   Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16546&range=00
>    Issue: https://bugs.openjdk.org/browse/JDK-8319598
>    Stats: 121 lines in 2 files changed: 118 ins; 0 del; 3 mod
>    Patch: https://git.openjdk.org/jdk/pull/16546.diff
>    Fetch: git fetch https://git.openjdk.org/jdk.git pull/16546/head:pull/16546
> 
> PR: https://git.openjdk.org/jdk/pull/16546


Hi,

As I promised, here are my reasons for choosing the tolerant solution.
But first a short description what my PR is about.

Definition:
An "interrupted running status" means:
- In a MIDI file a running status is active
- The next event is a Meta or SysEx event
- The next event does not contain a status byte

Problem:
Files with "interrupted running status" are not parsed correctly.

Possible solutions:
- Strict solution: Throw an InvalidMidiDataException whenever
   an interrupted running status is found
- Tolerant soution: Use the status byte of the last channel
   event with a status byte

Here are 4 reasons for choosing the tolerant solution, sorted from weak 
to strong:


Reason 1:

The SMFParser is very tolerant against even worse violations (e.g. data 
bytes greater than 0x7F, meta data with wrong length and so on)
So I think it is in the spirit of the parser to also allow this minor 
violation.
I think that interrupted running status is only a "minor" violation 
because there is only one sentence in the specification with this 
assumption.
And because there is hardly any other possible interpretation that could 
make sense.


Reason 2:

I tried out a minimal file with interrupted running status with several 
other applications.
The following ones interpret the file like suggested by the tolerant 
solution:

- timidity
- Rosegarden
- Musescore3
- aplaymidi
- Audacious
- midicsv
- JZZ.midi.SMF.js (JZZ.js)
- VLC (with fluidsynth plugin)

The following tools behave differently:

- Pipewire's pw-mididump
   reacts with a segmentation fault
- fluidsynth -a alsa -m alsa_seq -l -i testfile.mid
   Doesn't play the file but returns a successful exit code.
   No error message.
- pmidi
   Error (Meta event 60 not implemented\nUnexpected end of file)

That's 8:3.
And I don't think that the behavior of the 3 is desirable.


Reason 3:

I downloaded a large archive with more than 169.000 MIDI files and did a 
lot of analyses and tests.
First without the bugfix and then with the bugfix.
The results can be summarized like this:

- 1.1% of all MIDI files fail to be loaded by SMFParser (for ANY reason)
- 0.43% of all MIDI files contain interrupted running status
   (maybe some more that have a different problem before the first
   IRS occurs)
- From the files with IRS
   - before the bugfix:
     - 15.67% can be parsed without exception
              (but with incorrectly interpreted data)
     - 69.9% fail with InvalidMidiDataException
     - 14.42% fail with EOFException
   - after the bugfix:
     - 94.2% can be parsed without exception
     - 5.4% fail with InvalidMidiDataException
     - 0.4% fail with EOFException
- The bugfix fixes exceptions for:
   - 78.5% of files with interrupted running status
   - 30.25% of ALL failing MIDI files in general !!!!!
- The bugfix doesn't change anything to any MIDI file without
   interrupted running status

Details:

169457 files are in the archive.
721 contain interrupted running status. (I copied them to a different 
folder to be able to analyze them separately.)

Statistics from (only) the 721 files with interrupted running status:
- before the fix:
   - all:                      721
   - success:                  113
   - InvalidMidiDataException: 504
   - EOFException:             104
- after the fix:
   - all:                      721
   - success:                  679
   - InvalidMidiDataException:  39
   - EOFException:               3

Statistics from ALL files before and after the fix:
- before the fix:
   - all:                       169457
   - success:                   167586
   - fail:                        1871
     - InvalidMidiDataException:  1606
     - EOFException:               265
- after the fix:
   - all:                       169457
   - success:                   168152
   - fail:                        1305
     - InvalidMidiDataException:  1141
     - EOFException:               164


Reason 4:

In a different branch of the JDK I added my bugfix and also a lot of 
debug information to print out several events before and after every 
interrupted running status. I tested this against the 721 files with IRS.
The result was more than 8000 occurrences of IRS, and I looked at each 
of them!
Manually!!!
The vast majority of these occurrences shows very clearly that the data 
makes absolute sense.

However a few of them (maybe 20-30) made no sense at all or were at 
least "suspicious".
So I looked at these remaining files with a hex editor.
In most cases there was a meta message with an incorrect length field or 
something similar, so that the following bytes were interpreted incorrectly.


Conclusion:

After this investigation I'm very confident that the tolerant solution 
is the best.
And isn't it pretty cool to fix 30% of all exceptions with such a simple 
fix?

Best regards
Jan Trukenmüller
(Github user: truj)


More information about the client-libs-dev mailing list