RFR: 8319598: SMFParser misinterprets interrupted running status
Jan Trukenmüller
truj at midica.org
Tue Nov 7 20:32:49 UTC 2023
Am 07.11.23 um 20:37 schrieb Jan Trukenmüller:
> The MIDI file parser misinterprets events without status byte when they appear directly after a Meta of SysEx event.
>
> For my bugfix I had to decide between two possible solutions:
> - Strict solution: Throw an InvalidMidiDataException
> - Tolerant solution: Use the status of the last channel event as running status
>
> I used the tolerant solution.
> I will send an email to the list client-libs-dev with my reasons.
>
> -------------
>
> Commit messages:
> - 8319598: Fixed incorrect interpretation of interrupted running status in SMFParser
>
> Changes: https://git.openjdk.org/jdk/pull/16546/files
> Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16546&range=00
> Issue: https://bugs.openjdk.org/browse/JDK-8319598
> Stats: 121 lines in 2 files changed: 118 ins; 0 del; 3 mod
> Patch: https://git.openjdk.org/jdk/pull/16546.diff
> Fetch: git fetch https://git.openjdk.org/jdk.git pull/16546/head:pull/16546
>
> PR: https://git.openjdk.org/jdk/pull/16546
Hi,
As I promised, here are my reasons for choosing the tolerant solution.
But first a short description what my PR is about.
Definition:
An "interrupted running status" means:
- In a MIDI file a running status is active
- The next event is a Meta or SysEx event
- The next event does not contain a status byte
Problem:
Files with "interrupted running status" are not parsed correctly.
Possible solutions:
- Strict solution: Throw an InvalidMidiDataException whenever
an interrupted running status is found
- Tolerant soution: Use the status byte of the last channel
event with a status byte
Here are 4 reasons for choosing the tolerant solution, sorted from weak
to strong:
Reason 1:
The SMFParser is very tolerant against even worse violations (e.g. data
bytes greater than 0x7F, meta data with wrong length and so on)
So I think it is in the spirit of the parser to also allow this minor
violation.
I think that interrupted running status is only a "minor" violation
because there is only one sentence in the specification with this
assumption.
And because there is hardly any other possible interpretation that could
make sense.
Reason 2:
I tried out a minimal file with interrupted running status with several
other applications.
The following ones interpret the file like suggested by the tolerant
solution:
- timidity
- Rosegarden
- Musescore3
- aplaymidi
- Audacious
- midicsv
- JZZ.midi.SMF.js (JZZ.js)
- VLC (with fluidsynth plugin)
The following tools behave differently:
- Pipewire's pw-mididump
reacts with a segmentation fault
- fluidsynth -a alsa -m alsa_seq -l -i testfile.mid
Doesn't play the file but returns a successful exit code.
No error message.
- pmidi
Error (Meta event 60 not implemented\nUnexpected end of file)
That's 8:3.
And I don't think that the behavior of the 3 is desirable.
Reason 3:
I downloaded a large archive with more than 169.000 MIDI files and did a
lot of analyses and tests.
First without the bugfix and then with the bugfix.
The results can be summarized like this:
- 1.1% of all MIDI files fail to be loaded by SMFParser (for ANY reason)
- 0.43% of all MIDI files contain interrupted running status
(maybe some more that have a different problem before the first
IRS occurs)
- From the files with IRS
- before the bugfix:
- 15.67% can be parsed without exception
(but with incorrectly interpreted data)
- 69.9% fail with InvalidMidiDataException
- 14.42% fail with EOFException
- after the bugfix:
- 94.2% can be parsed without exception
- 5.4% fail with InvalidMidiDataException
- 0.4% fail with EOFException
- The bugfix fixes exceptions for:
- 78.5% of files with interrupted running status
- 30.25% of ALL failing MIDI files in general !!!!!
- The bugfix doesn't change anything to any MIDI file without
interrupted running status
Details:
169457 files are in the archive.
721 contain interrupted running status. (I copied them to a different
folder to be able to analyze them separately.)
Statistics from (only) the 721 files with interrupted running status:
- before the fix:
- all: 721
- success: 113
- InvalidMidiDataException: 504
- EOFException: 104
- after the fix:
- all: 721
- success: 679
- InvalidMidiDataException: 39
- EOFException: 3
Statistics from ALL files before and after the fix:
- before the fix:
- all: 169457
- success: 167586
- fail: 1871
- InvalidMidiDataException: 1606
- EOFException: 265
- after the fix:
- all: 169457
- success: 168152
- fail: 1305
- InvalidMidiDataException: 1141
- EOFException: 164
Reason 4:
In a different branch of the JDK I added my bugfix and also a lot of
debug information to print out several events before and after every
interrupted running status. I tested this against the 721 files with IRS.
The result was more than 8000 occurrences of IRS, and I looked at each
of them!
Manually!!!
The vast majority of these occurrences shows very clearly that the data
makes absolute sense.
However a few of them (maybe 20-30) made no sense at all or were at
least "suspicious".
So I looked at these remaining files with a hex editor.
In most cases there was a meta message with an incorrect length field or
something similar, so that the following bytes were interpreted incorrectly.
Conclusion:
After this investigation I'm very confident that the tolerant solution
is the best.
And isn't it pretty cool to fix 30% of all exceptions with such a simple
fix?
Best regards
Jan Trukenmüller
(Github user: truj)
More information about the client-libs-dev
mailing list