<Sound Dev> New thoughts on Java Sound

Sat Oct 24 12:18:54 UTC 2015

Hi Florian

I agree that getting sound out of a clip file isn't too problematic, but the portability problems arise when you want to do something more ambitious, and on reflection I think many of these issues arise over the vague definition of what should be a mixer - or maybe, what should be the default mixer. For example:

1. A clip should be short, but how short?  Or to put it another way, how long is the maximum length of a clip?  10 seconds, a minute, 5 minutes?  
2. How many clips can I open at once? 8? 16? 32? Is there a pool for clip data, and if so, does it have a maximum length?
3. What file formats should my clips be in? What sample rate? What sample size? 
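
As a rough illustration of question 2, here is a minimal sketch of asking each installed mixer how many clips it claims to support. It uses only standard javax.sound.sampled calls; the class name ClipLimits and the helper describe() are made up for this example. Note that a mixer is allowed to answer AudioSystem.NOT_SPECIFIED, which is rather the point being made above.

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;

// Hypothetical helper class, for illustration only.
public class ClipLimits {

    /** Renders a getMaxLines() result, which may be AudioSystem.NOT_SPECIFIED. */
    public static String describe(int maxLines) {
        return maxLines == AudioSystem.NOT_SPECIFIED
                ? "unspecified"
                : Integer.toString(maxLines);
    }

    public static void main(String[] args) {
        Line.Info clipInfo = new Line.Info(Clip.class);
        // Ask every installed mixer for its (approximate) clip limit.
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            int max = mixer.getMaxLines(clipInfo);
            System.out.println(info.getName() + ": max clips = " + describe(max));
        }
    }
}
```

The numbers printed depend entirely on the platform and on the installed mixers, and the API describes the value as approximate.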

Each time something is left undefined, then some implementer will make his or her own decisions about what is reasonable or practical. Let's move on:

4. I can open a SourceDataLine for dynamic audio output, but what sample rates are accepted? We can probably rely on 44100 and 48000, but what about 96000 or even 192000?  And how about lower rates? Can I reasonably expect a Mixer to play at any sample rate between (say) 2000 and 48000?  
5. A similar set of questions for SourceDataLine sample sizes ...?
6. Can I open two simultaneous SourceDataLines and write to them both simultaneously? Can they both have different sample rates and sizes?  What about the relative volume of the two lines?
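
Questions 4 and 5 can at least be probed at run time. The sketch below asks AudioSystem whether a SourceDataLine is available for a range of sample rates and sample sizes. The class name FormatProbe, the helpers pcm() and isPlayable(), and the particular rates and sizes tried are assumptions for this example, and the answers will differ from one machine to the next.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;

// Hypothetical helper class, for illustration only.
public class FormatProbe {

    /** Builds a signed, little-endian linear PCM format. */
    public static AudioFormat pcm(float sampleRate, int bits, int channels) {
        return new AudioFormat(sampleRate, bits, channels, true, false);
    }

    /** True if some installed mixer claims to offer an output line for the format. */
    public static boolean isPlayable(AudioFormat format) {
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        return AudioSystem.isLineSupported(info);
    }

    public static void main(String[] args) {
        float[] rates = {2000f, 8000f, 22050f, 44100f, 48000f, 96000f, 192000f};
        int[] sizes = {8, 16, 24};
        for (float rate : rates) {
            for (int bits : sizes) {
                AudioFormat format = pcm(rate, bits, 2);
                System.out.println(format + " -> " + isPlayable(format));
            }
        }
    }
}
```

A "true" here only means that a line is advertised for the format; it does not say whether the mixer resamples internally or how well it does so.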

Of course, it may well be that the default mixers all agree on these points, but if the spec is vague and undefined then there is always potential for a programmer to write and test a program on (say) Windows and discover that the default mixer on a Mac is different. This might happen down the line, in 4 years' time.

There are some parts that are completely undefined:
7.  If I open a SourceDataLine with two channels then I'm pretty certain that I should buffer the data for the left channel first, followed by the right.  But what about 5.1 data?  What order does that get written in?  Is it fixed between platforms?  Or even between different sound cards?
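
To make question 7 concrete, here is a minimal sketch of how per-channel samples get packed into the interleaved frames that SourceDataLine.write() expects, assuming 16-bit signed little-endian PCM. The class and method names are made up for this example. The code simply writes channels in the order the caller supplies them; for stereo that is conventionally left then right, and for 5.1 the correct order is exactly the open question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, for illustration only.
public class Interleave {

    /**
     * Packs per-channel 16-bit samples into interleaved little-endian frames.
     * channels[c][f] is sample f of channel c; all channels must have equal length.
     * Each output frame holds channel 0, then channel 1, and so on.
     */
    public static byte[] interleave(short[][] channels) {
        int frames = channels[0].length;
        ByteBuffer buf = ByteBuffer
                .allocate(frames * channels.length * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int f = 0; f < frames; f++) {
            for (short[] channel : channels) {
                buf.putShort(channel[f]);
            }
        }
        return buf.array();
    }
}
```

For a six-channel line the same loop applies, but nothing in this sketch tells you which array index should hold, say, the centre or LFE channel.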

And there are parts that are so wacky that only the cognoscenti can get to grips with them:
8. Controls! I probably don't need to say any more, but I will.  For any entity there may be a set of controls. I can get hold of an array of these controls by making some call or another, and then if I can figure out what each control actually is, I can adjust it in my code.  Whoever designed this obviously thought that KISS was meant for other people.
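
For readers who have not been through it, this is roughly what adjusting the volume of a line looks like. It is a minimal sketch using standard javax.sound.sampled calls; the class name ControlProbe and the helpers linearToDecibels() and setVolume() are made up for this example. It assumes the line offers a MASTER_GAIN control, which is not guaranteed, and that the chosen format can be opened on the machine at hand.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Control;
import javax.sound.sampled.FloatControl;
import javax.sound.sampled.Line;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

// Hypothetical helper class, for illustration only.
public class ControlProbe {

    /** Converts a linear amplitude factor (1.0 = unchanged) to decibels. */
    public static float linearToDecibels(double linear) {
        return (float) (20.0 * Math.log10(linear));
    }

    /** Sets MASTER_GAIN if the line offers it; returns false if it does not. */
    public static boolean setVolume(Line line, double linear) {
        if (!line.isControlSupported(FloatControl.Type.MASTER_GAIN)) {
            return false;
        }
        FloatControl gain = (FloatControl) line.getControl(FloatControl.Type.MASTER_GAIN);
        // Clamp to the range this particular control accepts.
        float db = linearToDecibels(linear);
        gain.setValue(Math.max(gain.getMinimum(), Math.min(gain.getMaximum(), db)));
        return true;
    }

    public static void main(String[] args) throws LineUnavailableException {
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format); // controls are often only reported once the line is open
        try {
            for (Control control : line.getControls()) {
                System.out.println(control.getType() + ": " + control);
            }
            System.out.println("volume set: " + setVolume(line, 0.5));
        } finally {
            line.close();
        }
    }
}
```

Which controls are listed, and what their ranges are, varies by mixer, so the code has to test for each one before using it.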

Ok - I know that there are answers for each of the questions I pose in 1-7, but that's not the point. The point is that the answers aren't specified anywhere.  This is where the possibility of non-portability arises.

There is also the problem of the hacking coder. Someone who doesn't fully understand what's going on but has been told he or she has got three days to get this sound working on this app!  With sufficient trial and error, it's always possible to bodge something together that works in the here-and-now - but heaven help what happens in the future!  

What I would like to see is a few well defined audio models that we can rely on. Off the top of my head, these might be:
1. A stereo model, capable of supporting 16 clips and 8 simultaneous SourceDataLines, at default sample size of 24 bits, and switchable between 44100 and 48000 sample rates. 

2. A surround model geared towards 5.1 support. This has one 6 channel (5+1) SourceDataLine, and a further 7 SourceDataLines that can be mapped into the 3D space around the listener. This would mean that a space craft could pass through an asteroid field, and the user would hear each asteroid as it goes past (if there were actually any sound in space!). There would also be 16 clips that can be mapped into 3D space.  (Here's the killer bit!) Opening the surround sound model on a two speaker system would invoke an HRTF mapper that will map the 3D sound into two channels, probably for use with earphones. Opening the model on a computer with a different arrangement of multichannel speakers invokes whatever remapping is necessary to give the same audio effect to the user. 

3. I also see we might have a high definition model that will work at 96000 Hz or higher, with bigger sample sizes.

4. For non-standard equipment, we retain an updated and improved SPI, in much the same way as we do today.
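
To show what "well defined" could mean in code, here is a purely hypothetical sketch of the stereo model from item 1, written as a set of guarantees a program could rely on. Nothing like StereoModel exists in javax.sound.sampled today; the name and shape are invented for illustration, and the numbers are just the ones suggested above.

```java
// Hypothetical: no such class exists in Java Sound. Illustration only.
public final class StereoModel {

    public static final int CHANNELS = 2;
    public static final int MAX_CLIPS = 16;
    public static final int MAX_SOURCE_DATA_LINES = 8;
    public static final int DEFAULT_SAMPLE_SIZE_BITS = 24;

    // The two sample rates the model would be switchable between.
    private static final int[] SAMPLE_RATES = {44100, 48000};

    private StereoModel() {
        // constants only, no instances
    }

    /** True if the model guarantees playback at the given sample rate. */
    public static boolean supportsSampleRate(int rate) {
        for (int supported : SAMPLE_RATES) {
            if (supported == rate) {
                return true;
            }
        }
        return false;
    }
}
```

A program written against such a model would check supportsSampleRate(48000) once, instead of probing each mixer on each platform.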

Just my two cents

Bob
--
On 24 Oct 2015, at 11:18, Florian Bomers <javasound-dev at bome.com> wrote:

> Hi Bob,
> 
> I do agree on a higher level, but not for this particular case.
> 
> Improving an established cross platform API (like Java Sound) is never
> easy because it must stay compatible for the users. With "users", I
> mean programmers using Java Sound, and their users. Here, on this
> list, we discuss the underlying implementation. Often enough it's
> painful and complicated to still fulfill the spec of 15 years ago. But
> the upside is that these efforts guarantee that using Java Sound
> remains the same and even 15 year old JS programs still work the same.
> 
> In this particular case (the NPE thing), I don't think anything is
> broken: the "old" implementation works and is "according to
> specification". I just want to ensure that it does not open a loophole
> to become broken!
> 
> On a side note, it is very important to protect users from poorly
> programmed, or broken, or malicious plugins (i.e. SPI's). That's why
> we catch the NPE and don't pass it on to the unsuspecting user.
> 
>> Personally, I've always thought the concept of a mixer is
>> flawed, because it's inherently a non-portable concept, (...)
> 
> Hmmm, I cannot follow here. For me, only the name "Mixer" is
> non-portable. If you think of it as "AudioDevice" or "AudioCard" then
> Java Sound is pretty much the same as any other audio hardware
> abstraction. The same is true for the naming of SourceDataLine
> ("OutputStream") and TargetDataLine ("InputStream").
> 
>> When I write a Java Sound program, I want the same level
>> of simplicity.
> 
> It seems to me that you imply that you need to implement an SPI in
> order to play back a sound? That would be horrible indeed!
> 
> But, as I'm sure you know, if you need simplicity, just get a Clip,
> load a file into it, and play: 3 simple lines of code. If you want to
> do, for example, low latency VoIP or a software synthesizer, things
> become a bit more complicated, but that's the same for any audio API
> I've worked with.
> 
> For me, it's important that the simple things are /simple/ to do, and
> the advanced things are /possible/ to do.
> 
>> Is it time to freeze and deprecate the existing Java Sound, and
>> start again with a new design, with cross platform portability as
>> its major aim?
> 
> The idea is tempting: a modern audio API in Java. But frankly, when
> thinking about it, it would most likely boil down to little more than
> JS class renaming/consolidating/cleaning. If you look at other audio
> API's, you'll always find the concepts of device, stream, audio
> format, file, codec -- just as in Java Sound. The main difference is
> that JS uses strange naming and overly complicated arrangements...
> 
> What I don't see at all is the missing platform portability in Java Sound?
> 
> Thanks,
> Florian
> 
> 
> On 24.10.2015 02:12, Bob Lang wrote:
>> On 23 Oct 2015, at 19:56, Florian Bomers <javasound-dev at bome.com>
>> wrote:
>> 
>>> Hi Sergey,
>>> 
>>> I guess you're right and the second loop will never be executed
>>> if we will always have the default mixer providers.
>>> 
>>> Removing the NPE catch clause, however, will still cause a
>>> backwards incompatibility, because if a poorly programmed
>>> MixerProvider gets installed which throws NPE for whatever reason
>>> (might also happen when "info" is non-null), now
>>> AudioSystem.getMixer() will throw NPE, where it previously
>>> worked.
>>> 
>>> I agree that it's harder for debugging mixer providers if NPE is 
>>> ignored. Other than that, I don't see any problem with keeping
>>> the NPE catch for backwards compatibility's sake. Even if just
>>> theoretical... But you never know, companies might be using poorly
>>> programmed in-house software or the like.
>>> 
>>> Thanks, Florian
>> 
>> So, it's broken if you do the right thing and it's broken if you
>> don't??
>> 
>> There comes a point in any software project where years of
>> cumulative amendments, fixes and modifications make the code so
>> fragile that it's no longer modifiable.
>> 
>> Personally, I've always thought the concept of a mixer is flawed,
>> because it's inherently a non-portable concept, which should be
>> anathema in a language that has portability as its main goal.  The
>> Mixer SPI only works when the programmer writing the code actually
>> understands what's going on and the implications of any choice -
>> and frankly, how often does that happen?  When I write a graphics
>> program, I don't have to worry about the specific capabilities of
>> the specific video card on the user's specific computer - all that
>> detail is (quite rightly) hidden from me.  And because it's hidden
>> from me, my program works on any desktop/laptop computer. When I
>> write a Java Sound program, I want the same level of simplicity.
>> It should be possible, because sound is inherently simpler than
>> video - yet Java Sound makes it far more complex.
>> 
>> Is it time to freeze and deprecate the existing Java Sound, and
>> start again with a new design, with cross platform portability as
>> its major aim?
>> 
>> Bob --
>> 
> 
> -- 
> Florian Bomers
> Bome Software
> 
> everything sounds.
> http://www.bome.com