Data server design questions
Tom Lasseter
t.lasseter at comcast.net
Tue Sep 4 13:04:25 PDT 2012
I am developing a high-performance cloud data server, primarily for the
display of large engineering and scientific datasets. The client side is a
rich Java application, though a browser client would be possible if the GUI
were limited.
The challenge is the diversity of the libraries and technologies available.
I have spent weeks analyzing and prototyping. The decision on which path to
take is a complex one. It requires not just knowledge of the detailed
software architecture of the various design possibilities, but also
projections of how effectively they will function under heavy load.
Although most questions on this OpenJDK list are low-level, I believe this
is the right venue for my questions, as the experts who understand the code
well enough to design and improve it are the only ones who can expertly
answer the questions I'm posing.
A data server is pretty general, so I will be more explicit about the
requirements for this one.
The data consists of ASCII files, which are JSON-like key-value files, each
of which describes an object. Associated with each of these object files is
a series of binary files which contain the bulk of the data. These data
objects fall into two extreme groups: 1) large 3D/4D volumes from which the
user wishes to extract a subset of the data; 2) smaller objects, of which
the user may wish to display many thousands in full.
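For the first group, extracting a subset mostly comes down to computing byte
offsets into the binary file. The following is only a sketch under assumed
conditions: a headerless file of fixed-size samples with x varying fastest.
The class name VolumeSubset and its methods are hypothetical and do not
describe the actual file format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class VolumeSubset {

    // Byte offset of sample (x, y, z) in a volume with nx * ny samples per
    // z-plane. Assumes x varies fastest and the file has no header.
    static long offset(int x, int y, int z, int nx, int ny, int bytesPerSample) {
        return (((long) z * ny + y) * nx + x) * bytesPerSample;
    }

    // Reads a run of `count` samples along x, starting at (x0, y, z).
    static ByteBuffer readRow(FileChannel ch, int x0, int y, int z, int count,
                              int nx, int ny, int bytesPerSample)
            throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(count * bytesPerSample);
        long pos = offset(x0, y, z, nx, ny, bytesPerSample);
        // Positional reads do not move the channel's own position, so
        // several worker threads can share one open FileChannel.
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position());
            if (n < 0) {
                break; // reached end of file before the run was complete
            }
        }
        buf.flip();
        return buf;
    }

    // Usage: java VolumeSubset <file> <nx> <ny> <x0> <y> <z> <count>
    // (4-byte samples are assumed here.)
    public static void main(String[] args) throws IOException {
        int nx = Integer.parseInt(args[1]);
        int ny = Integer.parseInt(args[2]);
        int x0 = Integer.parseInt(args[3]);
        int y = Integer.parseInt(args[4]);
        int z = Integer.parseInt(args[5]);
        int count = Integer.parseInt(args[6]);
        try (FileChannel ch = FileChannel.open(Paths.get(args[0]),
                                               StandardOpenOption.READ)) {
            ByteBuffer row = readRow(ch, x0, y, z, count, nx, ny, 4);
            System.out.println("read " + row.remaining() + " bytes");
        }
    }
}
```

A larger sub-block would be a loop of such row reads, one per (y, z) pair in
the requested range.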
For the large volume files, the user is often moving a cursor through the
volume, so the server will need to be constantly delivering newly selected
segments. It makes sense to have a server thread that monitors requests for
this volume, which may arrive simultaneously from multiple users, and starts
worker threads to deliver them. Client-server sockets should stay alive for
some period of time, and the associated server worker thread might be kept
alive as well until the user ceases activity.
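To make this scenario concrete, here is a minimal sketch of the blocking
variant: one accept loop hands each connection to a worker thread, and the
worker stays with that socket until the client has been idle too long. The
wire format (an 8-byte segment id in, a length-prefixed byte array out), the
port, the timeout, and the loadSegment stand-in are all assumptions made for
illustration.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingVolumeServer {
    static final int IDLE_TIMEOUT_MS = 30_000; // assumed idle limit

    // Hypothetical stand-in for the real volume lookup: returns the
    // segment id encoded as 8 bytes.
    static byte[] loadSegment(long segmentId) {
        return ByteBuffer.allocate(8).putLong(segmentId).array();
    }

    // Accept loop: blocks in accept() and gives each client its own worker.
    static void serve(ServerSocket server, ExecutorService workers) {
        try {
            while (true) {
                final Socket client = server.accept();
                workers.execute(new Runnable() {
                    @Override
                    public void run() {
                        handle(client);
                    }
                });
            }
        } catch (IOException e) {
            // accept() fails once the server socket is closed: stop listening
        }
    }

    // Worker: serves requests on one socket until the client goes idle
    // or disconnects.
    static void handle(Socket client) {
        try (Socket s = client;
             DataInputStream in = new DataInputStream(s.getInputStream());
             DataOutputStream out = new DataOutputStream(s.getOutputStream())) {
            s.setSoTimeout(IDLE_TIMEOUT_MS); // a read blocked longer than this throws
            while (true) {
                long segmentId = in.readLong(); // blocks for the next request
                byte[] data = loadSegment(segmentId);
                out.writeInt(data.length);
                out.write(data);
                out.flush();
            }
        } catch (SocketTimeoutException | EOFException e) {
            // client idle too long or closed the connection: release the worker
        } catch (IOException e) {
            // other I/O failure: drop this connection
        }
    }

    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newCachedThreadPool();
        try (ServerSocket server = new ServerSocket(9000)) { // port is arbitrary
            serve(server, workers);
        } finally {
            workers.shutdown();
        }
    }
}
```

A cached thread pool is used here so that worker threads are reused across
connections; a bounded pool would be the obvious alternative if the number
of simultaneous users needs a hard cap.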
For the large sets of smaller object files, the options are to make
individual requests for each file and have the client process and display
the data as received, or to make a request for all the objects and have the
server send them sequentially on a single socket. The optimization here is
a bit tricky: the IO can be running on several sockets or sequentially on a
single socket, and the processing on both server and client ends could be
run on single or multiple threads in either case.
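The single-socket option needs some way to tell where one object ends and
the next begins. A minimal sketch, assuming a simple length-prefixed framing
(an object count, then each object as a 4-byte length followed by its
bytes); the ObjectBatch class and this framing are hypothetical and not part
of any existing protocol:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ObjectBatch {

    // Server side: writes the object count, then each object as
    // [int length][bytes], all on one stream.
    static void writeBatch(DataOutputStream out, List<byte[]> objects)
            throws IOException {
        out.writeInt(objects.size());
        for (byte[] obj : objects) {
            out.writeInt(obj.length);
            out.write(obj);
        }
        out.flush();
    }

    // Client side: reads the objects back in the order they were sent.
    static List<byte[]> readBatch(DataInputStream in) throws IOException {
        int count = in.readInt();
        List<byte[]> objects = new ArrayList<byte[]>(count);
        for (int i = 0; i < count; i++) {
            byte[] obj = new byte[in.readInt()];
            in.readFully(obj); // blocks until the whole object has arrived
            // A real client could hand obj to a processing thread here
            // instead of collecting the entire batch first.
            objects.add(obj);
        }
        return objects;
    }

    // Round trip through an in-memory buffer standing in for the socket.
    public static void main(String[] args) throws IOException {
        List<byte[]> objects = new ArrayList<byte[]>();
        objects.add(new byte[] {1, 2, 3});
        objects.add(new byte[] {4, 5});
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeBatch(new DataOutputStream(wire), objects);
        List<byte[]> received = readBatch(
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        System.out.println("objects received: " + received.size());
    }
}
```

With the individual-request option, the same per-object framing could be
used on each of several sockets, so the framing is independent of how many
sockets or threads are in play.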
QUESTION 1: Looking at these requirements and scenarios, how would you
design this client-server system in general terms?
An interesting analysis and presentation was put together by Paul Tyma:
"Thousands of Threads and Blocking I/O: The old way to write Java Servers is
New again (and way better)"
http://www.mailinator.com/tymaPaulMultithreaded.pdf
There he makes the case that asynchronous IO was developed because of
performance issues in creating and switching threads, but that hardware
improvements have since significantly reduced these problems. His
comparison shows multithreaded blocking IO to be simpler and more efficient
than asynchronous IO.
QUESTION 2: What are your comments on Paul's analysis, and has the picture
changed since his presentation (2008)?
Looking at the technologies out there is quite confusing. Here are the main
ones I've studied in detail.
1) Java 7 with its outstanding NIO.2 API. Anghel Leonard's book
describing many of its features is amazing: there are dozens of examples
which can be rapidly loaded and tested, and they all work (!). The problem
is that no one seems to be adopting the Java 7 socket APIs. Many, such as
the Netty group, say they are somewhat incompatible and will be adopted at a
low level under the existing Netty API. As far as I know, Jetty does not
support asynchronous IO at all. QUESTION 3: What are the issues with the
Java 7 socket APIs that are inhibiting their adoption?
2) Apache MINA; a lightweight server with an FTP server implemented;
uses blocking IO;
3) Netty asynchronous NIO; excellent design for an event-driven
client-server system; does not have a TCP file-server implementation
comparable to FTP;
4) Kaazing socket gateway server; closed source, not asynchronous;
5) Waarp FTP and R66 file servers; heavy-duty file server built on
Netty by the French government; impressive and complex.
QUESTION 4: Are there other projects and technologies I should be aware of?
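For reference, the Java 7 socket APIs mentioned in item 1 are the
asynchronous channels in java.nio.channels. The following is a minimal
sketch of their callback style, using an echo server only because it is the
smallest complete example. The class name and buffer size are arbitrary, and
this is not a proposed design for the data server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AsyncEchoServer {

    // Opens the listening channel and arms the first asynchronous accept.
    static AsynchronousServerSocketChannel start(int port) throws IOException {
        final AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(port));
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this); // re-arm for the next connection
                readAndEcho(client, ByteBuffer.allocate(4096));
            }

            @Override
            public void failed(Throwable exc, Void att) {
                // server channel closed or accept failed: stop accepting
            }
        });
        return server;
    }

    // Starts one asynchronous read; the callback echoes whatever arrived.
    static void readAndEcho(final AsynchronousSocketChannel client, final ByteBuffer buf) {
        client.read(buf, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer n, Void att) {
                if (n < 0) { // peer closed the connection
                    close(client);
                    return;
                }
                buf.flip();
                writeFully(client, buf);
            }

            @Override
            public void failed(Throwable exc, Void att) {
                close(client);
            }
        });
    }

    // Writes until the buffer is drained, then goes back to reading.
    static void writeFully(final AsynchronousSocketChannel client, final ByteBuffer buf) {
        client.write(buf, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer n, Void att) {
                if (buf.hasRemaining()) {
                    client.write(buf, null, this); // partial write: continue
                    return;
                }
                buf.clear();
                readAndEcho(client, buf);
            }

            @Override
            public void failed(Throwable exc, Void att) {
                close(client);
            }
        });
    }

    static void close(AsynchronousSocketChannel client) {
        try {
            client.close();
        } catch (IOException ignored) {
            // nothing useful to do if close itself fails
        }
    }

    public static void main(String[] args) throws Exception {
        start(9001); // port is arbitrary
        // The default channel group uses daemon threads, so keep main alive.
        Thread.currentThread().join();
    }
}
```

No thread is parked per connection here; the price is that the control flow
of a single request is spread across several callbacks.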
It seems to me that the best solution is to use Java 7 with simple blocking
IO for the application I've described. The only issue is reliability under
heavy load on a cloud server system.
QUESTION 5: Can I rely on the cloud platform itself (such as Amazon EC2) to
provide all the bells and whistles needed, such as authentication,
load balancing, etc., and to do so reliably with the simple server I'm
envisioning?
QUESTION 6: Do any of the other technologies discussed above provide
capabilities which cannot be easily built from scratch?
Thank you very much for any and all input!
Tom Lasseter
Email: tom at gc-rt.com
Website: gc-rt.com
More information about the nio-dev mailing list