OPENdj In Clustered and High Performance Environments
First things first: a plug for DAX Interactive. If you need a high
performance, high availability OPENdj environment, your best bet is to
go with the folks who invented it. If you're a real Do-It-Yourselfer
or just don't have any money, read on for some great advice, free of charge!
First we look at the various software components that comprise OPENdj,
then make some decisions about which processes
will run on what hardware and how they will all communicate.
Operating System
OPENdj.com runs Linux 2.2; we'll soon be moving to 2.4, since its
threading on SMP machines is much better than 2.2's. Any UNIX
variant should work fine, though you might have to tweak some of the shell
scripts called by various OPENdj-related services. OPENdj on Windows
is not inconceivable, but I've never put any thought towards it,
so I have no idea what would solve your problems if you had any,
beyond telling you to try Cygwin (oh, and don't run IIS).
Web Server
This can be just about any webserver. OPENdj.com uses
Apache, but iPlanet, or (if you're crazy) IIS, could conceivably work.
Servlet Container
Any servlet container should do fine, but OPENdj is only tested with
Tomcat 3.2. It should run without modification on other servlet
containers such as WebLogic or JRun, but you may need to tweak your
config files a bit.
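For a sense of scale, deploying the webapp on Tomcat 3.2 amounts to adding
a Context entry inside the ContextManager element of conf/server.xml; the
path and docBase below are placeholders for wherever you unpack OPENdj,
not values that ship with it:

    <!-- Goes inside the <ContextManager> element of Tomcat 3.2's
         server.xml. The path and docBase are placeholders for your
         own OPENdj webapp. -->
    <Context path="/opendj"
             docBase="webapps/opendj"
             debug="0"
             reloadable="true" />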
Java Virtual Machine
OPENdj.com uses the Sun Hotspot Server JVM, which is fairly quick. If
you're going to run
OPENdj in a stressful environment, it's very important to minimize CPU
bottlenecks. OPENdj's runtime behavior is
characteristically steady low CPU usage punctuated by spikes when we
NEED things to go as fast as possible (transitioning from one
radio show to another, for example). Because so much of OPENdj is
Java-based, a fast JVM goes a long way toward keeping those spikes short.
The Volano chat benchmark is about as close as a benchmark
will get to what OPENdj needs to do.
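For what it's worth, selecting the Server compiler is just a command-line
switch on Sun's JVM. The heap sizes, classpath, and main class name below
are placeholders (this is not OPENdj's actual launch script), but the shape
of the command is the same:

    # Hypothetical launch command -- adjust the classpath, heap sizes,
    # and main class name to match your installation.
    java -server -Xms128m -Xmx512m \
         -classpath /usr/local/opendj/lib/opendj.jar \
         org.opendj.server.DJServer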
Database
There is some SQL in OPENdj that is PostgreSQL-dependent; for
example, the SERIAL type is used for primary keys instead of
explicit sequences. It should not be too difficult to update the SQL code to
make it work with Oracle or DB2, because the SQL-related code in the
system is cordoned off into a few classes concerned with persistent
storage.
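To give a feel for the kind of change involved (this is an illustrative
table, not OPENdj's actual schema), the PostgreSQL SERIAL shorthand would
become an explicit sequence on Oracle:

    -- PostgreSQL: SERIAL creates and wires up the sequence for you.
    CREATE TABLE radio_show (
        show_id  SERIAL PRIMARY KEY,
        title    VARCHAR(128)
    );

    -- Oracle: declare the sequence yourself and use it at insert time.
    CREATE SEQUENCE radio_show_seq;
    CREATE TABLE radio_show (
        show_id  NUMBER PRIMARY KEY,
        title    VARCHAR2(128)
    );
    INSERT INTO radio_show (show_id, title)
        VALUES (radio_show_seq.NEXTVAL, 'Monday Night Mix');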
Audio Server
Icecast works best, but if there is any kind of command-line access to
your shoutcast server, then you could use that too. You would just
need to change some of the shell scripts that the
org.opendj.server.AudioServer class fires
off to do what they should for your server.
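The integration point is deliberately thin; the AudioServer just shells out
to external scripts. The sketch below shows that pattern in isolation (the
class, script path, and channel name are invented for illustration and are
not OPENdj's actual code):

    // Minimal sketch of shelling out to a server-specific script.
    // The script path is hypothetical; point it at whatever starts a
    // stream on your icecast (or shoutcast) installation.
    public class StreamScriptRunner {

        static int runScript(String script, String channel) throws Exception {
            Process p = Runtime.getRuntime().exec(new String[] { script, channel });
            return p.waitFor();   // non-zero exit means the script failed
        }

        public static void main(String[] args) throws Exception {
            int status = runScript("/usr/local/opendj/bin/start-stream.sh", "main");
            System.out.println("script exited with status " + status);
        }
    }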
Firewall
Same deal as the Audio Server above. Again, just change some of the
shell scripts that the AudioServer fires off to do what they should
for your firewall.
OPENdj's DJServer is very threaded. It has a thread per DJConnection, a
thread per AudioChannel, a thread per Service; threads are everywhere. An
SMP machine can go a
long way towards improving performance. Consider getting as many processors
as you can afford on a single box and making that the machine that runs the
DJServer.
The DJServer should be on a system that is visible to whatever network
your broadcasters are on (the Internet, corporate LAN, etc). Bear in
mind that the listeners don't need to have access to the DJServer, so
if your broadcasters are all behind your firewall, you don't have to
run the DJServer in the DMZ. Since it runs as root, this can be a
nice advantage.
Unlike the DJServer, the web application can be clustered over an array of
mid-power servers. Using a load-balancing algorithm in combination
with RRDNS, you can have a tier of webservers (say, P3-500s with 128M
RAM) backed by a separate tier of Tomcat servers (say, dual 600s with
512M RAM). Interconnect the tiers with switched 100Mbit or Gigabit
Ethernet.
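The RRDNS part is nothing exotic; it's just multiple A records for the same
name in your zone file, which BIND will rotate through. The hostname and
addresses here are made up:

    ; round-robin A records for the webserver tier (example addresses)
    www    IN  A    192.168.10.11
    www    IN  A    192.168.10.12
    www    IN  A    192.168.10.13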
As far as the database goes, it should be on a box strong
enough not to be the bottleneck. There are very few cache
optimizations in the current code yet, so the database gets queried
often. You'll need to do your own measurements to gauge your
hardware requirements here.
You can also cluster (or even geographically distribute)
the most bandwidth-intensive beast, the audio server. You could use
RRDNS (or manually implement your own load balancer, which is actually not
too hard), coupled with icecast's ability to act as a "relay", and really
get some benefits if your icecast servers are on separate network
connections. There are at least two problems, though. (1) Collecting
statistics on
the number of listeners is difficult. Some crafty scripting over
an ssh connection could address this. (2) The archives must now be
replicated on multiple servers. A replication policy that replicates
popular archives to distribute bandwidth more evenly while maintaining
only a single copy of infrequently streamed archives to save disk
space would be ideal. This would require some very crafty scripting
and some minor code changes but could be done.
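As a sketch of what the scripting for problem (1) might look like: assuming
each relay host has some local way to report its listener count (the
count-listeners.sh script below is hypothetical), a short loop over ssh can
total them up:

    #!/bin/sh
    # Sum listener counts across all icecast relays over ssh.
    # count-listeners.sh is a hypothetical per-host script that prints
    # a single number; replace it with however your relay reports stats.
    TOTAL=0
    for HOST in stream1.example.com stream2.example.com
    do
        COUNT=`ssh $HOST /usr/local/bin/count-listeners.sh`
        TOTAL=`expr $TOTAL + $COUNT`
    done
    echo "total listeners: $TOTAL"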
Unfortunately, the chat server is tied to the web server because
applets (like the chat applet) are only allowed to create network
connections back to the server that served them. But you can get
around this with port forwarding at the webserver tier. The chat
server itself cannot be clustered. It is primarily thread and
network I/O intensive, so an SMP machine would do well if you
anticipate a large number of chat participants.
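One way to do that port forwarding on a Linux 2.4 webserver is an iptables
DNAT rule; on a 2.2 kernel you would reach for ipchains/ipmasqadm instead.
The chat port and internal address below are made up:

    # Forward the chat applet's port from the webserver to the chat
    # server on the internal network (port and address are hypothetical).
    # IP forwarding must be on, and replies have to route back through
    # this box for the translation to work.
    iptables -t nat -A PREROUTING -p tcp --dport 9998 \
             -j DNAT --to-destination 192.168.10.50:9998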
You could move the RerunPlayer onto its own
machine to make sure all the CPU cycles go to MP3 encoding.
The RerunScheduler should be fine to run on a low-end machine, so long
as it has a decent network connection to the database.
The AsyncMailClients can be clustered. Each can have its own
dedicated mail server with its own separate connection to the Internet
if necessary. But honestly, I don't see OPENdj as being able to
generate enough email to require that kind of scalability, unless you have
people signing up to be DJs by the tens of thousands!
The Power Monitor plugin could probably not be used, because it needs
to update JSP pages on the fly. You would have to make your webservers
expose their document trees to the internal network over NFS. It can
be dangerous to run NFS on webservers though.
If you're really building this thing to be scalable, your
data center should guarantee power availability, removing the need for
this silly product of San Francisco rolling blackouts.