Hadoop – update

Well, it’s been a while since I’ve given an update on the Hadoop cluster…

I built 1-, 2-, and 4-node clusters.
I ran some basic files through them, and here are the results:

In reality, I’m VERY pleased with the results of the WordCount example that was provided.  Going from 1 to 2 to 4 Minis isn’t linear (how could it be?), but it’s surprisingly close, so I’m thrilled.
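For anyone curious, here’s a minimal sketch of the same job written in Python for Hadoop Streaming. This is NOT the bundled Java example I actually ran; the file names are my own placeholders:

```python
#!/usr/bin/env python
# mapper.py (hypothetical name): emit <word, 1> for every word on stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))
```

```python
#!/usr/bin/env python
# reducer.py (hypothetical name): sum the counts for each word.
# Hadoop Streaming hands the reducer its input sorted by key,
# so all the lines for a given word arrive together.
import sys

current_word, current_count = None, 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

The nice thing about Streaming is that you can test both scripts locally with cat, sort, and pipes before the job ever touches the cluster.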

I was having some cluster stability issues, so I decided to try Hadoop 2.x and Ubuntu Server 12.04.1 LTS.
Hadoop 2.x is TOTALLY different, and their walkthrough is … sparse.
So, it’s back to 1.0.4, but at least I know how it works.  I’m sticking with Ubuntu Server, as it’s cleaner, faster, and easier to configure.

One little trick:  the Minis won’t boot (or reboot) without a monitor attached.  The solution?  Here:
http://gallery.nancyblenkhorn.com/main.php/v/Headless+Mac+Mini/
Off to Radio Shack 😉

At this point, I’ve built a new master, and a new slave.  Richard is going to image the slave, then we’ll try to make another one.  If that’s successful, we’ll image 6 more and build an 8-node cluster.  POWER!

Once I can get the cluster stable (running the same file over and over, running multiple tests with consistent success), we’ll image more and make larger clusters.
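To make “consistent success” concrete, here’s a rough sketch of the kind of harness I have in mind: rerun the same bundled WordCount job several times and log exit codes and wall-clock times. The HDFS paths and run count are placeholders; the jar name is the examples jar that ships with 1.0.4:

```python
#!/usr/bin/env python
# rerun_wordcount.py: rerun the same job and log timings (a sketch;
# adjust the paths and run count to taste)
import subprocess
import time

RUNS = 10  # arbitrary; enough repetitions to spot flakiness

for i in range(RUNS):
    # each run needs a fresh output directory, or the job refuses to start
    out_dir = "/user/hduser/wc-out-%d" % i
    start = time.time()
    rc = subprocess.call([
        "hadoop", "jar", "hadoop-examples-1.0.4.jar",
        "wordcount", "/user/hduser/input", out_dir,
    ])
    print("run %d: exit=%d, %.1f s" % (i, rc, time.time() - start))
```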

I can’t decide if I want to make an 8-node cluster AND a 16 AND a 32 …
Or if I want to try adding new Minis to an existing cluster.  Either way, I’ll have to document it well and provide that information once it’s running.
I’m also interested in testing the performance of the cluster on a local switch versus distributed across the network, to see what kind of latency a real network introduces in this environment.
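Before I get to that comparison, a quick sanity check on the links themselves would help. Here’s a tiny throwaway probe (the host names are made up; port 22 should work since the nodes already run sshd for Hadoop’s start scripts):

```python
#!/usr/bin/env python
# rtt_probe.py: rough TCP connect-time probe between nodes (a sketch)
import socket
import time

HOSTS = ["mini-master", "mini-slave1"]  # placeholder node names
PORT = 22  # sshd should already be listening on every node

for host in HOSTS:
    samples = []
    for _ in range(5):
        start = time.time()
        conn = socket.create_connection((host, PORT), timeout=2)
        samples.append((time.time() - start) * 1000.0)
        conn.close()
    print("%s: min %.2f ms, avg %.2f ms"
          % (host, min(samples), sum(samples) / len(samples)))
```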

A little at a time, and I’ll have a large, stable cluster.  Then it’s off to algorithmics.  There are some bright MIS students here at WOU, and I’m sure one of them is familiar enough with Python (and parallel programming) to help me turn this pile of Minis into something quite powerful.

The ultimate goal, of course, is to actually crunch some data — not simply to learn about Hadoop.  Once I’ve read enough and played enough, I’m confident that I can process some Big Data from WOU and learn things we didn’t know before 🙂
