P. Taylor Goetz

bytes from /dev/random

Running Apache Storm on Windows

In the past, running Storm on Windows has been a challenge. While possible, it often involved hacking Storm’s source, hunting down (or building from source) native dependencies, and mucking around with various ways to trick Windows into thinking it’s like UNIX/POSIX.

That alienated a large number of potential adopters who stand to gain from integrating Storm into their big data strategy.

Thanks in large part to contributions from Storm committer David Lao, as well as contributions from Yahoo!, the next release of Storm (0.9.1-incubating) will make life much easier for users who want or need to deploy Storm in an environment where Windows is necessary.

Below I’ve listed the steps necessary to get Storm up and running with a sample topology on Windows. They walk through the process of creating a single-node cluster (pseudo-cluster) and deploying a sample “Word Count” topology.

Install Java

Download and install a JDK (Storm works with both Oracle and OpenJDK 6/7). For this setup I used JDK 7 from Oracle.

I installed Java in:

C:\Java\jdk1.7.0_45\

Install Python

To test the installation, we’ll be deploying the “word count” sample from the storm-starter project, which uses a multi-lang bolt written in Python. I used Python 2.7.6, which can be downloaded here.

I installed Python in:

C:\Python27\

Install and Run Zookeeper

Download Apache Zookeeper 3.3.6 and extract it. Configure and run Zookeeper with the following commands:

> cd zookeeper-3.3.6
> copy conf\zoo_sample.cfg conf\zoo.cfg
> .\bin\zkServer.cmd

Install Storm

The changes that allow Storm to run seamlessly on Windows have not been officially released yet, but you can download a build with those changes incorporated here.

(Source branch for that build can be found here).

Extract that file to the location of your choice. I chose C:\.

Configure Environment Variables

On Windows, Storm requires the STORM_HOME and JAVA_HOME environment variables to be set, as well as some additions to the PATH variable:

JAVA_HOME

C:\Java\jdk1.7.0_45\

STORM_HOME

C:\storm-0.9.1-incubating-SNAPSHOT-12182013\

PATH Add:

%STORM_HOME%\bin;%JAVA_HOME%\bin;C:\Python27;C:\Python27\Lib\site-packages\;C:\Python27\Scripts\;

PATHEXT Add:

.PY

Start Nimbus, Supervisor, and Storm UI Daemons

For each daemon, open a separate command prompt.

Nimbus

> cd %STORM_HOME%
> storm nimbus

Supervisor

> cd %STORM_HOME%
> storm supervisor

Storm UI

> cd %STORM_HOME%
> storm ui

Verify that Storm is running by opening http://localhost:8080/ in a browser.

Deploy the “Word Count” Topology

Either build the storm-starter project from source, or download a pre-built jar.

Deploy the Word Count topology to your local cluster with the storm jar command:

> storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount -c nimbus.host=localhost

If you reload the Storm UI page, you should now see the “WordCount” topology listed and can click on the link to verify that the topology is processing data.

Building Storm on OSX 10.9 (Mavericks)

Nearly a year ago to the day, my friend and colleague Brian O’Neill blogged about building Storm on OSX. I had been through that pain two years ago, and largely forgot about it (once you get 0mq and JZMQ installed you’re largely in the clear). That is, until today, when I had to set up a Storm development environment on a new laptop…

Things have changed since then. Apple has released OSX 10.9 (Mavericks) and turned over development of the OSX JDK to Oracle, adding a little more salt to the wound.

Hopefully this will spare a few others from that pain.

JDK 6/7

On a fresh install of OSX 10.9 (no JDK installed), when you run java -version from the command line you will get a prompt to download Java 7 from Oracle. But what if you need JDK 6?

Some Java applications can trigger the install of JDK 6 via software update. In my case it was IntelliJ IDEA. But one utility that Apple added to OSX is the /usr/libexec/java_home executable, which will output a path suitable for use as a JAVA_HOME value. Without arguments, it will output the path for the default JDK:

$ export JAVA_HOME=$(/usr/libexec/java_home)
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home
$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

You can also specify a certain version (running this should trigger an install of JDK 6 via software update – I’m not sure because IntelliJ had already triggered it in my situation):

$ export JAVA_HOME=$(/usr/libexec/java_home -v1.6)
$ echo $JAVA_HOME
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)

So now we have a way to easily switch between JDK 6 and 7.

If you need to download Java 6 manually, there is a .dmg file available from Apple.
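
If you switch JDKs often, the two export commands above can be wrapped in a small shell function. This is only a sketch: use_jdk and the JAVA_HOME_TOOL override are hypothetical names introduced for illustration, and it assumes your java_home supports the -F (fail fast) flag, so check /usr/libexec/java_home -h before relying on it.

```shell
# Hypothetical helper: point JAVA_HOME at a specific JDK version on OS X.
# JAVA_HOME_TOOL is an assumed override hook (not part of OS X) so the
# function can be tried on machines without /usr/libexec/java_home.
use_jdk() {
  local version="$1"
  local tool="${JAVA_HOME_TOOL:-/usr/libexec/java_home}"
  local home
  if [ -z "$version" ]; then
    echo "usage: use_jdk <version>, e.g. use_jdk 1.6" >&2
    return 2
  fi
  # Assumption: -F makes java_home fail instead of falling back to the
  # default JDK when no installed JDK matches the requested version.
  if ! home="$("$tool" -F -v "$version")"; then
    echo "no JDK matching $version found" >&2
    return 1
  fi
  export JAVA_HOME="$home"
  echo "JAVA_HOME=$JAVA_HOME"
}
```

With that in your .bash_profile, use_jdk 1.6 and use_jdk 1.7 should flip between the two JDKs shown above.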

Install Xcode and Command Line Tools

Xcode is easy to install via the App Store. You will also need the command line tools for compiling various things. You can trigger the install of the command line tools by invoking one of the stubs of the tools included, e.g.:

/usr/bin/strings

Install Homebrew

Homebrew is like a package manager for OSX that handles download, compilation, and installation of various tools and libraries. We will use it later to install dependencies.

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"
brew doctor

If the brew doctor command completes without error, you’re ready to move on.

Install 0mq

The Storm documentation recommends using 0mq version 2.1.7; however, I’ve found that 2.1.7 will not build on OS X 10.9 (Mavericks). The closest version I’ve found that will build under Mavericks is 2.1.9.

The new/preferred method is to use brew tap homebrew/versions, but that repository does not have 0mq 2.1.9. The fallback method is to use the brew versions command to find and check out a specific version:

brew versions zeromq
cd /usr/local
git checkout 381c97f Library/Formula/zeromq.rb
brew install zeromq
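
Since it’s easy to end up with a newer 0mq by accident, a quick sanity check after the install doesn’t hurt. The sketch below is an illustration only: is_zmq_2_1 and installed_zmq_version are hypothetical helper names, and treating any 2.1.x release as acceptable is an assumption on top of the 2.1.7/2.1.9 versions mentioned above.

```shell
# Hypothetical helper: succeed only for 0mq 2.1.x version strings.
# Assumption: any 2.1.x release is close enough to the recommended 2.1.7.
is_zmq_2_1() {
  case "$1" in
    2.1.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Hypothetical helper: ask Homebrew which zeromq version is installed.
# `brew list --versions zeromq` prints a line such as "zeromq 2.1.9".
installed_zmq_version() {
  brew list --versions zeromq | awk '{print $2}'
}
```

Running is_zmq_2_1 "$(installed_zmq_version)" && echo "0mq looks fine" should then confirm the checkout trick worked.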

Install JZMQ (Java bindings for 0mq)

Compiling and installing JZMQ is where most of the pain comes in, and most of the errors that come out of the build process are cryptic.

Follow the steps below exactly and you should be spared that pain:

git clone https://github.com/nathanmarz/jzmq.git

brew install pkg-config
brew install automake
brew install libtool

export JAVA_HOME=$(/usr/libexec/java_home -v1.6)
sudo ln -s /System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers/ /Library/Java/Home/include

cd jzmq
./autogen.sh
./configure
touch src/classdist_noinst.stamp
cd src
javac -d . org/zeromq/*.java
cd ../
make
sudo make install

pkg-config, automake, and libtool are dependencies of the JZMQ build process. After installing those, we switch to JDK 6. Next, we symlink the JDK 6 header directory to where the JZMQ build process will be able to find it. Without it you will get an error like:

cannot find jni_md.h in /Library/Java/Home/include

The touch src/classdist_noinst.stamp command prevents the following error:

No rule to make target `classdist_noinst.stamp'

Next, we manually compile the JZMQ java sources. Without that step, you will get the following error:

error: cannot access org.zeromq.ZMQ
class file for org.zeromq.ZMQ not found
javadoc: error - Class org.zeromq.ZMQ not found.
Error: No classes were specified on the command line.  Try -help.
make[1]: *** [org_zeromq_ZMQ.h] Error 15
make: *** [all-recursive] Error 1

Finally, we compile and install JZMQ.
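
Because the build errors are so cryptic, it can help to confirm the tools are actually on your PATH before running autogen.sh. Here is a minimal sketch; missing_tools is a hypothetical helper name, not something that ships with JZMQ or Homebrew.

```shell
# Hypothetical preflight: print the name of every command that cannot be
# found on the PATH, and return non-zero if at least one is missing.
missing_tools() {
  local rc=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return $rc
}
```

For example, missing_tools git pkg-config automake javac make prints nothing when everything is in place. If you want to check libtool as well, note that Homebrew usually installs GNU libtool under the name glibtool on OS X; treat that name as an assumption and verify it on your machine.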

Install Leiningen

Leiningen is the build tool used by Storm. It is similar to Maven and Gradle, but much more suited to Clojure development.

Installing Leiningen is just a matter of downloading the script and making it executable:

curl https://raw.github.com/technomancy/leiningen/stable/bin/lein -o ~/bin/lein
chmod +x ~/bin/lein

The above commands assume you have a ~/bin directory and it is added to your PATH. The easiest way to do that is in a .bash_profile file:

mkdir ~/bin
echo PATH=\$PATH:~/bin >> ~/.bash_profile
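
One caveat: the echo above appends a new line to .bash_profile every time it is run. If you script your machine setup, a re-runnable variant is handy. The following is a sketch under that assumption; add_path_entry is a hypothetical helper name.

```shell
# Hypothetical helper: create a directory and append it to PATH in a
# profile file, but only if that exact export line is not already there.
add_path_entry() {
  local dir="$1" profile="$2"
  local line="export PATH=\"\$PATH:$dir\""
  mkdir -p "$dir"
  touch "$profile"
  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -Fxq "$line" "$profile"; then
    echo "$line" >> "$profile"
  fi
}
```

Calling add_path_entry "$HOME/bin" "$HOME/.bash_profile" twice leaves a single PATH line in the file.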

Build Storm

Now we’re finally ready to build storm:

git clone https://github.com/nathanmarz/storm.git
cd storm
lein sub install

This will build and install the storm jars so they can be used as dependencies in other Leiningen/Maven/Gradle/etc. projects. If you want to create a distribution ZIP archive, run the following script:

sh ./bin/build_release.sh

Hopefully this will save some future storm contributors some time Googling cryptic errors.

Why Sample Code Matters

Whenever you write code samples for an API or library – or even a blog post – you should treat them as though you are writing the production guidance system for the space shuttle. Okay, that may be a little extreme, but you should at least make your best effort to ensure the code is error free and follows best practices. Why? Any bugs, vulnerabilities, anti-patterns, or cruft of any kind will replicate like drunken bunnies.


In a blog post titled “Why Windows 8 drivers are buggy”, Andrey Karpov discusses what he found when performing static analysis on Windows 8 driver examples published by Microsoft. In short, those examples aren’t exactly stellar. But they’re just examples and aren’t that important, right? I beg to differ.

In my opinion, the salient point of the article is best summarized in the following paragraph (emphasis mine):

“Bugs in samples are not that critical as bugs in real-life software. Nevertheless, bugs can migrate from samples to real-life projects and cause developers a lot of troubles. Even within the Windows 8 Driver Samples pack we found identical bugs. The reason is obvious: copying-and-pasting a code fragment from a nearby sample. In the same way these errors will get into real-life drivers.”

In fact, I would rearrange that first sentence to read: “Bugs in samples will become critical bugs in real-life software.”

The evils of copying-and-pasting aside, the truth is that developers reading your sample code will likely (perhaps blindly) trust that a given example is the right way to do things. Like it or not, when writing examples you are acting as a teacher and can’t make any assumptions about your pupils.

To illustrate how sample code can proliferate, consider this commit to the storm-starter project. That was from two years ago, shortly after Storm had been open-sourced. The impetus was to help people more familiar with Java/Maven than Clojure/Leiningen get started with Storm. Since then, that code/comment has proliferated fairly far and wide, as can be seen from a Google search for “keep storm out of the jar-with-dependencies”. In retrospect, I should have used the Maven Shade plugin instead, since it plays better with certain frameworks like Spring. Not a big deal, but the cat (er, drunken bunny) is out of the bag and on a love mission.

I’ll admit that I’m in no way perfect in this respect. But I strive toward that goal and believe that all serious developers should as well.

Let’s keep the drunken bunnies at bay. ;)

Storm at NYC* Big Data Tech Day

I recently had the opportunity to present at NYC* Big Data Tech Day with Brian O’Neill about some of the work we’ve been doing at HMS, developing our platform for real-time Master Data Management.

Our approach leverages Storm, Cassandra, Kafka, and Elastic Search to gather data from thousands of data sources, and meld them together in near real time, to produce a single high quality health care practitioner database.

Our talk was titled “A Big Data Quadfecta: Cassandra, Storm, Elastic Search and Kafka” and should resonate with any beer enthusiasts!

Slides and video below. Unfortunately the slides are not visible in the video, but you can hopefully follow along (somewhat) with the embedded slides.

Philadelphia Cassandra User Group

Last night we kicked off the first meeting of the Philadelphia Cassandra Users Group hosted at HMS.

We had a really good turnout, with a broad range of C* users and admins. I presented on our use of Storm and Cassandra at HMS.

Thanks to Brady Gentile from DataStax, and my HMS colleagues Alex Collautt and Brian O’Neill for making the arrangements.

Upgrading RVM on OSX

I recently had the need to install a newer version of ruby, since I hadn’t updated in ages (I was on ruby-1.9.2-p290). For something I was working on I needed ruby-1.9.3-rc1.

Simple, right? Just run:

rvm install ruby-1.9.3-rc1

Well, not so much… Running that with an old rvm install led to nothing but cryptic hurt.

As it turns out, RVM has undergone some serious updates as well as a move to the .io TLD. I’m running OSX and using homebrew, so the first order of business was to update homebrew:

brew update

The next step is to install the new/latest version of RVM:

curl -L https://get.rvm.io | bash -s stable

That got me a bit closer. Running rvm install ruby-1.9.3-rc1 still failed, but RVM was kind enough to let me know what was missing. So the next step was:

brew install libyaml libxml2 libxslt libksba openssl sqlite

Even closer… but still no love. The next step was to tell RVM to take care of any dependencies it needs, including installing Homebrew if necessary (https://rvm.io/rvm/autolibs/):

rvm autolibs enable

Success! Now I was able to install ruby-1.9.3-rc1, migrate the gems I had installed under ruby-1.9.2-p290, and tell RVM to use ruby-1.9.3-rc1:

rvm install ruby-1.9.3-rc1
rvm migrate ruby-1.9.2-p290 ruby-1.9.3-rc1
rvm use ruby-1.9.3-rc1