In the past, running Storm on Windows has been a challenge. While possible, it often involved hacking Storm’s source, hunting down (or building from source) native dependencies, and mucking around with various ways to trick Windows into thinking it’s like UNIX/POSIX.
That alienated a large number of potential adopters who stand to gain from integrating Storm into their big data strategy.
Thanks in large part to contributions from Storm committer David Lao, as well as contributions from Yahoo!, the next release of Storm (0.9.1-incubating) will make life much easier for users who want or need to deploy Storm in an environment where Windows is necessary.
Below I’ve listed the steps necessary to get Storm up and running with a sample topology on Windows. They walk through the process of creating a single-node cluster (pseudo-cluster) and deploying a sample “Word Count” topology.
Download and install a JDK (Storm works with both Oracle and OpenJDK 6/7). For this setup I used JDK 7 from Oracle.
I installed Java in:
To test the installation, we’ll be deploying the “word count” sample from the storm-starter project, which uses a multi-lang bolt written in Python. I used Python 2.7.6, which can be downloaded here.
Nearly a year ago to the day, my friend and colleague Brian O’Neill blogged about building Storm on OSX. I had been through that pain two years ago, and had largely forgotten about it (once you get 0mq and JZMQ installed you’re largely in the clear). That is, until today, when I had to set up a Storm development environment on a new laptop…
Things have changed since then. Apple has released OSX 10.9 (Mavericks) and turned over development of the OSX JDK to Oracle, adding a little more salt to the wound.
Hopefully this will spare a few others from that pain.
On a fresh install of OSX 10.9 (no JDK installed), when you run java -version from the command line you will get a prompt to download Java 7 from Oracle. But what if you need JDK 6?
Some Java applications can trigger the install of JDK 6 via software update. In my case it was IntelliJ IDEA. There is also a utility that Apple added to OSX, the /usr/libexec/java_home executable, which will output a path suitable for use as the JAVA_HOME value. Without arguments, it will output the path for the default JDK:
$ export JAVA_HOME=$(/usr/libexec/java_home)
$ echo $JAVA_HOME
$ java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
You can also specify a certain version (running this should trigger an install of JDK 6 via software update – I’m not sure, because IntelliJ had already triggered it in my situation):
$ export JAVA_HOME=$(/usr/libexec/java_home -v1.6)
$ echo $JAVA_HOME
$ java -version
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-462-11M4609)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-462, mixed mode)
So now we have a way to easily switch between JDK 6 and 7.
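The switch above can be wrapped in a small shell function. This is only a sketch: the function name use_jdk and the JAVA_HOME_TOOL override variable are hypothetical names introduced here for illustration, not something OSX provides. The only real tool it relies on is /usr/libexec/java_home and its -v option.

```shell
# Hypothetical helper: point JAVA_HOME at the JDK matching a version string.
# JAVA_HOME_TOOL is an assumed override hook (not part of OSX) so the lookup
# command can be swapped out; it defaults to Apple's java_home utility.
use_jdk() {
    tool="${JAVA_HOME_TOOL:-/usr/libexec/java_home}"
    if [ $# -eq 0 ]; then
        # No argument: ask for the default JDK.
        jdk_path=$("$tool") || return 1
    else
        # With an argument such as 1.6 or 1.7: ask for that version.
        jdk_path=$("$tool" -v "$1") || return 1
    fi
    JAVA_HOME="$jdk_path"
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
}
```

For example, use_jdk 1.6 before building JZMQ, and use_jdk with no argument to return to the default JDK. If the lookup fails, the function returns non-zero and leaves JAVA_HOME untouched.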
Xcode is easy to install via the App Store. You will also need the command line tools for compiling various things. You can trigger the install of the command line tools by invoking one of the stubs of the tools included, e.g.:
Homebrew is like a package manager for OSX that handles download, compilation, and installation of various tools and libraries. We will use it later to install dependencies.
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"
If the brew doctor command completes without error, you’re ready to move on.
The Storm documentation recommends using 0mq version 2.1.7; however, I’ve found that 2.1.7 will not build on OS X 10.9 (Mavericks). The closest version I’ve found that will build under Mavericks is 2.1.9.
The new/preferred method is to use brew tap homebrew/versions; however, that repository does not have 0mq 2.1.9. The fallback method is to use the brew versions command to find and check out a specific version:
Compiling and installing JZMQ is where most of the pain comes in, and most of the errors that come out of the build process are cryptic.
Follow the steps below exactly and you should be spared that pain:
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
brew install pkg-config
brew install automake
brew install libtool
export JAVA_HOME=$(/usr/libexec/java_home -v1.6)
sudo ln -s /System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers/ /Library/Java/Home/include
./autogen.sh
./configure
touch src/classdist_noinst.stamp
cd src
javac -d . org/zeromq/*.java
cd ..
make
sudo make install
pkg-config, automake, and libtool are dependencies of the JZMQ build process. After installing those, we switch to JDK 6. Next we symlink the JDK 6 header directory to a location where the JZMQ build process will be able to find it. Without the symlink you will get an error like:
cannot find jni_md.h in /Library/Java/Home/include
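One way to catch this before starting the build is to check that the headers are visible at the path from the error message. The function below is a minimal sketch: the name check_jni_headers is hypothetical, and checking jni.h alongside jni_md.h is my assumption about what a JNI build needs rather than something the JZMQ build documents.

```shell
# Hypothetical pre-flight check: confirm the JNI headers are visible in the
# directory the JZMQ build searches. Defaults to the path from the error above.
check_jni_headers() {
    include_dir="${1:-/Library/Java/Home/include}"
    missing=0
    # jni_md.h is the header named in the error; jni.h is checked as an
    # assumption, since JNI code normally includes both.
    for header in jni.h jni_md.h; do
        if [ ! -f "$include_dir/$header" ]; then
            echo "missing: $include_dir/$header" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_jni_headers after creating the symlink should return zero; a non-zero status with a "missing" line suggests the symlink points at the wrong directory or was not created.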
The touch src/classdist_noinst.stamp command prevents the following error:
No rule to make target `classdist_noinst.stamp'
Next, we manually compile the JZMQ java sources. Without that step, you will get the following error:
error: cannot access org.zeromq.ZMQ
class file for org.zeromq.ZMQ not found
javadoc: error - Class org.zeromq.ZMQ not found.
Error: No classes were specified on the command line. Try -help.
make: *** [org_zeromq_ZMQ.h] Error 15
make: *** [all-recursive] Error 1
git clone https://github.com/nathanmarz/storm.git
cd storm
lein sub install
This will build and install the storm jars so they can be used as dependencies in other Leiningen/Maven/Gradle/etc. projects. If you want to create a distribution ZIP archive, run the following script:
Hopefully this will save some future storm contributors some time Googling cryptic errors.
Whenever you write code samples for an API or library – or even a blog post – you should treat them as though you are writing the production guidance system for the space shuttle. Okay, that may be a little extreme, but you should at least make your best effort to ensure the code is error-free and follows best practices. Why? Any bugs, vulnerabilities, anti-patterns, or cruft of any kind will replicate like drunken bunnies.
In a blog post titled Why Windows 8 drivers are buggy Andrey Karpov discusses what he found when performing static analysis on Windows 8 driver examples published by Microsoft. In short, those examples aren’t exactly stellar. But they’re just examples and aren’t that important, right? I beg to differ.
In my opinion, the salient point of the article is best summarized in the following paragraph (emphasis mine):
“Bugs in samples are not that critical as bugs in real-life software. Nevertheless, bugs can migrate from samples to real-life projects and cause developers a lot of troubles. Even within the Windows 8 Driver Samples pack we found identical bugs. The reason is obvious: copying-and-pasting a code fragment from a nearby sample. In the same way these errors will get into real-life drivers.”
In fact, I would rearrange that first sentence to read: “Bugs in samples will become critical bugs in real-life software.”
The evils of copying-and-pasting aside, the truth is that developers reading your sample code will likely (perhaps blindly) trust that a given example is the right way to do things. Like it or not, when writing examples you are acting as a teacher and can’t make any assumptions about your pupils.
To illustrate how sample code can proliferate, consider this commit to the storm-starter project. That was from two years ago, shortly after Storm had been open-sourced. The impetus was to help people more familiar with Java/Maven than Clojure/Leiningen get started with Storm. Since then, that code/comment has proliferated fairly far and wide, as can be seen from a Google search for “keep storm out of the jar-with-dependencies”. In retrospect, I should have used the Maven Shade plugin instead, since it plays better with certain frameworks like Spring. Not a big deal, but the cat (or rather, the drunken bunny) is out of the bag and on a love mission.
I’ll admit that I’m in no way perfect in this respect. But I strive toward that goal and believe that all serious developers should as well.
I recently had the opportunity to present at NYC* Big Data Tech Day with Brian O’Neill, where we spoke about some of the work we’ve been doing at HMS developing our platform for real-time Master Data Management.
Our approach leverages Storm, Cassandra, Kafka, and Elastic Search to gather data from thousands of data sources and meld it together in near real time, producing a single high-quality health care practitioner database.
Our talk was titled “A Big Data Quadfecta: Cassandra, Storm, Elastic Search and Kafka” and should resonate with any beer enthusiasts!
Slides and video below. Unfortunately the slides are not visible in the video, but you can hopefully follow along (somewhat) with the embedded slides.