IntelliJ IDEA is one of the most popular IDEs for Scala (and therefore Spark) development. Since a new Spark version is usually released every three months, updating to the current Spark and Hadoop versions may be necessary from time to time.

Here’s my setup:

  • Windows 7 x64 workstation with 96 GB RAM and two Xeon E5-2640 CPUs
  • Hadoop and Spark installed directly on Windows (setting up a virtualized environment, e.g. with VirtualBox, led to several issues, such as handling submitted files, as well as constantly being behind version-wise)
  • Admin rights available


  1. Download Spark from the official download page and select “Pre-built for Hadoop X.X and later”
  2. Download Hadoop from the official releases page by clicking on “binary” for the version matching your Spark version
  3. Extract both archives to a portable directory
  4. Set the environment variable “HADOOP_HOME” to the root location of the extracted Hadoop files, e.g. E:\portableProgramms\hadoop\hadoop-2.7.3. If your standard user doesn’t have administrative rights, press “Win”, type “cmd”, right-click and choose “Run as different user”, enter your admin credentials, paste “control sysdm.cpl”, and hit Enter.
  5. Set the environment variable “SPARK_HOME” to the root location of the extracted Spark files, e.g. “E:\portableProgramms\spark\spark-2.0.0-bin-hadoop2.7”
  6. Update the IntelliJ project by pointing the Spark library to the folder containing the jars, and set the new Scala version displayed after executing “spark-shell” on the command line
  7. Update build.sbt (Scala version, Spark version)
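To confirm that the two environment variables from steps 4 and 5 actually took effect, a small sanity check can read them from the environment. This is only a sketch; `EnvCheck` and `missingVars` are names I made up, not part of Spark or Hadoop:

```scala
// Sketch: report which of the environment variables from steps 4-5 are missing.
// Object and method names are hypothetical, not part of Spark or Hadoop.
object EnvCheck {
  // Returns the names of the required variables absent from the given environment map.
  def missingVars(env: Map[String, String]): Seq[String] =
    Seq("HADOOP_HOME", "SPARK_HOME").filterNot(env.contains)

  def main(args: Array[String]): Unit = {
    val missing = missingVars(sys.env)
    if (missing.isEmpty) println("HADOOP_HOME and SPARK_HOME are set")
    else println(s"Missing environment variables: ${missing.mkString(", ")}")
  }
}
```

Taking the environment as a plain `Map` parameter keeps the check easy to exercise without actually modifying the system configuration.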
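For step 7, the relevant build.sbt entries might look like the sketch below. The project name is hypothetical, and the version numbers are only examples matching the Spark 2.0.0 / Hadoop 2.7 pairing used above; replace them with the versions your own spark-shell reports:

```scala
// build.sbt — example version bump (all values are illustrative)
name := "my-spark-project"   // hypothetical project name

scalaVersion := "2.11.8"     // the Scala version printed when spark-shell starts

// The Spark version should match the binaries referenced by SPARK_HOME
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
```

Marking the dependency as "provided" is a common choice when the Spark jars are supplied by the local installation rather than bundled into the application.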