Upgrading Hadoop and Spark for an IntelliJ IDEA sbt project

IntelliJ IDEA is one of the most popular IDEs for Scala (and therefore Spark) development. Since a new Spark version is usually released about every three months, the Spark and Hadoop versions a project uses need updating from time to time.

Here’s my setup:

  • Windows 7 x64 workstation with 96 GB RAM and two Xeon E5-2640 CPUs
  • Hadoop and Spark installed directly on Windows (setting up a virtualized environment, e.g. using VirtualBox, led to several issues, such as trouble using submitted files and constantly lagging behind on versions)
  • Admin rights available

Steps

  1. Download Spark from https://spark.apache.org/downloads.html and select “Pre-built for Hadoop X.X and later” as the package type.
  2. Download Hadoop from http://hadoop.apache.org/releases.html#Download by clicking “binary” for the version that matches your Spark package.
  3. Extract both archives to a portable directory.
  4. Set the environment variable “HADOOP_HOME” to the root of the extracted Hadoop files, e.g. “E:\portableProgramms\hadoop\hadoop-2.7.3”. If your standard user doesn’t have administrative rights, press “Win” > type “cmd” > right-click and choose “Run as different user” > enter your admin credentials > type “control sysdm.cpl” > hit Enter.
  5. Set the environment variable “SPARK_HOME” to the root of the extracted Spark files, e.g. “E:\portableProgramms\spark\spark-2.0.0-bin-hadoop2.7”.
  6. Update the IntelliJ project: point the Spark library at the folder containing the jars, and set the Scala version to the one displayed when you run “spark-shell” on the command line.
  7. Update build.sbt (Scala version, Spark version).
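
For step 7, a build.sbt could look roughly like the sketch below. It assumes the Spark 2.0.0 package from the example paths above and Scala 2.11.8; use whatever Scala version “spark-shell” actually reports on your machine. The project name and the choice of Spark modules are placeholders, not taken from my real project.

```scala
// build.sbt (minimal sketch, adjust names and versions to your setup)

// Hypothetical project name and version
name := "spark-upgrade-example"
version := "0.1.0"

// Assumption: must match the Scala version printed by spark-shell
scalaVersion := "2.11.8"

// Assumption: must match the Spark package you downloaded
val sparkVersion = "2.0.0"

// %% appends the Scala binary version (here _2.11) to the artifact name
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion
)
```

To check that everything lines up after the upgrade, a small program like the following (again only a sketch, with a made-up object name) prints the environment variables and the versions actually in use. Note that IntelliJ only sees changed environment variables after it has been restarted.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of Spark
object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Show the environment variables set in steps 4 and 5
    Seq("HADOOP_HOME", "SPARK_HOME").foreach { name =>
      println(s"$name = ${sys.env.getOrElse(name, "<not set>")}")
    }

    // Start a local session and print the Spark and Scala versions on the classpath
    val spark = SparkSession.builder()
      .appName("VersionCheck")
      .master("local[*]")
      .getOrCreate()
    println(s"Spark ${spark.version}, Scala ${scala.util.Properties.versionNumberString}")
    spark.stop()
  }
}
```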

Cheers
