IntelliJ IDEA is one of the most popular IDEs for Scala (and therefore Spark) development. Since a new Spark version is usually released every three months, an update to the current Spark and Hadoop version may be necessary from time to time.
Here’s my setup:
- Windows 7 x64 workstation with 96 GB RAM and two Xeon E5-2640 CPUs
- Hadoop and Spark installed directly on Windows (a virtualized environment, e.g. via VirtualBox, led to several issues, such as problems accessing submitted files and constantly lagging behind version-wise)
- Admin rights available
- Download Spark from https://spark.apache.org/downloads.html, selecting “Pre-built for Hadoop X.X and later”
- Download Hadoop from http://hadoop.apache.org/releases.html#Download by clicking “binary” next to the version matching your Spark version
- Extract both archives to a portable directory
- Set the environment variable “HADOOP_HOME” to the root location of the extracted Hadoop files, e.g. E:\portableProgramms\hadoop\hadoop-2.7.3. If your standard user doesn’t have administrative rights, press “Win”, type “cmd”, right-click and choose “Run as different user”, enter your admin credentials, then run “control sysdm.cpl” to open the system properties dialog.
- Set the environment variable “SPARK_HOME” to the root location of the extracted Spark files, e.g. “E:\portableProgramms\spark\spark-2.0.0-bin-hadoop2.7”
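As an alternative to the system properties dialog, both variables can be set persistently from a (possibly elevated) command prompt with `setx` — a sketch using the example paths from this post:

```
:: Set HADOOP_HOME and SPARK_HOME persistently for the current user.
:: The paths below are the example locations used in this post;
:: adjust them to wherever you extracted the archives.
setx HADOOP_HOME "E:\portableProgramms\hadoop\hadoop-2.7.3"
setx SPARK_HOME "E:\portableProgramms\spark\spark-2.0.0-bin-hadoop2.7"

:: Verify in a NEW cmd window -- setx does not affect the current session.
echo %HADOOP_HOME%
echo %SPARK_HOME%
```

Note that `setx` without `/M` writes to the user environment; add `/M` (requires admin rights) to set the variables machine-wide.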
- Update the IntelliJ project by pointing the Spark library to the folder containing the new jars, and set the Scala version displayed when executing “spark-shell” on the command line
- Update build.sbt (Scala version, Spark version)
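After such an update, a minimal build.sbt might look like the following sketch. The project name is hypothetical; the versions match the example download above (Spark 2.0.0 pre-built ships with Scala 2.11.8) — adjust them to whatever your spark-shell banner reports:

```scala
// Hypothetical minimal build.sbt for the Spark version used in this post.
// scalaVersion must match the Scala version shown by spark-shell.
name := "my-spark-app"   // placeholder project name

version := "0.1.0"

scalaVersion := "2.11.8"

// "provided" because at runtime the jars come from the Spark installation
// under SPARK_HOME rather than being bundled into the application jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided"
```

When bumping versions later, only `scalaVersion` and the Spark dependency versions need to change here.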