Below is a very simple tutorial: five steps are enough to configure a Spark environment on Windows. Please finish this basic configuration before developing in an IDE; otherwise you may hit some unexpected problems. Scala is the recommended development language, since it is the language Spark itself is written in.
Before you start:
To avoid unnecessary trouble, make the following substitutions in the JAVA_HOME and PATH environment variables:
Replace "Program Files" with "Progra~1"
Replace "Program Files (x86)" with "Progra~2"
Example: "C:\Program Files\Java\jdk1.8.0_161" -> "C:\Progra~1\Java\jdk1.8.0_161"
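If you are not sure what the short (8.3) name of a folder is, the `dir /x` option in cmd will show it next to the full name; this is a quick way to confirm the substitutions above (the drive shown is just an example):

```shell
:: List short (8.3) names alongside full names; run in cmd
:: "Progra~1" should appear next to "Program Files"
dir /x C:\
```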
1. Java 8 Installation
Before you start make sure you have Java 8 installed and the environment variables correctly defined:
Download Java JDK 8 from Java’s official website
Set the following environment variables:
JAVA_HOME = C:\Progra~1\Java\jdk1.8.0_161
PATH += C:\Progra~1\Java\jdk1.8.0_161\bin
Optional: _JAVA_OPTIONS = -Xmx512M -Xms512M (To avoid common Java Heap Memory problems with Spark)
Tip: Progra~1 is the shortened path for “Program Files”.
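As a sketch, the variables above can also be persisted from a cmd prompt with `setx` (the JDK path is the example path from this guide; adjust it to your install, and note that `setx` only takes effect in newly opened cmd windows):

```shell
:: Persist JAVA_HOME for the current user (new cmd windows only)
setx JAVA_HOME C:\Progra~1\Java\jdk1.8.0_161
:: Optional: limit the JVM heap to avoid common Java heap problems with Spark
setx _JAVA_OPTIONS "-Xmx512M -Xms512M"
:: Verify the installation in a new cmd window
java -version
```

Appending to PATH is best done through the System Properties dialog rather than `setx PATH ...`, since `setx` can truncate a long PATH value.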
2. Spark: Download and Installation
- Download Spark from Spark's official website
  Choose the newest release (2.3.0 in my case)
  Choose the newest package type (Pre-built for Hadoop 2.7 or later in my case)
  Download the .tgz file
- Extract the .tgz file into D:\Spark
  Note: In this guide I'll be using my D drive, but obviously you can use the C drive also
- Set the environment variables:
  SPARK_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
  PATH += D:\Spark\spark-2.3.0-bin-hadoop2.7\bin
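After extracting the archive and setting the variables, a quick check from a new cmd window looks like this (a sketch, using the paths from this guide):

```shell
:: Persist SPARK_HOME (takes effect in new cmd windows)
setx SPARK_HOME D:\Spark\spark-2.3.0-bin-hadoop2.7
:: If the bin folder is on PATH, this prints the Spark version banner
spark-submit --version
```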
3. Spark: Download and Install winutils
- Download winutils.exe from here: https://github.com/steveloughran/winutils
  Choose the same Hadoop version as the Spark package type you chose in section 2 (in my case: hadoop-2.7.1)
  Navigate into the hadoop-X.X.X folder; you will find winutils.exe inside its bin folder
  If you chose the same version as me (hadoop-2.7.1), here is the direct link: https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
- Move the winutils.exe file to the bin folder inside SPARK_HOME
  In my case: D:\Spark\spark-2.3.0-bin-hadoop2.7\bin
- Set the following environment variable to the same value as SPARK_HOME:
  HADOOP_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
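The winutils step can be sanity-checked like this (a sketch, assuming the paths used above):

```shell
:: Point HADOOP_HOME at the Spark folder that now contains bin\winutils.exe
setx HADOOP_HOME D:\Spark\spark-2.3.0-bin-hadoop2.7
:: Running winutils.exe with no arguments should print its usage text
:: rather than a "file not found" or missing-DLL error
D:\Spark\spark-2.3.0-bin-hadoop2.7\bin\winutils.exe
```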
4. Optional: Fix Temporary Directory Permissions
This works around a Hive permissions bug. Your working directory may not be on the D drive, but you can adapt the following changes accordingly.
- Create the folder D:\tmp\hive
- Execute the following command in a cmd window started with the Run as administrator option:
  cmd> winutils.exe chmod -R 777 D:\tmp\hive
- Check the permissions:
  cmd> winutils.exe ls -F D:\tmp\hive
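The steps above can be run together from an administrator cmd window (D:\tmp\hive as in this guide; adjust the drive to wherever you run Spark):

```shell
:: Create the Hive scratch directory
mkdir D:\tmp\hive
:: Grant read/write/execute to everyone (octal 777) recursively, as Hive expects
winutils.exe chmod -R 777 D:\tmp\hive
:: Verify: the listing should show drwxrwxrwx for the directory
winutils.exe ls -F D:\tmp\hive
```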
5. Optional: Install Scala
If you are planning on using Scala instead of Python for programming in Spark, follow these steps:
- Download Scala from their official website: get the Scala binaries for Windows (scala-2.12.4.msi in my case)
- Install Scala from the .msi file
- Set the environment variables:
  SCALA_HOME = C:\Progra~2\scala
  PATH += C:\Progra~2\scala\bin
  Tip: Progra~2 is the shortened path for "Program Files (x86)".
- Check that Scala is working by running the following command in cmd:
  cmd> scala -version
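With everything configured, a sketch of an end-to-end smoke test is to start the interactive Spark shell and run a one-line job (the expression in the comment is just an example; any small computation will do):

```shell
:: Check the Scala installation
scala -version
:: Start the interactive Spark shell, a Scala REPL with a preconfigured SparkContext `sc`
spark-shell
:: Inside the shell, try e.g.:  sc.parallelize(1 to 100).reduce(_ + _)
:: then exit with :quit
```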
This article was written in Markdown; for more formatting details, see: https://daringfireball.net/projects/markdown/syntax#blockquote