[Spark] Configuring a Development Environment on Windows

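This is a short tutorial: five steps are enough to configure a Spark environment on Windows. Finish this basic setup before you start developing in an IDE; otherwise you may run into unexpected problems. I recommend developing in Scala, since it is the language Spark itself is written in.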


To avoid unnecessary trouble, make the following replacements in the JAVA_HOME and PATH environment variables:

Replace "Program Files" with "Progra~1"
Replace "Program Files (x86)" with "Progra~2"
Example: "C:\Program Files\Java\jdk1.8.0_161" -> "C:\Progra~1\Java\jdk1.8.0_161"
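The replacement rule above can be sketched in Python. This is only an illustration: the helper name shorten_program_files is hypothetical, and it assumes Windows assigned the usual short names (Progra~1 and Progra~2) to these folders, which running `dir /x C:\` in cmd can confirm.

```python
import re

# Longer name first, so "Program Files (x86)" is not partially rewritten
# by the plain "Program Files" rule.
_REPLACEMENTS = [
    ("Program Files (x86)", "Progra~2"),
    ("Program Files", "Progra~1"),
]


def shorten_program_files(path):
    """Return the path with the 'Program Files' folders replaced by short names.

    Matching is case-insensitive because Windows paths are case-insensitive.
    """
    for long_name, short_name in _REPLACEMENTS:
        path = re.sub(re.escape(long_name), short_name, path, flags=re.IGNORECASE)
    return path
```

For example, `shorten_program_files(r"C:\Program Files\Java\jdk1.8.0_161")` gives `C:\Progra~1\Java\jdk1.8.0_161`, which matches the example above.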

I. Installing Java 8

Before you start, make sure Java 8 is installed and its environment variables are defined correctly:

  1. Download the Java 8 JDK from Java's official website

  2. Set the following environment variables:

JAVA_HOME = C:\Progra~1\Java\jdk1.8.0_161
PATH += C:\Progra~1\Java\jdk1.8.0_161\bin

Optional: _JAVA_OPTIONS = -Xmx512M -Xms512M (To avoid common Java Heap Memory problems with Spark)

Tip: Progra~1 is the shortened path for “Program Files”.
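As a rough way to double-check the variables above, the sketch below inspects an environment mapping. The function name check_java_env and its messages are made up for illustration; it only looks at the strings in JAVA_HOME and PATH and does not verify that Java itself runs. It uses ntpath so the Windows path logic behaves the same wherever the script is run.

```python
import ntpath


def check_java_env(env):
    """Return a list of problems found; an empty list means the checks passed."""
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return ["JAVA_HOME is not set"]

    problems = []
    if " " in java_home:
        problems.append("JAVA_HOME contains a space; use the Progra~1 form")

    # PATH entries on Windows are separated by ';' and compared case-insensitively.
    expected_bin = ntpath.join(java_home, "bin")
    path_entries = [
        ntpath.normcase(entry.rstrip("\\"))
        for entry in env.get("PATH", "").split(";")
        if entry
    ]
    if ntpath.normcase(expected_bin) not in path_entries:
        problems.append("PATH does not contain " + expected_bin)
    return problems
```

On the real machine, calling `check_java_env(os.environ)` after `import os` in a newly opened cmd session should return an empty list once both variables are set.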

II. Spark: Download and Install

  1. Download Spark from Spark’s official website
    Choose the newest release (2.3.0 in my case)
    Choose the newest package type (Pre-built for Hadoop 2.7 or later in my case)
    Download the .tgz file

  2. Extract the .tgz file into D:\Spark
    Note: This guide uses the D drive, but the C drive works just as well

  3. Set the environment variables:

    SPARK_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
    PATH += D:\Spark\spark-2.3.0-bin-hadoop2.7\bin
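Windows does not always ship with a tool that opens .tgz files, so step 2 can also be done with Python's standard tarfile module. The sketch below is one possible way to do it; the function name extract_tgz is hypothetical, and the archive path you pass in depends on where your browser saved the download.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir and return its top-level folders."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Member names inside a tar archive always use '/' as the separator.
        top_level = sorted({m.name.split("/")[0] for m in archive.getmembers()})
        archive.extractall(dest)
    return [dest / name for name in top_level]
```

Calling `extract_tgz` with the downloaded archive and `r"D:\Spark"` should leave a folder such as D:\Spark\spark-2.3.0-bin-hadoop2.7, which is the value used for SPARK_HOME above.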

III. Spark: Download and Install winutils

  1. Download winutils.exe from here: https://github.com/steveloughran/winutils
    Choose the Hadoop version that matches the package type of the Spark .tgz file from section 2 (in my case: hadoop-2.7.1)
    Navigate into the hadoop-X.X.X folder; winutils.exe is inside its bin folder
    If you chose the same version as me (hadoop-2.7.1), here is the direct link: https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe

  2. Move the winutils.exe file to the bin folder inside SPARK_HOME
    In my case: D:\Spark\spark-2.3.0-bin-hadoop2.7\bin

  3. Set the following environment variable to the same value as SPARK_HOME:

    HADOOP_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
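A quick way to confirm that steps 2 and 3 agree with each other is to look for winutils.exe under the bin folder of HADOOP_HOME. The helper below is a minimal sketch; find_winutils is a made-up name, and it checks only that the file exists, not that it is the right build for your Hadoop version.

```python
from pathlib import Path


def find_winutils(env):
    """Return the path to winutils.exe under HADOOP_HOME\\bin, or None if missing."""
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None
    candidate = Path(hadoop_home) / "bin" / "winutils.exe"
    return candidate if candidate.is_file() else None
```

Calling `find_winutils(os.environ)` after `import os` returns None when HADOOP_HOME is unset or when winutils.exe was copied to a different folder.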

IV. Optional: Fixing the Temporary Directory Permissions

This works around the Hive permissions bug. You may not be running from the D drive, but you can adapt the steps below accordingly.

  1. Create the folder D:\tmp\hive
  2. Run the following command in a cmd window opened with the Run as administrator option

    cmd> winutils.exe chmod -R 777 D:\tmp\hive

  3. Check the permissions

    cmd> winutils.exe ls -F D:\tmp\hive
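The three steps above can also be scripted. The sketch below assumes winutils.exe is reachable through PATH (it is if it sits in the Spark bin folder added earlier) and that the script runs from an administrator prompt. The function names are hypothetical, and the commands are the same two shown above.

```python
import os
import subprocess


def winutils_commands(hive_dir, winutils="winutils.exe"):
    """Build the chmod and ls commands from steps 2 and 3 as argument lists."""
    return [
        [winutils, "chmod", "-R", "777", hive_dir],
        [winutils, "ls", "-F", hive_dir],
    ]


def fix_hive_permissions(hive_dir):
    """Create the folder (step 1), then run both winutils commands."""
    os.makedirs(hive_dir, exist_ok=True)
    for command in winutils_commands(hive_dir):
        # check=True raises CalledProcessError if winutils reports a failure.
        subprocess.run(command, check=True)
```

On the setup described here the call would be `fix_hive_permissions(r"D:\tmp\hive")`; change the drive letter if you run Spark from a different drive.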

V. Optional: Installing Scala

If you plan to program Spark in Scala rather than Python, follow these steps:

  1. Download Scala from its official website
    Download the Scala binaries for Windows (scala-2.12.4.msi in my case)
    Note: Spark 2.3.0 itself is built against Scala 2.11, so applications compiled for it need a 2.11.x version of Scala

  2. Install Scala from the .msi file

  3. Set the environment variables:

    SCALA_HOME = C:\Progra~2\scala
    PATH += C:\Progra~2\scala\bin

    Tip: Progra~2 is the shortened path for "Program Files (x86)".

  4. Check that Scala works by running the following command in cmd:

    cmd> scala -version


