[Spark] Setting Up a Development Environment on Windows

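This is a short tutorial: five steps are enough to set up Spark on Windows. Finish this basic configuration before you start developing in an IDE; otherwise you may run into unexpected problems. Scala is the recommended language, because it is Spark's native language.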

Before you start:

To avoid unnecessary trouble, make the following replacements in the JAVA_HOME and PATH environment variables:

Replace "Program Files" with "Progra~1"
Replace "Program Files (x86)" with "Progra~2"
Example: "C:\Program Files\Java\jdk1.8.0_161" -> "C:\Progra~1\Java\jdk1.8.0_161"
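The replacement rule above can be sketched as a small Python helper. This is only an illustration: the function name shorten_program_files is made up for this example, and it assumes the short names really are Progra~1 and Progra~2 as in this guide (on some machines the 8.3 names differ; `dir /x C:\` shows the actual ones).

```python
def shorten_program_files(path: str) -> str:
    """Replace the 'Program Files' folders with the short names used in this guide."""
    # Handle the longer name first, so "Program Files (x86)" is not
    # partially rewritten by the plain "Program Files" rule.
    path = path.replace("Program Files (x86)", "Progra~2")
    return path.replace("Program Files", "Progra~1")


if __name__ == "__main__":
    # Prints the shortened JDK path from the example above.
    print(shorten_program_files(r"C:\Program Files\Java\jdk1.8.0_161"))
```

The result is the value to paste into JAVA_HOME; the same helper works for any PATH entry that contains a space.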

1. Java 8: Installation

Before you start, make sure Java 8 is installed and the environment variables are correctly defined:
Download Java JDK 8 from Java’s official website
Set the following environment variables:

JAVA_HOME = C:\Progra~1\Java\jdk1.8.0_161
PATH += C:\Progra~1\Java\jdk1.8.0_161\bin

Optional: _JAVA_OPTIONS = -Xmx512M -Xms512M (To avoid common Java Heap Memory problems with Spark)

Tip: Progra~1 is the shortened path for “Program Files”.

2. Spark: Download and Installation

Download Spark from Spark’s official website
Choose the newest release (2.3.0 in my case)
Choose the newest package type (Pre-built for Hadoop 2.7 or later in my case)
Download the .tgz file
Extract the .tgz file into D:\Spark
Note: In this guide I’ll be using my D drive but obviously you can use the C drive also
Set the environment variables:

SPARK_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
PATH += D:\Spark\spark-2.3.0-bin-hadoop2.7\bin

3. Spark: winutils Download and Installation

Download winutils.exe from here: https://github.com/steveloughran/winutils
Choose the version that matches the Hadoop version of the Spark package you downloaded in section 2 (in my case: hadoop-2.7.1)
You need to navigate inside the hadoop-X.X.X folder, and inside the bin folder you will find winutils.exe
If you chose the same version as me (hadoop-2.7.1) here is the direct link: https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
Move the winutils.exe file to the bin folder inside SPARK_HOME (in my case: D:\Spark\spark-2.3.0-bin-hadoop2.7\bin)
Set the following environment variable to the same value as SPARK_HOME:

HADOOP_HOME = D:\Spark\spark-2.3.0-bin-hadoop2.7
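Once the variables from sections 1 to 3 are set, a quick sanity check can catch typos before you open an IDE. The following is a minimal sketch, not part of Spark: the function check_spark_env is a hypothetical helper that only looks at the folder layout described in this guide.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_spark_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    # Each variable from the guide must be set and point to an existing folder.
    for name in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} does not point to an existing folder: {value}")
    # winutils.exe is expected in the bin folder under HADOOP_HOME.
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and not (Path(hadoop_home) / "bin" / "winutils.exe").is_file():
        problems.append("winutils.exe not found in the bin folder of HADOOP_HOME")
    return problems


if __name__ == "__main__":
    # Check the variables of the current process and list anything missing.
    for problem in check_spark_env(os.environ):
        print(problem)
```

Run it from a newly opened cmd window, since windows opened before the variables were changed still see the old values.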

4. Optional: Changing Permissions on the Temporary Directory

This works around the Hive permissions bug. You may not be running from the D drive, but you can adapt the following steps to your own drive.

Create the folder D:\tmp\hive
Run the following command in a cmd window started with the Run as administrator option:

cmd> winutils.exe chmod -R 777 D:\tmp\hive

Check the permissions:

cmd> winutils.exe ls -F D:\tmp\hive
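If you would rather script this step, the two commands above can be assembled in Python. This is a sketch under the assumptions of this guide (winutils.exe sits in the bin folder of HADOOP_HOME); winutils_commands is a hypothetical helper name, and the code only builds the argument lists without running anything.

```python
import ntpath
from typing import List


def winutils_commands(hadoop_home: str, hive_dir: str) -> List[List[str]]:
    """Build the chmod and ls command lines from section 4 as argument lists."""
    # ntpath joins with backslashes regardless of the OS running this script.
    winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
    return [
        [winutils, "chmod", "-R", "777", hive_dir],  # open up the permissions
        [winutils, "ls", "-F", hive_dir],            # show the resulting permissions
    ]


if __name__ == "__main__":
    for cmd in winutils_commands(r"D:\Spark\spark-2.3.0-bin-hadoop2.7", r"D:\tmp\hive"):
        print(" ".join(cmd))
```

On Windows, each list can be passed to subprocess.run(cmd, check=True) from a Python process started as administrator.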

5. Optional: Installing Scala

If you plan to use Scala instead of Python for Spark programming, follow these steps:

Download Scala from its official website
Download the Scala binaries for Windows (scala-2.12.4.msi in my case)
Note: Spark 2.3.0 itself is built against Scala 2.11, so a project compiled against it needs a 2.11.x Scala version in its build
Install Scala from the .msi file
Set the environment variables:

SCALA_HOME = C:\Progra~2\scala
PATH += C:\Progra~2\scala\bin
Tip: Progra~2 is the shortened path for “Program Files (x86)”.
Check that Scala is working by running the following command in cmd:

cmd> scala -version
