Writing Scala Programs with a Maven Project in IDEA

When I wrote Spark programs before, I did not use Maven; downloading jars and configuring the environment by hand, and especially switching versions, was a hassle. This article walks through writing Scala in IDEA with a Maven project.

Creating the Maven Project

1. Open the menu File -> New -> Project.
2. In the "New Project" dialog, select Maven in the left pane, check Create from archetype on the right, and click Add Archetype.... Fill in the groupId, artifactId and version of the latest scala-archetype-simple archetype (groupId net.alchim31.maven).
3. Select the net.alchim31.maven archetype you just added and click Next through the remaining steps.
4. Modify the Scala version in the <properties> section of pom.xml (scala.compat.version is the binary version, major.minor only):

<scala.version>2.10.5</scala.version>
<scala.compat.version>2.10</scala.compat.version>

5. Add the Spark dependency:

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0-cdh5.8.0</version>
</dependency>

Note that you also need to add the corresponding repository, since the CDH build of Spark is published in Cloudera's repository:

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
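
At this point the project can compile Scala and resolve Spark. Before moving on to the HDFS demo, you can verify the setup with a tiny job run directly in IDEA. This is only a minimal sketch under my own naming (the package and object name are hypothetical; local[2] runs Spark in-process):

package demo

import org.apache.spark.{SparkConf, SparkContext}

// Smoke test: confirms the Maven + Scala + Spark setup works without any external files.
object SparkSetupCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkSetupCheck").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Count the even numbers in a small in-memory dataset.
    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println("even numbers: " + evens + ", Spark version: " + sc.version)
    sc.stop()
  }
}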

HDFS Demo

1. Initial setup
Add the Spark and Hadoop dependencies to pom.xml. Note that the _2.11 suffix is the Scala binary version Spark was built against and must match your project's Scala version; if you kept Scala 2.10.5 from step 4 above, use spark-core_2.10 instead.

<dependencies>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.4</version>
    </dependency>

</dependencies>

2. Configure SparkConf and the operations

package hdfs

import org.apache.spark.{SparkConf, SparkContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("HdfsWordCount")
      .setMaster("local[5]")
    val sc = new SparkContext(sparkConf)

    // read from a local file (or switch to the HDFS path below)
    val lines = sc.textFile("file:/Users/jiangzl/Desktop/testSet.txt")
//    val lines = sc.textFile("hdfs://localhost:9000/user/datasys/input/*")

    // split("") breaks each line into single characters
    val words = lines.flatMap(_.split(""))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _).sortBy(_._2, false)

    println("----- start -----")
    // print to the console
    wordCounts.foreach(wc => println(wc._1 + " : " + wc._2))
    // write to the local filesystem (saveAsTextFile creates a directory with this name)
    wordCounts.repartition(1).saveAsTextFile("file:/Users/jiangzl/Desktop/result.txt")
    // write to HDFS
    wordCounts.repartition(1).saveAsTextFile("hdfs://localhost:9000/user/datasys/result.txt")

    sc.stop()
  }
}
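
Because the lines are split into single characters, this "word count" is really a character count, which is what the results in the next step show. If you want per-word counts instead, a small variation (assuming whitespace-separated input) is to replace the two lines that build words and wordCounts with:

    // Split on runs of whitespace instead of on every character, dropping empty tokens.
    val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
    val wordCounts = words.map(w => (w, 1)).reduceByKey(_ + _).sortBy(_._2, false)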

3. Results

( ,850)
(a,264)
(e,220)
(t,209)
(i,203)
(s,165)
(d,154)
(r,149)
(_,149)
(o,139)
(l,124)
(p,114)
(,,112)
(c,105)
(',102)
(n,99)
(,91)
((,67)
(),67)
(u,66)
(y,64)
(.,55)
(h,40)
($,39)
(-,37)
(m,37)
(f,35)
(b,30)
(k,25)
(1,25)
(",20)
(0,19)
(g,18)
(\,17)

