Installing Spark on CentOS 6 and Spark Programming with Scala

2017. 8. 5. 18:00 · Server Programming

<Installing and Running Spark>

$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz

$ tar xzvf spark-2.1.0-bin-hadoop2.7.tgz


$ cd spark-2.1.0-bin-hadoop2.7

$ ./sbin/start-master.sh

$ ./sbin/start-slave.sh spark://localhost:7077

$ ./bin/pyspark --master spark://localhost:7077


<Installing Scala>

$ wget https://downloads.lightbend.com/scala/2.12.3/scala-2.12.3.tgz

$ tar xzvf scala-2.12.3.tgz

$ vi .bashrc


export SCALA_HOME=/home/eduuser/scala-2.12.3

export PATH=$PATH:$SCALA_HOME/bin


$ source .bashrc


$ curl https://bintray.com/sbt/rpm/rpm > bintray-sbt-rpm.repo

$ sudo mv bintray-sbt-rpm.repo /etc/yum.repos.d/

$ sudo yum install sbt


$ mkdir -p ~/.sbt/0.13/plugins

$ cd ~/.sbt/0.13/plugins

$ vi plugins.sbt


addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")


-------------------------------

<Eclipse Spark Programming>


$ mkdir simple-spark

$ cd simple-spark


$ vi build.sbt
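
The contents of this build.sbt are not shown. A minimal sketch consistent with the sbt package log later in this section (project name "simple", Scala 2.11.8, packaged against Spark 2.1.0) would be:

```scala
name := "simple"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
```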


$ sbt eclipse


Import the project into Eclipse

Add a new source folder: src/main/scala

Add a new Scala object: simpleapp

Write the code
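
The source of simpleapp is not shown. A sketch that matches the spark-submit invocation later in this section (arguments "local simpleoutput") and the output file containing the numbers 1 through 5 might look like this; reading the master URL from args(0) and the output directory from args(1) is an assumption:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of simpleapp: args(0) is assumed to be the master URL
// ("local") and args(1) the output directory ("simpleoutput").
object simpleapp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("simpleapp").setMaster(args(0))
    val sc = new SparkContext(conf)
    // Write the numbers 1..5, one per line, as seen in part-00000 below
    sc.parallelize(1 to 5).saveAsTextFile(args(1))
    sc.stop()
  }
}
```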


$ sbt package

[info] Loading global plugins from /home/eduuser/.sbt/0.13/plugins

[info] Loading project definition from /home/eduuser/Documents/workspace-sts-3.6.4.RELEASE/simple-spark/project

[info] Set current project to simple (in build file:/home/eduuser/Documents/workspace-sts-3.6.4.RELEASE/simple-spark/)

[info] Compiling 1 Scala source to /home/eduuser/Documents/workspace-sts-3.6.4.RELEASE/simple-spark/target/scala-2.11/classes...

[info] 'compiler-interface' not yet compiled for Scala 2.11.8. Compiling...

[info]   Compilation completed in 16.249 s

[info] Packaging /home/eduuser/Documents/workspace-sts-3.6.4.RELEASE/simple-spark/target/scala-2.11/simple_2.11-1.0.jar ...

[info] Done packaging.

[success] Total time: 19 s, completed 2017. 8. 3 오후 3:04:31


$ ~/spark/bin/spark-submit --class "simpleapp" --master spark://localhost:7077 --executor-memory 512m --total-executor-cores 1 target/scala-2.11/simple_2.11-1.0.jar local simpleoutput

$ cat simpleoutput/part-00000

1

2

3

4

5


$ mkdir rankingcount

$ cd rankingcount

$ vi build.sbt
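
This build.sbt is also not shown; a sketch consistent with the jar name produced below (rankingcount_2.11-1.0.jar) would be:

```scala
name := "rankingcount"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"
```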

$ sbt eclipse


$ wget http://www.grouplens.org/system/files/ml-100k.zip

$ cd ~/Downloads

$ unzip ml-100k.zip

$ cp ml-100k/u.data ~/Documents/workspace-sts-3.6.4.RELEASE/rankingcount


Import the project into Eclipse

Add a new source folder: src/main/scala

Add a new Scala object: RatingCounter

Write the code
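
The RatingCounter source is not shown. A sketch, assuming the standard MovieLens u.data format (tab-separated userID, movieID, rating, timestamp) and counting how many times each rating value occurs, could be:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of RatingCounter: count occurrences of each rating (column 3)
// in the MovieLens u.data file copied into the project directory.
object RatingCounter {
  def main(args: Array[String]): Unit = {
    // Master URL comes from spark-submit's --master flag
    val sc = new SparkContext(new SparkConf().setAppName("RatingCounter"))
    val ratings = sc.textFile("u.data").map(_.split("\t")(2))
    val counts = ratings.countByValue()        // rating -> number of occurrences
    counts.toSeq.sortBy(_._1).foreach(println) // print (rating,count) pairs in order
    sc.stop()
  }
}
```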


$ sbt package

$ ~/spark/bin/spark-submit --class "RatingCounter" --master spark://localhost:7077 target/scala-2.11/rankingcount_2.11-1.0.jar

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

(1,6110)                                                                        

(2,11370)

(3,27145)

(4,34174)

(5,21201)


------------------------------------------------------------


$ mkdir wordcount-spark

$ cd wordcount-spark


$ vi build.sbt


name := "wordcount-spark"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" % "provided"


$ sbt eclipse

$ vi input.txt

$ cat input.txt

read a book

write a book


Import the project into Eclipse

Add a new source folder: src/main/scala

Add a new Scala object: WordCount

Write the code
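
The WordCount source is not shown either. A sketch matching the input.txt above and the (word,count) output below would be the classic flatMap/reduceByKey pattern:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of WordCount: split input.txt on spaces and count each word.
object WordCount {
  def main(args: Array[String]): Unit = {
    // Master URL comes from spark-submit's --master flag
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    val words = sc.textFile("input.txt").flatMap(_.split(" "))
    words.map((_, 1))
         .reduceByKey(_ + _)       // sum the 1s per distinct word
         .collect()
         .foreach(println)         // (word,count) pairs, unordered
    sc.stop()
  }
}
```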


$ sbt package

$ ~/spark/bin/spark-submit --class "WordCount" --master spark://localhost:7077 target/scala-2.11/wordcount-spark_2.11-1.0.jar

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

[Stage 0:>                                                          (0 + 2) / 2]

(read,1)

(book,2)

(a,2)

(write,1)


$ ~/spark/bin/spark-submit --class "MaxTemp" --master spark://localhost:7077 target/scala-2.11/maxtemp-spark_2.11-1.0.jar

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

(1901,317)
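
The maxtemp-spark project's setup and source are not shown at all. A plausible sketch, assuming each input line holds a comma-separated year and temperature reading (the file name "temperature.txt" and the record format are guesses), would reduce by year with max:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of MaxTemp: find the maximum temperature per year.
// Input format (year,temperature per line) and file name are assumptions.
object MaxTemp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MaxTemp"))
    val pairs = sc.textFile("temperature.txt").map { line =>
      val fields = line.split(",")
      (fields(0), fields(1).toInt)   // (year, temperature)
    }
    pairs.reduceByKey(math.max)      // keep the highest reading per year
         .collect()
         .foreach(println)           // e.g. (1901,317)
    sc.stop()
  }
}
```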


Attachments:

1.simple-spark.zip

2.rankingcount.zip

3.wordcount-spark.zip

4.maxtemp-spark.zip