Hadoop for Big Data Analysis and Distributed Processing

2016. 11. 3. 17:43 · Server Programming

Attachment: hadoop_workspace.zip


Installing and Running Hadoop on a Mac


Create a hadoop account


Check the Java version

<Install SSH>

System Preferences → Sharing: check Remote Login and allow only the hadoop account

Generate a key pair: ssh-keygen

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Verify: ssh localhost


Download Hadoop

tar -xzvf hadoop-1.2.1.tar.gz

Create a symbolic link: ln -s hadoop-1.2.1 hadoop



Edit hadoop/conf/hadoop-env.sh

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home

export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"



* If the /Library/Java/Home symbolic link is missing, you must create it manually.



core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>



hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>



mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>



Format the NameNode

./bin/hadoop namenode -format



Start Hadoop

./bin/start-all.sh



Stop Hadoop

./bin/stop-all.sh



* Starting the daemons individually

./bin/hadoop-daemon.sh start namenode

./bin/hadoop-daemon.sh start secondarynamenode

./bin/hadoop-daemon.sh start datanode

./bin/hadoop-daemon.sh start jobtracker

./bin/hadoop-daemon.sh start tasktracker


* Upload an input file

./bin/hadoop fs -put input.txt input.txt



./bin/hadoop jar busanit-wordcount.jar kr.or.busanit.wordcount.WordCount input.txt wordcount_output


./bin/hadoop fs -cat wordcount_output/part-r-00000
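The jar runs the Java class kr.or.busanit.wordcount.WordCount, whose source is not shown in this post. As an illustration only, the map → shuffle → reduce flow that produces those `word<TAB>count` pairs in part-r-00000 can be sketched locally in Python:

```python
from collections import defaultdict

def word_count(lines):
    # Map phase: emit a (word, 1) pair for every token on every line.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle phase: group the emitted values by key (the word).
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    # Reduce phase: sum the grouped values to get each word's count.
    return {word: sum(ones) for word, ones in grouped.items()}

print(sorted(word_count(["read a book", "write a book"]).items()))
# [('a', 2), ('book', 2), ('read', 1), ('write', 1)]
```

This is only a single-process sketch; in the real job the shuffle is performed by the framework between distributed map and reduce tasks.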


http://localhost:50030/jobtracker.jsp


NCDC sample data: https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all


./bin/hadoop jar busanit-maxtemp.jar kr.or.busanit.maxtemp.MaxTemp ncdc maxtemp_output



./bin/hadoop fs -cat maxtemp_output/part-r-00000


./bin/hadoop fs -get maxtemp_output/part-r-00000 maxtemp.txt
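The MaxTemp class itself is also not shown here. Assuming it follows the standard NCDC max-temperature example (one maximum per year), its reduce logic can be sketched in Python over simple (year, temperature) pairs; parsing the fixed-width NCDC records is omitted:

```python
def max_temp(records):
    # records: iterable of (year, temperature) pairs, standing in for
    # the mapper output that would be parsed from raw NCDC lines.
    best = {}
    for year, temp in records:
        # Keep the largest temperature observed for each year.
        if year not in best or temp > best[year]:
            best[year] = temp
    return best

readings = [("1949", 111), ("1949", 78), ("1950", -11), ("1950", 22)]
print(max_temp(readings))  # {'1949': 111, '1950': 22}
```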



./bin/hadoop jar busanit-delaycount.jar kr.or.busanit.delaycount.DelayCount air/1987.csv delaycount_output


./bin/hadoop fs -cat delaycount_output/part-r-00000
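The DelayCount class is not shown either. Assuming it counts delayed flights per month in the airline on-time data (the column positions and the "delay > 0" threshold below are assumptions, not taken from this post), the logic might look like:

```python
import csv, io
from collections import Counter

def delay_count(csv_text, month_col, delay_col):
    # Count rows whose delay field is a positive number, grouped by month.
    # Column indices are assumptions about the CSV layout; adjust as needed.
    counts = Counter()
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header row
    for row in rows:
        try:
            if int(row[delay_col]) > 0:
                counts[row[month_col]] += 1
        except ValueError:
            pass  # 'NA' or malformed delay values are skipped
    return counts

sample = "Year,Month,ArrDelay\n1987,10,12\n1987,10,-3\n1987,11,NA\n"
print(delay_count(sample, month_col=1, delay_col=2))  # Counter({'10': 1})
```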




* Killing a running MapReduce job

./bin/hadoop job -kill job_201501271506_0003



* Removing a directory on HDFS

./bin/hadoop fs -rmr wordcount_output



* Check that Hadoop is running correctly

> jps
5984 Jps
4401 DataNode
4644 TaskTracker
4314 NameNode
4490 SecondaryNameNode
4554 JobTracker





-------------------------------------------------


Implementing WordCount with Hive


<Extract Hive>

tar xvzf apache-hive-0.13.1-bin.tar.gz

ln -s apache-hive-0.13.1-bin hive


<Set the PATH>

vi .profile

————————————————————

#hive home

export HIVE_HOME=/Users/hadoop/hive

#hadoop home

export HADOOP_HOME1=/Users/hadoop/hadoop


#path

export PATH=$PATH:$HADOOP_HOME1/bin:$HADOOP_HOME1/sbin:$HIVE_HOME/bin

————————————————————


source ~/.profile


<Create directories on HDFS>

hadoop fs -rmr /tmp

hadoop fs -mkdir /tmp

hadoop fs -chmod g+w /tmp


hadoop fs -rmr /user/hive/warehouse

hadoop fs -rmr /user/hive

hadoop fs -mkdir /user/hive

hadoop fs -mkdir /user/hive/warehouse

hadoop fs -chmod g+w /user/hive

hadoop fs -chmod g+w /user/hive/warehouse


hadoop fs -rmr /home/hive/warehouse

hadoop fs -rmr /home/hive

hadoop fs -mkdir /home/hive

hadoop fs -mkdir /home/hive/warehouse

hadoop fs -chmod g+w /home

hadoop fs -chmod g+w /home/hive

hadoop fs -chmod g+w /home/hive/warehouse


<Run Hive>

$ hive

Logging initialized using configuration in jar:file:/Users/hadoop/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties

hive> 


SHOW TABLES;

CREATE TABLE docs (line STRING);

LOAD DATA INPATH 'input.txt' OVERWRITE INTO TABLE docs;

SELECT * FROM docs;


read a book

write a book

Time taken: 0.484 seconds, Fetched: 2 row(s)


CREATE TABLE word_counts AS
SELECT word, count(1) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) W
GROUP BY word
ORDER BY word;

SELECT * FROM word_counts;


a 2

book 2

read 1

write 1

Time taken: 0.03 seconds, Fetched: 4 row(s)


hive> exit;