Server Programming

Installing and Running Hadoop on a Mac (with Hive Integration)

나숑 2015. 1. 28. 17:05


Attachment: workspace_lineplus.zip


Installing and Running Hadoop on a Mac


Create a hadoop account


Check the Java version: java -version

<SSH Setup>

System Preferences > Sharing: enable Remote Login, allowing only the hadoop account

Generate a key pair: ssh-keygen

Append the public key: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Verify: ssh localhost
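The SSH steps above, as one sequence (a sketch; it assumes you are logged in as the hadoop account and that Remote Login is already enabled):

```shell
# Generate an RSA key pair with an empty passphrase (default key path shown explicitly)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the public key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Should now log in without a password prompt
ssh localhost
```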


Download Hadoop

tar -xzvf hadoop-1.2.1.tar.gz
Create a symbolic link: ln -s hadoop-1.2.1 hadoop


Edit hadoop/conf/hadoop-env.sh
export JAVA_HOME=/Library/Java/Home
# Workaround for the "Unable to load realm info from SCDynamicStore" error on OS X
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"


* If the /Library/Java/Home symlink does not exist, you have to create it yourself.
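On OS X the active JDK location can be obtained from /usr/libexec/java_home, so the symlink can be created like this (a sketch; it requires sudo, and the resolved path depends on which JDK is installed):

```shell
# Create /Library/Java/Home if it does not already exist.
# /usr/libexec/java_home prints the path of the active JDK on OS X.
if [ ! -e /Library/Java/Home ]; then
  sudo ln -s "$(/usr/libexec/java_home)" /Library/Java/Home
fi
```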


core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>


hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>


mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>


Format the NameNode
./bin/hadoop namenode -format


Start Hadoop
./bin/start-all.sh


Stop Hadoop
./bin/stop-all.sh


* Upload the input file
./bin/hadoop fs -put input.txt input.txt


* Run WordCount
./bin/hadoop jar lineplus-wordcount.jar me.line.wordcount.WordCount input.txt wordcount_output


* Check the WordCount output
./bin/hadoop fs -cat wordcount_output/part-r-00000
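As a quick sanity check, the same counts can be reproduced locally with standard Unix tools; this mirrors what the WordCount job computes for each word (the sample input matches the Hive example later in this post, and the file path is just an example):

```shell
# Two-line sample input (same lines as the Hive docs table in this post)
printf 'read a book\nwrite a book\n' > /tmp/wordcount_input.txt
# Split lines into words, then count occurrences of each distinct word
tr -s ' ' '\n' < /tmp/wordcount_input.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

This prints a 2, book 2, read 1, write 1, one word per line in the same key<TAB>count layout as part-r-00000.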


* Delete a folder
./bin/hadoop fs -rmr wordcount_output


* Verify that Hadoop is running properly
> jps
5984 Jps
4401 DataNode
4644 TaskTracker
4314 NameNode
4490 SecondaryNameNode
4554 JobTracker


* Kill a running MapReduce job (list job IDs with ./bin/hadoop job -list)
./bin/hadoop job -kill job_201501271506_0003

-------------------------------------------------


Implementing WordCount with Hive


<Extract Hive>

tar xvzf apache-hive-0.13.1-bin.tar.gz

ln -s apache-hive-0.13.1-bin hive


<Set Up Paths>

vi .profile

————————————————————

#hive home

export HIVE_HOME=/Users/hadoop/hive

#hadoop home (named HADOOP_HOME1 rather than HADOOP_HOME, presumably to avoid Hadoop 1.x's "$HADOOP_HOME is deprecated" warning)

export HADOOP_HOME1=/Users/hadoop/hadoop


#path

export PATH=$PATH:$HADOOP_HOME1/bin:$HADOOP_HOME1/sbin:$HIVE_HOME/bin

————————————————————


source ~/.profile


<Create Folders on HDFS>

hadoop fs -rmr /tmp

hadoop fs -mkdir /tmp

hadoop fs -chmod g+w /tmp


hadoop fs -rmr /user/hive/warehouse

hadoop fs -rmr /user/hive

hadoop fs -mkdir /user/hive

hadoop fs -mkdir /user/hive/warehouse

hadoop fs -chmod g+w /user/hive

hadoop fs -chmod g+w /user/hive/warehouse


hadoop fs -rmr /home/hive/warehouse

hadoop fs -rmr /home/hive

hadoop fs -mkdir /home/hive

hadoop fs -mkdir /home/hive/warehouse

hadoop fs -chmod g+w /home

hadoop fs -chmod g+w /home/hive

hadoop fs -chmod g+w /home/hive/warehouse


<Run Hive>

$ hive

Logging initialized using configuration in jar:file:/Users/hadoop/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties

hive> 


SHOW TABLES;

CREATE TABLE docs (line STRING);

LOAD DATA INPATH 'input.txt' OVERWRITE INTO TABLE docs;

SELECT * FROM docs;


read a book

write a book

Time taken: 0.484 seconds, Fetched: 2 row(s)


CREATE TABLE word_counts AS
SELECT word, count(1) AS cnt
FROM (SELECT explode(split(line,' ')) AS word FROM docs) W
GROUP BY word
ORDER BY word;

SELECT * FROM word_counts;


a 2

book 2

read 1

write 1

Time taken: 0.03 seconds, Fetched: 4 row(s)


hive> exit;