2016. 11. 3. 17:43 · Server Programming
Installing and Running Hadoop on a Mac
Create a hadoop user account
Check the Java version
<SSH setup>
System Preferences → Sharing: enable Remote Login, allowing only the hadoop account
Generate a key pair: ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify: ssh localhost
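The passwordless-SSH steps above can be sketched as one sequence (a minimal sketch, assuming no key pair exists yet):

```shell
# Create ~/.ssh if this account has never used SSH before
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate an RSA key pair with an empty passphrase (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the public key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify afterwards with: ssh localhost
```

Hadoop's start scripts use SSH to launch daemons, so `ssh localhost` must succeed without a password prompt before continuing.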
Download Hadoop
tar -xzvf hadoop-1.2.1.tar.gz
Create a symbolic link: ln -s hadoop-1.2.1 hadoop
Edit hadoop/conf/hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
* If the /Library/Java/Home symbolic link does not exist, create it manually
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
</configuration>
Format the NameNode
./bin/hadoop namenode -format
Start Hadoop
./bin/start-all.sh
Stop Hadoop
./bin/stop-all.sh
* Starting the daemons individually
./bin/hadoop-daemon.sh start namenode
./bin/hadoop-daemon.sh start secondarynamenode
./bin/hadoop-daemon.sh start datanode
./bin/hadoop-daemon.sh start jobtracker
./bin/hadoop-daemon.sh start tasktracker
* Upload the input file
./bin/hadoop fs -put input.txt input.txt
./bin/hadoop jar busanit-wordcount.jar kr.or.busanit.wordcount.WordCount input.txt wordcount_output
./bin/hadoop fs -cat wordcount_output/part-r-00000
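On small inputs, the MapReduce result can be double-checked locally with plain Unix tools; this uses the same two-line sample that appears in the Hive example below:

```shell
# Build the two-line sample input
printf 'read a book\nwrite a book\n' > input.txt
# Split into words, count, and print "word<TAB>count" like part-r-00000
tr ' ' '\n' < input.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
# → a 2 / book 2 / read 1 / write 1 (tab-separated)
```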
http://localhost:50030/jobtracker.jsp
* MaxTemp example (sample NCDC weather data):
https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all
./bin/hadoop jar busanit-maxtemp.jar kr.or.busanit.maxtemp.MaxTemp ncdc maxtemp_output
./bin/hadoop fs -cat maxtemp_output/part-r-00000
./bin/hadoop fs -get maxtemp_output/part-r-00000 maxtemp.txt
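Once fetched, the result file can be inspected with standard tools, e.g. to find the year with the highest maximum. This assumes `year<TAB>temperature` output lines; the sample values below are made up for illustration:

```shell
# Hypothetical sample of MaxTemp output: year<TAB>max temperature
printf '1901\t317\n1902\t244\n' > maxtemp.txt
# Sort numerically by the second field, descending; the first line is the hottest year
sort -k2,2nr maxtemp.txt | head -n 1
# → 1901	317
```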
* DelayCount example:
./bin/hadoop jar busanit-delaycount.jar kr.or.busanit.delaycount.DelayCount air/1987.csv delaycount_output
./bin/hadoop fs -cat delaycount_output/part-r-00000
* Kill a running MapReduce job
./bin/hadoop job -kill job_201501271506_0003
* Delete an output folder
./bin/hadoop fs -rmr wordcount_output
* Check that Hadoop is running properly
> jps
5984 Jps
4401 DataNode
4644 TaskTracker
4314 NameNode
4490 SecondaryNameNode
4554 JobTracker
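A small helper function (hypothetical, not part of Hadoop) can turn that visual check into a script; it scans a jps listing for the five Hadoop 1.x daemons and reports any that are absent:

```shell
# Report any of the five Hadoop 1.x daemons missing from a jps listing
check_daemons() {
  for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
    # Anchor the match (" name" at end of line) so NameNode
    # does not falsely match the SecondaryNameNode line
    echo "$1" | grep -q " $d\$" || echo "missing: $d"
  done
}
# Usage: check_daemons "$(jps)"
```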
-------------------------------------------------
Implementing WordCount with Hive
<Unpack Hive>
tar -xzvf apache-hive-0.13.1-bin.tar.gz
ln -s apache-hive-0.13.1-bin hive
<Set up paths>
vi ~/.profile
————————————————————
#hive home
export HIVE_HOME=/Users/hadoop/hive
#hadoop home
export HADOOP_HOME1=/Users/hadoop/hadoop
#path
export PATH=$PATH:$HADOOP_HOME1/bin:$HADOOP_HOME1/sbin:$HIVE_HOME/bin
————————————————————
source ~/.profile
<Create directories on HDFS>
hadoop fs -rmr /tmp
hadoop fs -mkdir /tmp
hadoop fs -chmod g+w /tmp
hadoop fs -rmr /user/hive/warehouse
hadoop fs -rmr /user/hive
hadoop fs -mkdir /user/hive
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /user/hive
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -rmr /home/hive/warehouse
hadoop fs -rmr /home/hive
hadoop fs -mkdir /home/hive
hadoop fs -mkdir /home/hive/warehouse
hadoop fs -chmod g+w /home
hadoop fs -chmod g+w /home/hive
hadoop fs -chmod g+w /home/hive/warehouse
<Run Hive>
$ hive
Logging initialized using configuration in jar:file:/Users/hadoop/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive>
SHOW TABLES;
CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'input.txt' OVERWRITE INTO TABLE docs;
* LOAD DATA INPATH moves (not copies) the file from its HDFS location into the table's warehouse directory; use LOAD DATA LOCAL INPATH to load from the local filesystem instead.
SELECT * FROM docs;
read a book
write a book
Time taken: 0.484 seconds, Fetched: 2 row(s)
CREATE TABLE word_counts AS SELECT word,count(1) AS cnt
FROM (SELECT explode(split(line,' ')) AS word FROM docs) W
GROUP BY word
ORDER BY word;
SELECT * FROM word_counts;
a 2
book 2
read 1
write 1
Time taken: 0.03 seconds, Fetched: 4 row(s)
hive> exit;