Notes on Cloudera CDH/CDP, the Hadoop ecosystem, and Semantic IoT development/operations. Questions to gooper@gooper.com.
0. Everything below is installed as root; the hadoop account is then configured so it can run MapReduce jobs.
0-1. Change the root password: passwd root
0-2. Change the hostname: edit /etc/hostname (vi /etc/hostname)
0-3. Apply the change without rebooting: /bin/hostname -F /etc/hostname
1. Network setup (here the boards hang off an 8-port gigabit switch behind a home router) - run as root
(edit /etc/network/interfaces as follows)
auto lo
iface lo inet loopback
#auto eth0
#iface eth0 inet dhcp
auto eth0
iface eth0 inet static
address 192.168.10.100
netmask 255.255.255.0
gateway 192.168.10.1
#broadcast 192.168.10.1
Set the DNS servers in /etc/resolv.conf:
nameserver 168.126.63.1
nameserver 168.126.63.2
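A typo in these addresses is the most common reason a board drops off the network after the interface comes up. A minimal sketch that checks each dotted quad before committing the file (`valid_ipv4` is a hypothetical helper, not part of any standard tool):

```shell
# Validate dotted-quad strings before writing them into
# /etc/network/interfaces or /etc/resolv.conf.
valid_ipv4() {
  local ip=$1 octet oldifs=$IFS
  IFS=.
  set -- $ip            # split on dots into positional parameters
  IFS=$oldifs
  [ $# -eq 4 ] || return 1
  for octet in "$@"; do
    case $octet in ''|*[!0-9]*) return 1 ;; esac   # digits only
    [ "$octet" -le 255 ] || return 1
  done
}
# check the values used in this setup:
for addr in 192.168.10.100 255.255.255.0 192.168.10.1 168.126.63.1; do
  valid_ipv4 "$addr" && echo "$addr ok" || echo "$addr INVALID"
done
```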
2. Create the hadoop account and set its password (as root): adduser hadoop, passwd hadoop
Register the hadoop account in sudoers:
a. Switch to root: su - root
b. Make /etc/sudoers writable: chmod u+w /etc/sudoers
c. Register the hadoop user in /etc/sudoers
=> add the entry under the "# User privilege specification" section
d. Restore the permissions: chmod u-w /etc/sudoers
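A syntax error in /etc/sudoers can lock you out of sudo entirely, so it is safer to rehearse the edit on a scratch copy and validate before installing. A sketch (the exact privilege line is an assumption; adjust it to your distribution's conventions):

```shell
# Build the hadoop entry on a scratch copy of a sudoers-style file,
# then validate with visudo before touching the real one.
f=$(mktemp)
# stand-in for the real file's "User privilege specification" section:
printf '# User privilege specification\nroot ALL=(ALL:ALL) ALL\n' > "$f"
# append the hadoop entry only if it is not already there:
grep -q '^hadoop ' "$f" || echo 'hadoop ALL=(ALL:ALL) ALL' >> "$f"
cat "$f"
# when it looks right, validate and install as root:
#   visudo -cf "$f" && install -m 440 "$f" /etc/sudoers
```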
3. Download the ARM JDK (as root)
From http://www.oracle.com/technetwork/java/javase/downloads/jdk7-arm-downloads-2187468.html:
Linux ARM v6/v7 Hard Float ABI | 67.79 MB | jdk-7u60-linux-arm-vfp-hflt.tar.gz
Install:
a. Extract: tar zxvf jdk-7u60-linux-arm-vfp-hflt.tar.gz
b. Move it to /usr/local: mv jdk1.7.0_60/ /usr/local/
c. Create a symbolic link (inside /usr/local): ln -s jdk1.7.0_60/ java
d. Edit /etc/profile (vi /etc/profile) and add the following near the top (applies system-wide):
export JAVA_HOME=/usr/local/java
export PATH="$JAVA_HOME/bin:$PATH"
export CLASSPATH=".:$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/tools.jar"
export CATALINA_OPTS="-Djava.awt.headless=true"
e. Apply the profile: source /etc/profile
f. Verify: java -version, javac -version
#. hadoop@Bananapi:~$ wget apache.tt.co.kr/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
--2014-07-05 17:18:25-- http://apache.tt.co.kr/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Resolving apache.tt.co.kr (apache.tt.co.kr)... 121.125.79.185
Connecting to apache.tt.co.kr (apache.tt.co.kr)|121.125.79.185|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 63851630 (61M) [application/x-gzip]
Saving to: `hadoop-1.2.1.tar.gz'
But the download crawls along at 21.2 KB/s, which is far too slow...
===> wget through the mirror is too slow, so download the tarball on a local machine and push it to the board with scp instead.
4. Download hadoop (as root)
http://www.apache.org/dyn/closer.cgi/hadoop/common/
Download hadoop-1.2.1-bin.tar.gz from http://apache.tt.co.kr/hadoop/common/hadoop-1.2.1/
Install:
a. Extract hadoop: tar xvfz hadoop-1.2.1-bin.tar.gz
b. Move to /usr/local/: mv hadoop-1.2.1 /usr/local/
c. Create a symbolic link (inside /usr/local): ln -s hadoop-1.2.1/ hadoop
d. Edit the hosts file: vi /etc/hosts
127.0.0.1 localhost
#127.0.1.1 Bananapi
192.168.10.100 master
192.168.10.101 slave1
192.168.10.102 slave2
192.168.10.103 slave3
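Every board needs the identical hosts block before the image is cloned, so the block can be generated instead of typed four times. A sketch (`gen_hosts` is a hypothetical helper; the addresses match the list above):

```shell
# Emit the cluster /etc/hosts block from one list of hostnames,
# assigning 192.168.10.100 upward in order.
gen_hosts() {
  printf '127.0.0.1 localhost\n'
  local i=100 name
  for name in master slave1 slave2 slave3; do
    printf '192.168.10.%s %s\n' "$i" "$name"
    i=$((i + 1))
  done
}
gen_hosts                 # review the output first
# then, as root on each board: gen_hosts > /etc/hosts
```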
e. Add to /etc/profile:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=true
f. Apply: source /etc/profile
g. Create the working directories (as the account that will run jobs, i.e. hadoop; or create them as root and then change the owner and group to hadoop)
--> The namenode only needs name/ and the datanodes only need data/, but everything is created in one place so one image can be cloned to every board.
hadoop@Bananapi:/usr/local/hadoop/conf# mkdir -p /home/hadoop/work
hadoop@Bananapi:/usr/local/hadoop/conf# mkdir -p /home/hadoop/work/mapred
hadoop@Bananapi:/usr/local/hadoop/conf# mkdir -p /home/hadoop/work/mapred/system
hadoop@Bananapi:/usr/local/hadoop/conf# mkdir -p /home/hadoop/work/data
hadoop@Bananapi:/usr/local/hadoop/conf# mkdir -p /home/hadoop/work/name
hadoop@Bananapi:/usr/local/hadoop/conf# mkdir -p /home/hadoop/work/tmp <- created automatically anyway
* set the permissions to 755 throughout:
: chmod -R 755 work
* final structure and permissions under work:
hadoop@master:~/work$ chmod -R 755 mapred tmp
hadoop@master:~/work$ ll
total 24
drwxr-xr-x 6 hadoop hadoop 4096 Jul 6 01:45 ./
drwxr-xr-x 19 hadoop hadoop 4096 Jul 6 01:22 ../
drwxr-xr-x 2 hadoop hadoop 4096 Jul 6 01:45 data/
drwxr-xr-x 3 hadoop hadoop 4096 Jul 6 01:45 mapred/
drwxr-xr-x 2 hadoop hadoop 4096 Jul 6 01:45 name/
drwxr-xr-x 2 hadoop hadoop 4096 Jul 6 02:02 tmp/
* Create the hadoop account's directories and grant permissions (run as root)
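Since `mkdir -p` creates every missing parent, the six separate mkdir calls above collapse into one. `make_work_tree` below is a hypothetical helper that also sets the 755 permissions in the same pass:

```shell
# Create the full Hadoop working tree (name/, data/, mapred/system/, tmp/)
# under one base directory and open the permissions to 755.
make_work_tree() {
  local base=$1                       # e.g. /home/hadoop/work
  mkdir -p "$base"/mapred/system "$base"/data "$base"/name "$base"/tmp
  chmod -R 755 "$base"
}
scratch=$(mktemp -d)/work             # rehearsal in a scratch directory
make_work_tree "$scratch"
ls "$scratch"
# per board, as hadoop:  make_work_tree /home/hadoop/work
# if created as root:    chown -R hadoop:hadoop /home/hadoop/work
```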
h. conf configuration
root@Bananapi:/usr/local/hadoop/conf# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<!-- value>/tmp/hadoop-${user.name}</value -->
<value>/home/${user.name}/work/tmp</value>
</property>
</configuration>
root@Bananapi:/usr/local/hadoop/conf# vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<!-- value>/home/hadoop/work/name</value -->
<value>/home/${user.name}/work/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<!-- value>/home/hadoop/work/data</value -->
<value>/home/${user.name}/work/data</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
</configuration>
root@Bananapi:/usr/local/hadoop/conf# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://master:9001</value>
</property>
<property>
<name>mapred.system.dir</name>
<!-- value>/home/hadoop/work/mapred/system</value -->
<value>/home/${user.name}/work/mapred/system</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<!-- value>*</value -->
<value>root,hadoop</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
<!-- value>localhost</value -->
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<!-- value>/home/hadoop/work/tmp/mapred/staging</value -->
<value>/user</value>
</property>
</configuration>
root@Bananapi:/usr/local/hadoop/conf# vi slaves
master
slave1
#slave2
#slave3
root@Bananapi:/usr/local/hadoop/conf# vi masters
master
root@Bananapi:/usr/local/hadoop/conf# vi hadoop-env.sh
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/local/java
# If the value below is not set, jobs are submitted but never appear in the monitoring UI and stall at 0%
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
export HADOOP_CLASSPATH=/usr/local/hadoop/lib
h-2. Set up passwordless SSH so hadoop can move between the nodes without a password (done here as root).
Do the following on one node only, then copy the result to the rest.
$> cd [Enter] (go to the home directory)
$> ssh-keygen -t rsa [Enter]
$> [Enter][Enter][Enter]... (accept every default; empty passphrase)
* when the prompt returns, confirm with:
$> ls -al [Enter]
* a hidden directory named .ssh should now exist.
$> cd .ssh [Enter]
$> cp id_rsa.pub authorized_keys [Enter]
* now connect to another node (from master, try slave1 or slave2):
$> ssh slave1
It will ask yes/no on the first connection.
That happens for any host you have never connected to before; it records the host fingerprint (it does not store your password).
- answer yes, enter the password, and log in.
- once the login works, leave again:
$> exit [Enter]
Copy the contents of the .ssh directory to the other nodes:
$> scp * hadoop@slave1:.ssh [Enter]
* it asks for the password once, then copies the files.
* after that, hop among the three nodes with ssh master, ssh <ip>, and so on:
each first connection asks yes/no once, and no password should be requested afterwards (that is the correct behavior).
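Modern OpenSSH ships `ssh-copy-id`, which performs the `cp id_rsa.pub authorized_keys` / `scp` dance above in one step and fixes the file permissions too. A sketch, assuming ssh-copy-id is available on the board; the keypair is generated into a scratch path here just to show the commands:

```shell
# Generate an RSA keypair with an empty passphrase into a scratch path.
key=$(mktemp -u)                      # real use: ~/.ssh/id_rsa (the default)
ssh-keygen -q -t rsa -N '' -f "$key"
head -c 7 "$key.pub"; echo            # prints "ssh-rsa"
# real sequence, run once on one node:
#   ssh-keygen -t rsa                 # accept defaults, empty passphrase
#   for h in master slave1 slave2 slave3; do ssh-copy-id "hadoop@$h"; done
```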
Note: additional hadoop-env.sh tweaks (hadoop location here: /usr/local/hadoop)
- a few files in the conf directory are edited:
- hadoop-env.sh
# set JAVA_HOME and uncomment it.
# uncomment HADOOP_HEAPSIZE.
# uncomment HADOOP_SSH_OPTS (uncommented while troubleshooting datanodes that would not connect).
i. Format the namenode
root@Bananapi:/usr/local/hadoop/conf# hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
14/07/05 10:38:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = Bananapi/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_60
************************************************************/
Re-format filesystem in /home/hadoop/work/name ? (Y or N) Y
14/07/05 10:38:36 INFO util.GSet: Computing capacity for map BlocksMap
14/07/05 10:38:36 INFO util.GSet: VM type = 32-bit
14/07/05 10:38:36 INFO util.GSet: 2.0% max memory = 1013710848
14/07/05 10:38:36 INFO util.GSet: capacity = 2^22 = 4194304 entries
14/07/05 10:38:36 INFO util.GSet: recommended=4194304, actual=4194304
14/07/05 10:38:38 INFO namenode.FSNamesystem: fsOwner=root
14/07/05 10:38:38 INFO namenode.FSNamesystem: supergroup=supergroup
14/07/05 10:38:38 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/07/05 10:38:38 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/07/05 10:38:38 WARN namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync
14/07/05 10:38:38 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/07/05 10:38:38 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/07/05 10:38:38 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/07/05 10:38:40 INFO common.Storage: Image file /home/hadoop/work/name/current/fsimage of size 110 bytes saved in 0 seconds.
14/07/05 10:38:40 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/work/name/current/edits
14/07/05 10:38:40 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/work/name/current/edits
14/07/05 10:38:40 INFO common.Storage: Storage directory /home/hadoop/work/name has been successfully formatted.
14/07/05 10:38:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Bananapi/127.0.1.1
************************************************************/
j. Start hadoop: start-all.sh
root@master:~# start-all.sh
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-master.out
master: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-master.out
slave1: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave1.out
master: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-master.out
starting jobtracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-master.out
slave1: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave1.out
master: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-master.out
k. Check the daemons
root@Bananapi:/usr/local/hadoop/conf# jps
5663 DataNode
4461 NameNode
4715 JobTracker
5774 SecondaryNameNode
5926 TaskTracker
5969 Jps
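The eyeball check of the jps output can be scripted. `check_daemons` (a hypothetical helper) reads jps output on stdin and names any of the five Hadoop 1.x daemons that are absent:

```shell
# Report which of the expected Hadoop 1.x daemons are missing from
# `jps` output.  Usage: jps | check_daemons
check_daemons() {
  local out missing= d
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # -w prevents NameNode matching inside SecondaryNameNode
    echo "$out" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then echo "all daemons up"; else echo "MISSING:$missing"; fi
}
# on a live master node: jps | check_daemons
# demo with canned output:
printf '1 NameNode\n2 DataNode\n3 JobTracker\n' | check_daemons
# prints: MISSING: SecondaryNameNode TaskTracker
```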
l. Create a hadoop directory, then list the directory tree
root@Bananapi:/usr/local/hadoop/conf# hadoop fs -mkdir work
root@Bananapi:/usr/local/hadoop/conf# hadoop fs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2014-07-05 10:41 /home
drwxr-xr-x - root supergroup 0 2014-07-05 11:16 /user
root@Bananapi:/usr/local/hadoop/conf# hadoop fs -lsr /
drwxr-xr-x - root supergroup 0 2014-07-05 10:41 /home
drwxr-xr-x - root supergroup 0 2014-07-05 10:41 /home/hadoop
drwxr-xr-x - root supergroup 0 2014-07-05 10:41 /home/hadoop/work
drwxr-xr-x - root supergroup 0 2014-07-05 11:08 /home/hadoop/work/mapred
drwx------ - root supergroup 0 2014-07-05 11:13 /home/hadoop/work/mapred/system
-rw------- 1 root supergroup 4 2014-07-05 11:13 /home/hadoop/work/mapred/system/jobtracker.info
drwxr-xr-x - root supergroup 0 2014-07-05 11:16 /user
drwxr-xr-x - root supergroup 0 2014-07-05 11:16 /user/root
drwxr-xr-x - root supergroup 0 2014-07-05 11:16 /user/root/work
m. Run the wordcount sample from the examples jar to confirm the cluster works (run as the hadoop account).
(a) create the data paths
: hadoop fs -mkdir /user/hadoop/work
hadoop fs -mkdir /user/hadoop/work/input
hadoop fs -mkdir /user/hadoop/work/output
(b) upload the data
: hadoop fs -put a.txt /user/hadoop/work/input
(c) run the job
hadoop@Bananapi:~/working$ hadoop jar hadoop-examples-1.2.1.jar wordcount work/input/a.txt work/output/test6
14/07/05 16:40:44 INFO input.FileInputFormat: Total input paths to process : 1
14/07/05 16:40:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/05 16:40:44 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/05 16:40:46 INFO mapred.JobClient: Running job: job_201407051639_0001
14/07/05 16:40:47 INFO mapred.JobClient: map 0% reduce 0%
14/07/05 16:41:13 INFO mapred.JobClient: map 100% reduce 0%
14/07/05 16:41:31 INFO mapred.JobClient: map 100% reduce 100%
14/07/05 16:41:45 INFO mapred.JobClient: Job complete: job_201407051639_0001
14/07/05 16:41:45 INFO mapred.JobClient: Counters: 29
14/07/05 16:41:45 INFO mapred.JobClient: Job Counters
14/07/05 16:41:45 INFO mapred.JobClient: Launched reduce tasks=1
14/07/05 16:41:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34392
14/07/05 16:41:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/05 16:41:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/07/05 16:41:45 INFO mapred.JobClient: Launched map tasks=1
14/07/05 16:41:45 INFO mapred.JobClient: Data-local map tasks=1
14/07/05 16:41:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18254
14/07/05 16:41:45 INFO mapred.JobClient: File Output Format Counters
14/07/05 16:41:45 INFO mapred.JobClient: Bytes Written=10792
14/07/05 16:41:45 INFO mapred.JobClient: FileSystemCounters
14/07/05 16:41:45 INFO mapred.JobClient: FILE_BYTES_READ=13374
14/07/05 16:41:45 INFO mapred.JobClient: HDFS_BYTES_READ=17846
14/07/05 16:41:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=141014
14/07/05 16:41:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=10792
14/07/05 16:41:45 INFO mapred.JobClient: File Input Format Counters
14/07/05 16:41:45 INFO mapred.JobClient: Bytes Read=17734
14/07/05 16:41:45 INFO mapred.JobClient: Map-Reduce Framework
14/07/05 16:41:45 INFO mapred.JobClient: Map output materialized bytes=13374
14/07/05 16:41:45 INFO mapred.JobClient: Map input records=223
14/07/05 16:41:45 INFO mapred.JobClient: Reduce shuffle bytes=13374
14/07/05 16:41:45 INFO mapred.JobClient: Spilled Records=1300
14/07/05 16:41:45 INFO mapred.JobClient: Map output bytes=24323
14/07/05 16:41:45 INFO mapred.JobClient: Total committed heap usage (bytes)=127930368
14/07/05 16:41:45 INFO mapred.JobClient: CPU time spent (ms)=7600
14/07/05 16:41:45 INFO mapred.JobClient: Combine input records=1630
14/07/05 16:41:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=112
14/07/05 16:41:45 INFO mapred.JobClient: Reduce input records=650
14/07/05 16:41:45 INFO mapred.JobClient: Reduce input groups=650
14/07/05 16:41:45 INFO mapred.JobClient: Combine output records=650
14/07/05 16:41:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=179404800
14/07/05 16:41:45 INFO mapred.JobClient: Reduce output records=650
14/07/05 16:41:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=689119232
14/07/05 16:41:45 INFO mapred.JobClient: Map output records=1630
(d) check the result
hadoop@Bananapi:~/working$ hadoop fs -ls work/output/test6
Found 3 items
-rw-r--r-- 1 hadoop hadoop 0 2014-07-05 16:41 /user/hadoop/work/output/test6/_SUCCESS
drwxr-xr-x - hadoop hadoop 0 2014-07-05 16:40 /user/hadoop/work/output/test6/_logs
-rw-r--r-- 1 hadoop hadoop 10792 2014-07-05 16:41 /user/hadoop/work/output/test6/part-r-00000
* view the output: hadoop@Bananapi:~/working$ hadoop fs -cat /user/hadoop/work/output/test6/part-r-00000
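Each line of part-r-00000 is word<TAB>count, so the most frequent words are one sort away. A sketch (`top_words` is a hypothetical helper):

```shell
# Sort "word<TAB>count" lines by count, descending, and keep the top N.
top_words() {
  sort -k2,2nr | head -n "${1:-10}"
}
# on the cluster:
#   hadoop fs -cat /user/hadoop/work/output/test6/part-r-00000 | top_words 5
# demo with canned output:
printf 'foo\t3\nbar\t7\nbaz\t1\n' | top_words 2
# prints: bar 7 / foo 3 (tab-separated, one pair per line)
```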