* If a problem comes up that cannot be resolved any other way,
==> remove everything in the folder where the dfs data is stored with rm -r /data/hadoop/dfs/* and then follow the manual startup procedure below...
Be careful: all existing data and metadata will be lost, so use this only as a last resort. A sketch of the wipe follows.
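A minimal sketch of that wipe, assuming the host names and paths used elsewhere in this guide (master, node1~node4, /data/hadoop/dfs); adjust it to your own layout:
  stop-all.sh                                   # take HDFS/YARN down first (see step 16)
  for h in master node1 node2 node3 node4; do
    ssh $h 'rm -rf /data/hadoop/dfs/*'          # wipe the dfs data directory on every node
  done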
0. The data and log paths were changed to the values below (some of the console output later on this page still shows the old paths, so read it with that in mind)
a. data paths : /data/hadoop/dfs, /data/zookeeper/data ...
b. log paths : /logs/hadoop/logs, /logs/zookeeper/logs ...
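For reference, these paths are normally controlled by the properties below; the exact keys and file locations in this particular cluster are my assumption, so check your own config:
  # hdfs-site.xml : dfs.namenode.name.dir, dfs.datanode.data.dir, dfs.journalnode.edits.dir
  # zoo.cfg       : dataDir
  # hadoop-env.sh : HADOOP_LOG_DIR   /   zkEnv.sh (or conf/java.env) : ZOO_LOG_DIR
  grep -A1 'dfs\..*\.dir' $HADOOP_HOME/etc/hadoop/hdfs-site.xml
  grep '^dataDir' $ZOOKEEPER_HOME/conf/zoo.cfg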
0-1. Quick start (when using start-all.sh); a short verification sketch follows after item h.
a. Start ZooKeeper (run on each of the three servers master, node1 and node2)
bin/zkServer.sh start
b. Start the JobHistoryServer (run on the hadoop master)
sbin/mr-jobhistory-daemon.sh start historyserver
c. Start HBase (run on the node where the HBase master is installed)
bin/start-hbase.sh
bin/hbase-daemon.sh start master (run on the secondary master node)
d. Hive (run on the server where it is installed)
- start HiveServer2 (run on the master where hive is installed)
:nohup hive --service hiveserver2 &
- start the Hive metastore (run on the master where hive is installed)
:nohup hive --service metastore &
e. Start Hadoop (run on the master server only)
- start HDFS : sbin/start-dfs.sh
- start YARN : sbin/start-yarn.sh
* if the standby resourcemanager does not come up : sbin/yarn-daemon.sh start resourcemanager
f. oozied.sh start (run on the node where Oozie is installed)
g. Start Spark
- start the master (run on both the active and standby nodes) : sbin/start-master.sh
- start the workers (run on the active node) : sbin/start-slaves.sh
- start the history server (run on the active node) : sbin/start-history-server.sh
h. Start Kafka (run on each broker server)
- bin/kafka-server-start.sh config/server-1.properties &
- bin/kafka-server-start.sh config/server-2.properties &
- bin/kafka-server-start.sh config/server-3.properties &
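As noted at the top of 0-1, a quick sanity check after the quick start is to list the Java processes and query the HA state; this is only a sketch and assumes the service IDs (nn1/nn2, rm1/rm2) used in steps 10 and 10-1 below.
  jps                                       # on each node: NameNode, DataNode, ResourceManager, HMaster, Kafka ...
  bin/hdfs haadmin -getServiceState nn1     # expect active or standby
  bin/yarn rmadmin -getServiceState rm1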
--- The steps below start each daemon individually, without start-all.sh (manual startup). ----------
1. This assumes all daemons are down and that the HA-related configuration has been finished and is about to be applied.
2. Start ZooKeeper (run on each of the three servers master, node1 and node2)
bin/zkServer.sh start
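To confirm the ensemble actually formed, zkServer.sh can report each server's role (a sketch; run it on master, node1 and node2):
  bin/zkServer.sh status    # one server should report Mode: leader, the other two Mode: follower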
3. Add the znode used for HA to ZooKeeper (run on one of the namenodes; needed only the first time)
root@master:/root# bin/hdfs zkfc -formatZK
15/05/05 16:12:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/05 16:12:34 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/192.168.10.100:9000
15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:host.name=master
15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60
15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/local/jdk1.7.0_60/jre
15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.class.path=...
15/05/05 16:12:35 INFO ha.ActiveStandbyElector: Session connected.
===============================================
The configured parent znode /hadoop-ha/mycluster already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/mycluster? (Y or N) Y
15/05/05 16:12:56 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...
15/05/05 16:12:57 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.
15/05/05 16:12:57 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
15/05/05 16:12:57 INFO zookeeper.ClientCnxn: EventThread shut down
15/05/05 16:12:57 INFO zookeeper.ZooKeeper: Session: 0x24d2320df3b0000 closed
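If you want to see the znode that was just created, zkCli.sh can list it (a sketch, assuming a ZooKeeper server reachable at master:2181):
  bin/zkCli.sh -server master:2181
  ls /hadoop-ha             # inside the zkCli shell; should show [mycluster]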
4. Start a JournalNode on every server that will serve the QJM (e.g. node1, node2, node3).
root@master:/root# sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-master.out
* If they are not started on each node, the namenode format cannot reach the journalnodes and fails with an error like the one below
(in the example below the journalnodes on node1 and node2 had not been started yet)
--------------------------------------------------------------------------------------
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 exceptions thrown:
192.168.10.101:8485: Call From master/192.168.10.100 to node1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
192.168.10.102:8485: Call From master/192.168.10.100 to node2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)
at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
15/05/05 16:18:04 FATAL namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 exceptions thrown:
192.168.10.101:8485: Call From master/192.168.10.100 to node1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
192.168.10.102:8485: Call From master/192.168.10.100 to node2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)
at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)
at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
15/05/05 16:18:04 INFO util.ExitUtil: Exiting with status 1
15/05/05 16:18:04 INFO namenode.NameNode: SHUTDOWN_MSG:
-------------------------------------------------------------------------------------------
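Before retrying the format it is worth confirming, on node1~node3, that a JournalNode process is really up and listening on 8485 (the port shown in the error above); a sketch:
  jps | grep JournalNode
  netstat -ntlp | grep 8485     # or: ss -ntlp | grep 8485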
5. Run "hdfs namenode -format" on the namenode that will be active. (first time only)
root@master:/root# bin/hdfs namenode -format
15/05/05 16:36:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.10.100
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath =...
STARTUP_MSG: java = 1.7.0_60
************************************************************/
15/05/05 16:36:22 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/05/05 16:36:22 INFO namenode.NameNode: createNameNode [-format]
15/05/05 16:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-4e01a04d-26db-4422-bea6-ba768a96334f
15/05/05 16:36:27 INFO namenode.FSNamesystem: No KeyProvider found.
15/05/05 16:36:27 INFO namenode.FSNamesystem: fsLock is fair:true
15/05/05 16:36:27 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/05/05 16:36:27 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/05/05 16:36:27 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/05/05 16:36:27 INFO blockmanagement.BlockManager: The block deletion will start around 2015 May 05 16:36:27
15/05/05 16:36:27 INFO util.GSet: Computing capacity for map BlocksMap
15/05/05 16:36:27 INFO util.GSet: VM type = 32-bit
15/05/05 16:36:27 INFO util.GSet: 2.0% max memory 966.8 MB = 19.3 MB
15/05/05 16:36:27 INFO util.GSet: capacity = 2^22 = 4194304 entries
15/05/05 16:36:28 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/05/05 16:36:28 INFO blockmanagement.BlockManager: defaultReplication = 3
15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxReplication = 512
15/05/05 16:36:28 INFO blockmanagement.BlockManager: minReplication = 1
15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/05/05 16:36:28 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/05/05 16:36:28 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/05/05 16:36:28 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/05/05 16:36:28 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
15/05/05 16:36:28 INFO namenode.FSNamesystem: supergroup = supergroup
15/05/05 16:36:28 INFO namenode.FSNamesystem: isPermissionEnabled = false
15/05/05 16:36:28 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster
15/05/05 16:36:28 INFO namenode.FSNamesystem: HA Enabled: true
15/05/05 16:36:28 INFO namenode.FSNamesystem: Append Enabled: true
15/05/05 16:36:29 INFO util.GSet: Computing capacity for map INodeMap
15/05/05 16:36:29 INFO util.GSet: VM type = 32-bit
15/05/05 16:36:29 INFO util.GSet: 1.0% max memory 966.8 MB = 9.7 MB
15/05/05 16:36:29 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/05/05 16:36:29 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/05/05 16:36:29 INFO util.GSet: Computing capacity for map cachedBlocks
15/05/05 16:36:29 INFO util.GSet: VM type = 32-bit
15/05/05 16:36:29 INFO util.GSet: 0.25% max memory 966.8 MB = 2.4 MB
15/05/05 16:36:29 INFO util.GSet: capacity = 2^19 = 524288 entries
15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/05/05 16:36:29 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/05/05 16:36:29 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/05/05 16:36:29 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/05/05 16:36:29 INFO util.GSet: VM type = 32-bit
15/05/05 16:36:29 INFO util.GSet: 0.029999999329447746% max memory 966.8 MB = 297.0 KB
15/05/05 16:36:29 INFO util.GSet: capacity = 2^16 = 65536 entries
15/05/05 16:36:29 INFO namenode.NNConf: ACLs enabled? false
15/05/05 16:36:29 INFO namenode.NNConf: XAttrs enabled? true
15/05/05 16:36:29 INFO namenode.NNConf: Maximum size of an xattr: 16384
Re-format filesystem in Storage Directory /data/dfs/namenode ? (Y or N) Y
Re-format filesystem in QJM to [192.168.10.101:8485, 192.168.10.102:8485, 192.168.10.103:8485] ? (Y or N) Y
15/05/05 16:37:23 INFO namenode.FSImage: Allocated new BlockPoolId: BP-90521690-192.168.10.100-1430815043897
15/05/05 16:37:24 INFO common.Storage: Storage directory /data/dfs/namenode has been successfully formatted.
15/05/05 16:37:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/05/05 16:37:26 INFO util.ExitUtil: Exiting with status 0
15/05/05 16:37:26 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.10.100
************************************************************/
6. Start the namenode on the active node.
root@master:/root# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out
(reference : if the namenode fails to start, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2789&page=2&document_srl=3183)
6-1. For the node that will act as standby, run "hdfs namenode -bootstrapStandby" to copy over the active namenode's metadata (first time only; see step 7).
6-2. Start the namenode on the standby node.
root@slave:/root# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out
(reference : if the namenode fails to start, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2789&page=2&document_srl=3183)
7. Configure the standby namenode (run on the node that will be standby, e.g. node1; first time only; not needed if step 6-1 was already done).
root@node1:/data# bin/hdfs namenode -bootstrapStandby
15/05/05 16:43:51 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = node1/192.168.10.101
STARTUP_MSG: args = [-bootstrapStandby]
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath =...
STARTUP_MSG: java = 1.7.0_60
************************************************************/
15/05/05 16:43:51 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/05/05 16:43:51 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
15/05/05 16:43:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: mycluster
Other Namenode ID: nn1
Other NN's HTTP address: http://master:50070
Other NN's IPC address: master/192.168.10.100:9000
Namespace ID: 892192946
Block pool ID: BP-90521690-192.168.10.100-1430815043897
Cluster ID: CID-4e01a04d-26db-4422-bea6-ba768a96334f
Layout version: -60
=====================================================
15/05/05 16:43:59 INFO common.Storage: Storage directory /data/dfs/namenode has been successfully formatted.
15/05/05 16:44:02 INFO namenode.TransferFsImage: Opening connection to http://master:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:892192946:0:CID-4e01a04d-26db-4422-bea6-ba768a96334f
15/05/05 16:44:02 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/05/05 16:44:03 INFO namenode.TransferFsImage: Transfer took 0.09s at 0.00 KB/s
15/05/05 16:44:03 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 351 bytes.
15/05/05 16:44:03 INFO util.ExitUtil: Exiting with status 0
15/05/05 16:44:03 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.10.101
************************************************************/
* If hadoop commands are run without the namenode and journalnodes up, connection errors like the following occur.
-------------------------------------------------------------------------------------------
15/05/05 16:40:12 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/05/05 16:40:12 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
15/05/05 16:40:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/05 16:40:18 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:19 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:20 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:21 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:22 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:23 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:24 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:25 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:26 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:27 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/05/05 16:40:27 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at master/192.168.10.100:9000: Call From node1/192.168.10.101 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/05/05 16:40:27 INFO util.ExitUtil: Exiting with status 2
-------------------------------------------------------------------------------------------
8. Run zkfc on both the active and standby namenodes (run on master and on node1).
-root@master:/root# sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-master.out
-root@node1:/data# hadoop-daemon.sh start zkfc
starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-node1.out
* jps now shows a DFSZKFailoverController process.
9. If the primary namenode came up as standby instead of active, switch it to active with the following command (manual failover only).
root@node1:/data# bin/hdfs haadmin -transitionToActive nn1
* With automatic failover enabled, the command is refused with a message like the one below.
Automatic failover is enabled for NameNode at node1/192.168.10.101:9000
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.
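If you are certain a manual transition is needed while automatic failover is enabled, haadmin accepts the --forcemanual flag mentioned in the message; it bypasses the ZKFC, so use it with care (sketch only):
  bin/hdfs haadmin -transitionToActive --forcemanual nn1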
9-1. Start the resourcemanagers (run on master and on node1).
-root@master:/root# sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-master.out
-root@node1:/data# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-node1.out
10. Check the HDFS (NameNode) HA state (check both nn1 and nn2)
- root@master:/root# bin/hdfs haadmin -getServiceState nn1
active
- root@master:/root# bin/hdfs haadmin -getServiceState nn2
standby
10-1. Check the ResourceManager HA state (check both rm1 and rm2)
- root@master:/root# bin/yarn rmadmin -getServiceState rm1
standby
- root@master:/root# bin/yarn rmadmin -getServiceState rm2
active
11. Start the datanodes. (Running hadoop-daemons.sh on the master starts a datanode on every server listed in the slaves file, e.g. node1~node4.)
root@master:/root# sbin/hadoop-daemons.sh start datanode
node3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node3.out
node4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node4.out
node2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node2.out
node1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node1.out
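Once the datanodes are up, a quick way to confirm they registered with the active namenode is dfsadmin (a sketch):
  bin/hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'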
12. (no content)
13. Start the nodemanagers. (Running yarn-daemons.sh on the master starts a nodemanager on every server listed in the slaves file, e.g. node1~node4.)
root@master:/root# sbin/yarn-daemons.sh start nodemanager
node4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node4.out
node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node1.out
node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node3.out
node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node2.out
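Likewise, the nodemanagers can be checked against the active resourcemanager (a sketch):
  bin/yarn node -list           # node1~node4 should appear in RUNNING state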
* In HDFS HA mode there is no need to start a secondarynamenode.
Below is the message you get if you force-start one anyway.
root@node1:/data# hadoop secondarynamenode
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/05/05 18:45:11 INFO namenode.SecondaryNameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG: host = node1/192.168.10.101
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.6.0
STARTUP_MSG: classpath = ...
STARTUP_MSG: java = 1.7.0_60
************************************************************/
15/05/05 18:45:11 INFO namenode.SecondaryNameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/05/05 18:45:14 FATAL namenode.SecondaryNameNode: Failed to start secondary namenode
java.io.IOException: Cannot use SecondaryNameNode in an HA cluster. The Standby Namenode will perform checkpointing.
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:187)
at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)
15/05/05 18:45:14 INFO util.ExitUtil: Exiting with status 1
15/05/05 18:45:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down SecondaryNameNode at node1/192.168.10.101
************************************************************/
14. Start the JobHistoryServer (run on the master)
root@master:/root# sbin/mr-jobhistory-daemon.sh start historyserver
15. Check the processes with jps
root@master:/root# jps
7232 QuorumPeerMain
8784 DFSZKFailoverController
13955 Jps
11300 Kafka
11862 HistoryServer
20153 JobHistoryServer
8919 ResourceManager
8281 NameNode
7705 HMaster
11598 Master
root@node1:/data# jps
7538 DFSZKFailoverController
6867 NameNode
7319 JournalNode
11800 ResourceManager
15642 Jps
6203 QuorumPeerMain
7726 NodeManager
7038 DataNode
10190 Master
9919 Kafka
13951 HRegionServer
root@node2:/data# jps
7353 HRegionServer
5275 DataNode
4764 QuorumPeerMain
7564 HMaster
5614 NodeManager
5951 Kafka
5487 JournalNode
7839 Jps
root@node3:/root# jps
4499 DataNode
8790 Jps
4730 NodeManager
8414 HRegionServer
root@node4:/data# jps
8516 HRegionServer
7044 NodeManager
8856 Jps
6814 DataNode
16. stop-all.sh output
root@master:/root# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master node1]
master: stopping namenode
node1: stopping namenode
node2: stopping datanode
node3: stopping datanode
node4: stopping datanode
node1: stopping datanode
Stopping journal nodes [node1 node2 node3]
node3: stopping journalnode
node1: stopping journalnode
node2: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [master node1]
node1: stopping zkfc
master: stopping zkfc
stopping yarn daemons
stopping resourcemanager
node2: stopping nodemanager
node1: stopping nodemanager
node4: stopping nodemanager
node3: stopping nodemanager
no proxyserver to stop
17. start-all.sh output
root@master:/root# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master node1]
master: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-master.out
node1: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-node1.out
node3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node3.out
node4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node4.out
node2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node2.out
node1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node1.out
Starting journal nodes [node1 node2 node3]
node2: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node2.out
node3: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node3.out
node1: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node1.out
Starting ZK Failover Controllers on NN hosts [master node1]
master: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-zkfc-master.out
node1: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-zkfc-node1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-master.out
node4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node4.out
node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node3.out
node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node2.out
node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node1.out
18. The active/standby state can also be checked at the URL below.
http://node1:50070/ <- if this returns a page-not-found error, start that namenode manually.
19. Test MapReduce (when running the application as the gooper account)
- hdfs dfs -mkdir /user (run as the root account)
- hdfs dfs -mkdir /user/gooper
- hdfs dfs -mkdir /user/gooper/in
- hdfs dfs -put a.txt /user/gooper/in
- hadoop@master:~$ yarn jar $HOME/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount in out
15/05/05 22:53:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/05 22:53:41 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
15/05/05 22:53:44 INFO input.FileInputFormat: Total input paths to process : 1
15/05/05 22:53:45 INFO mapreduce.JobSubmitter: number of splits:1
15/05/05 22:53:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430823520141_0013
15/05/05 22:53:49 INFO impl.YarnClientImpl: Submitted application application_1430823520141_0013
15/05/05 22:53:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1430823520141_0013/
15/05/05 22:53:49 INFO mapreduce.Job: Running job: job_1430823520141_0013
15/05/05 22:55:09 INFO mapreduce.Job: Job job_1430823520141_0013 running in uber mode : false
15/05/05 22:55:09 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 22:55:33 INFO mapreduce.Job: map 100% reduce 0%
15/05/05 22:55:58 INFO mapreduce.Job: map 100% reduce 100%
15/05/05 22:55:59 INFO mapreduce.Job: Job job_1430823520141_0013 completed successfully
15/05/05 22:56:00 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=75
FILE: Number of bytes written=217099
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=148
HDFS: Number of bytes written=53
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=40174
Total time spent by all reduces in occupied slots (ms)=43504
Total time spent by all map tasks (ms)=20087
Total time spent by all reduce tasks (ms)=21752
Total vcore-seconds taken by all map tasks=20087
Total vcore-seconds taken by all reduce tasks=21752
Total megabyte-seconds taken by all map tasks=20569088
Total megabyte-seconds taken by all reduce tasks=22274048
Map-Reduce Framework
Map input records=3
Map output records=4
Map output bytes=61
Map output materialized bytes=75
Input split bytes=102
Combine input records=4
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=75
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1684
CPU time spent (ms)=7030
Physical memory (bytes) snapshot=220651520
Virtual memory (bytes) snapshot=715882496
Total committed heap usage (bytes)=134516736
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=46
File Output Format Counters
Bytes Written=53
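To look at the wordcount result, the reducer output lands under the out directory passed on the command line; part-r-00000 is the usual file name (a sketch):
  hdfs dfs -ls out
  hdfs dfs -cat out/part-r-00000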