

* If a problem occurs that cannot be resolved:

  ==> delete the folders where the dfs information is stored with rm -r /data/hadoop/dfs/* and follow the manual startup procedure below.

 Caution: all existing data and other information will be lost, so use this only as a last resort.


0. The data and log file paths were changed as follows (they are written differently in other places, so keep that in mind; a command to check the values actually in effect follows this list):

 a. data paths : /data/hadoop/dfs, /data/zookeeper/data ...

 b. log files  : /logs/hadoop/logs, /logs/zookeeper/logs ...
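
 * (Reference) The directories actually in effect can differ from the list above -- the format log further down in this post shows /data/dfs/namenode, for example -- so it is worth checking the properties directly. A minimal check, assuming the usual Hadoop 2.x config location under /usr/local/hadoop:

        grep -E -B1 -A2 "dfs.namenode.name.dir|dfs.datanode.data.dir|dfs.journalnode.edits.dir" \
            /usr/local/hadoop/etc/hadoop/hdfs-site.xml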


0-1. Quick start (when using start-all.sh; a combined wrapper sketch follows at the end of this list)

  a. Start zookeeper (run on each of the three servers: master, node1, node2)

        bin/zkServer.sh start

  b. Start the JobHistoryServer (run on the hadoop master)

        sbin/mr-jobhistory-daemon.sh start historyserver

  c. Start hbase (run on the node where the hbase master is installed)

        bin/start-hbase.sh

        bin/hbase-daemon.sh start master (run on the secondary/backup master node)

  d. hive (run on the server where it is installed)

     - start the hive server (run on the master where hive is installed)

        nohup hive --service hiveserver2 &

     - start the hive metastore server (run on the master where hive is installed)

        nohup hive --service metastore &

  e. Start hadoop (run only on the master server)

      - start hdfs : sbin/start-dfs.sh

      - start yarn : sbin/start-yarn.sh

      * if the standby resourcemanager does not come up : sbin/yarn-daemon.sh start resourcemanager

  f. oozied.sh start (run on the node where oozie is installed)

  g. Start spark

     - start the master (run on the active and the standby node) : sbin/start-master.sh

     - start the Workers (run on the active node) : sbin/start-slaves.sh

     - start the history server (run on the active node) : sbin/start-history-server.sh

  h. Start kafka (run on each broker server)

     - bin/kafka-server-start.sh config/server-1.properties &

     - bin/kafka-server-start.sh config/server-2.properties &

     - bin/kafka-server-start.sh config/server-3.properties &
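
  * For reference, the quick-start steps above can be chained from the master into one wrapper script. The sketch below only covers zookeeper, HDFS/YARN, and the JobHistoryServer; the remaining services (hbase, hive, oozie, spark, kafka) would be started the same way from their own install directories. Passwordless ssh between the nodes and the zookeeper install path are assumptions; /usr/local/hadoop is the path that appears in the daemon logs later in this post.

        #!/bin/bash
        # quick-start wrapper (sketch) - follows the order of section 0-1
        ZK_HOME=/usr/local/zookeeper            # assumed zookeeper install path
        HADOOP_HOME=/usr/local/hadoop           # path taken from the daemon logs below
        for h in master node1 node2; do         # a. zookeeper on all three servers
            ssh $h "$ZK_HOME/bin/zkServer.sh start"
        done
        $HADOOP_HOME/sbin/start-dfs.sh          # e. hdfs (namenodes, datanodes, journalnodes, zkfc)
        $HADOOP_HOME/sbin/start-yarn.sh         # e. yarn (resourcemanager, nodemanagers)
        $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver   # b. job history server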

     

 

--- The following starts each daemon individually (manual startup), without using start-all.sh. ----------

1. This assumes all daemons are down and the HA-related configuration has been finished and is now being applied.

 

2. Start zookeeper (run on each of the three servers: master, node1, node2)

  bin/zkServer.sh start
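
* Whether each instance came up, and which one was elected leader, can be checked on every server as follows (the ruok check assumes nc is installed):

  bin/zkServer.sh status                 # prints Mode: leader / follower
  echo ruok | nc localhost 2181          # prints "imok" if the server is serving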

 

3. Add the znode used for HA to zookeeper (run on one of the namenode hosts; only once, the first time).

root@master:/root# sbin/hdfs zkfc -formatZK

15/05/05 16:12:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/05 16:12:34 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/192.168.10.100:9000

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:host.name=master

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/local/jdk1.7.0_60/jre

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.class.path=...

15/05/05 16:12:35 INFO ha.ActiveStandbyElector: Session connected.

===============================================

The configured parent znode /hadoop-ha/mycluster already exists.

Are you sure you want to clear all failover information from

ZooKeeper?

WARNING: Before proceeding, ensure that all HDFS services and

failover controllers are stopped!

===============================================

Proceed formatting /hadoop-ha/mycluster? (Y or N) Y

15/05/05 16:12:56 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...

15/05/05 16:12:57 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.

15/05/05 16:12:57 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.

15/05/05 16:12:57 INFO zookeeper.ClientCnxn: EventThread shut down

15/05/05 16:12:57 INFO zookeeper.ZooKeeper: Session: 0x24d2320df3b0000 closed
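
* Whether the znode was actually created can be verified with the ZooKeeper CLI. A sketch (zkCli.sh ships with the zookeeper installation started in step 2):

  bin/zkCli.sh -server master:2181 ls /hadoop-ha     # should list [mycluster]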

 

4. Run a JournalNode on each server that will be used for the QJM (e.g. node1, node2, node3).

root@master:/root# sbin/hadoop-daemon.sh start journalnode

starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-master.out
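
* Since a JournalNode has to be started on node1, node2, and node3 individually, doing it from the master over ssh makes it harder to miss one. A sketch, assuming passwordless ssh and the /usr/local/hadoop path that appears in the logs:

  for h in node1 node2 node3; do
      ssh $h "/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode"
  done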

* If a JournalNode is not started on every one of them, the namenode format cannot connect to the missing journalnode and fails with an error like the one below

(the output below is from the case where the journalnode on node3 had not been started)

--------------------------------------------------------------------------------------

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 exceptions thrown:

192.168.10.101:8485: Call From master/192.168.10.100 to node1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

192.168.10.102:8485: Call From master/192.168.10.100 to node2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

        at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)

        at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)

        at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)

15/05/05 16:18:04 FATAL namenode.NameNode: Failed to start namenode.

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 exceptions thrown:

192.168.10.101:8485: Call From master/192.168.10.100 to node1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

192.168.10.102:8485: Call From master/192.168.10.100 to node2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

        at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)

        at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)

        at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)

15/05/05 16:18:04 INFO util.ExitUtil: Exiting with status 1

15/05/05 16:18:04 INFO namenode.NameNode: SHUTDOWN_MSG: 

-------------------------------------------------------------------------------------------

 

5. Run "hdfs namenode -format" on the namenode that will be active (only once, the first time).

root@master:/root# sbin/hdfs namenode -format

15/05/05 16:36:22 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = master/192.168.10.100

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath =...

STARTUP_MSG:   java = 1.7.0_60

************************************************************/

15/05/05 16:36:22 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 16:36:22 INFO namenode.NameNode: createNameNode [-format]

15/05/05 16:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Formatting using clusterid: CID-4e01a04d-26db-4422-bea6-ba768a96334f

15/05/05 16:36:27 INFO namenode.FSNamesystem: No KeyProvider found.

15/05/05 16:36:27 INFO namenode.FSNamesystem: fsLock is fair:true

15/05/05 16:36:27 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000

15/05/05 16:36:27 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true

15/05/05 16:36:27 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000

15/05/05 16:36:27 INFO blockmanagement.BlockManager: The block deletion will start around 2015 May 05 16:36:27

15/05/05 16:36:27 INFO util.GSet: Computing capacity for map BlocksMap

15/05/05 16:36:27 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:27 INFO util.GSet: 2.0% max memory 966.8 MB = 19.3 MB

15/05/05 16:36:27 INFO util.GSet: capacity      = 2^22 = 4194304 entries

15/05/05 16:36:28 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false

15/05/05 16:36:28 INFO blockmanagement.BlockManager: defaultReplication         = 3

15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxReplication             = 512

15/05/05 16:36:28 INFO blockmanagement.BlockManager: minReplication             = 1

15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2

15/05/05 16:36:28 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false

15/05/05 16:36:28 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000

15/05/05 16:36:28 INFO blockmanagement.BlockManager: encryptDataTransfer        = false

15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000

15/05/05 16:36:28 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)

15/05/05 16:36:28 INFO namenode.FSNamesystem: supergroup          = supergroup

15/05/05 16:36:28 INFO namenode.FSNamesystem: isPermissionEnabled = false

15/05/05 16:36:28 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster

15/05/05 16:36:28 INFO namenode.FSNamesystem: HA Enabled: true

15/05/05 16:36:28 INFO namenode.FSNamesystem: Append Enabled: true

15/05/05 16:36:29 INFO util.GSet: Computing capacity for map INodeMap

15/05/05 16:36:29 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:29 INFO util.GSet: 1.0% max memory 966.8 MB = 9.7 MB

15/05/05 16:36:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries

15/05/05 16:36:29 INFO namenode.NameNode: Caching file names occuring more than 10 times

15/05/05 16:36:29 INFO util.GSet: Computing capacity for map cachedBlocks

15/05/05 16:36:29 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:29 INFO util.GSet: 0.25% max memory 966.8 MB = 2.4 MB

15/05/05 16:36:29 INFO util.GSet: capacity      = 2^19 = 524288 entries

15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0

15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000

15/05/05 16:36:29 INFO namenode.FSNamesystem: Retry cache on namenode is enabled

15/05/05 16:36:29 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis

15/05/05 16:36:29 INFO util.GSet: Computing capacity for map NameNodeRetryCache

15/05/05 16:36:29 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:29 INFO util.GSet: 0.029999999329447746% max memory 966.8 MB = 297.0 KB

15/05/05 16:36:29 INFO util.GSet: capacity      = 2^16 = 65536 entries

15/05/05 16:36:29 INFO namenode.NNConf: ACLs enabled? false

15/05/05 16:36:29 INFO namenode.NNConf: XAttrs enabled? true

15/05/05 16:36:29 INFO namenode.NNConf: Maximum size of an xattr: 16384

Re-format filesystem in Storage Directory /data/dfs/namenode ? (Y or N) Y

Re-format filesystem in QJM to [192.168.10.101:8485, 192.168.10.102:8485, 192.168.10.103:8485] ? (Y or N) Y

15/05/05 16:37:23 INFO namenode.FSImage: Allocated new BlockPoolId: BP-90521690-192.168.10.100-1430815043897

15/05/05 16:37:24 INFO common.Storage: Storage directory /data/dfs/namenode has been successfully formatted.

15/05/05 16:37:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

15/05/05 16:37:26 INFO util.ExitUtil: Exiting with status 0

15/05/05 16:37:26 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at master/192.168.10.100

************************************************************/

 

6. Start the namenode on the active node.

root@master:/root# sbin/hadoop-daemon.sh start namenode

starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out

(Note: for namenode errors, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2789&page=2&document_srl=3183)


6-1. For the node that will be standby, run "hdfs namenode -bootstrapStandby" to copy over the active namenode's metadata (only once, the first time; see item 7).


6-2. Start the namenode on the standby node.

root@slave:/root# sbin/hadoop-daemon.sh start namenode

starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out

(Note: for namenode errors, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2789&page=2&document_srl=3183)

 

7. Set up the standby namenode (run on the node that will be standby, e.g. node1; only once, the first time). This is unnecessary if it was already done in step 6-1.

root@node1:/data# sbin/hdfs namenode -bootstrapStandby

15/05/05 16:43:51 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = node1/192.168.10.101

STARTUP_MSG:   args = [-bootstrapStandby]

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath =...

STARTUP_MSG:   java = 1.7.0_60

************************************************************/

15/05/05 16:43:51 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 16:43:51 INFO namenode.NameNode: createNameNode [-bootstrapStandby]

15/05/05 16:43:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

=====================================================

About to bootstrap Standby ID nn2 from:

           Nameservice ID: mycluster

        Other Namenode ID: nn1

  Other NN's HTTP address: http://master:50070

  Other NN's IPC  address: master/192.168.10.100:9000

             Namespace ID: 892192946

            Block pool ID: BP-90521690-192.168.10.100-1430815043897

               Cluster ID: CID-4e01a04d-26db-4422-bea6-ba768a96334f

           Layout version: -60

=====================================================

15/05/05 16:43:59 INFO common.Storage: Storage directory /data/dfs/namenode has been successfully formatted.

15/05/05 16:44:02 INFO namenode.TransferFsImage: Opening connection to http://master:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:892192946:0:CID-4e01a04d-26db-4422-bea6-ba768a96334f

15/05/05 16:44:02 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds

15/05/05 16:44:03 INFO namenode.TransferFsImage: Transfer took 0.09s at 0.00 KB/s

15/05/05 16:44:03 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 351 bytes.

15/05/05 16:44:03 INFO util.ExitUtil: Exiting with status 0

15/05/05 16:44:03 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.10.101

************************************************************/

 

* If hadoop commands are run without the namenode and journalnodes running, connection errors like the following occur.

-------------------------------------------------------------------------------------------

15/05/05 16:40:12 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 16:40:12 INFO namenode.NameNode: createNameNode [-bootstrapStandby]

15/05/05 16:40:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/05 16:40:18 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:19 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:20 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:21 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:22 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:23 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:24 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:25 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:26 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:27 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:27 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at master/192.168.10.100:9000: Call From node1/192.168.10.101 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

15/05/05 16:40:27 INFO util.ExitUtil: Exiting with status 2

-------------------------------------------------------------------------------------------

 

8. Run zkfc on the active and the standby namenode (run on master and node1 respectively).

-root@master:/root# sbin/hadoop-daemon.sh start zkfc

starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-master.out

-root@node1:/data# hadoop-daemon.sh start zkfc

starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-node1.out

 

* Checking with jps, you should now see DFSZKFailoverController.

 

9. If the primary namenode is standby rather than active, switch it to active with the following command (manual failover only).

root@node1:/data# bin/hdfs haadmin -transitionToActive nn1

 

* If automatic failover is enabled, the following message is shown and the command is refused.

Automatic failover is enabled for NameNode at node1/192.168.10.101:9000

Refusing to manually manage HA state, since it may cause

a split-brain scenario or other incorrect state.

If you are very sure you know what you are doing, please 

specify the forcemanual flag.
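
* If a manual transition really is required while automatic failover is enabled (normally not recommended, since zkfc manages the state and forcing it risks the split-brain scenario the message warns about), the flag named in the message is passed roughly like this; check bin/hdfs haadmin -help for the exact syntax of your version:

  bin/hdfs haadmin -transitionToActive --forcemanual nn1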

 

9-1. Start the resourcemanager (run on master and node1 respectively).

-root@master:/root# sbin/yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-master.out

-root@node1:/data# yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-node1.out

 

10. Check the HDFS (NameNode) HA state (check nn1 and nn2).

- root@master:/root# bin/hdfs haadmin -getServiceState nn1

active

- root@master:/root# bin/hdfs haadmin -getServiceState nn2

standby

 

10-1. Check the ResourceManager HA state (check rm1 and rm2).

- root@master:/root# bin/yarn rmadmin -getServiceState rm1

standby

- root@master:/root# bin/yarn rmadmin -getServiceState rm2

active
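
* The same checks can be done in one pass with a small loop (nn1/nn2 and rm1/rm2 are the service IDs used above):

  for id in nn1 nn2; do echo -n "NameNode $id: ";        bin/hdfs haadmin -getServiceState $id; done
  for id in rm1 rm2; do echo -n "ResourceManager $id: "; bin/yarn rmadmin -getServiceState $id; done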

 

11. Start the datanodes. (Running hadoop-daemons.sh on the master starts a datanode on every server registered in the slaves file, e.g. node1~4.)

root@master:/root# sbin/hadoop-daemons.sh start datanode

node3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node3.out

node4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node4.out

node2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node2.out

node1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node1.out
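
* Whether all four datanodes actually registered with the active namenode can be confirmed from the master with a report; a sketch:

  bin/hdfs dfsadmin -report | grep -E "Live datanodes|Name:"     # expect "Live datanodes (4):"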

 

12. (no content)

13. Start the nodemanagers. (Running yarn-daemons.sh on the master starts a nodemanager on every server registered in the slaves file, e.g. node1~4.)

root@master:/root# sbin/yarn-daemons.sh start nodemanager

node4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node4.out

node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node1.out

node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node3.out

node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node2.out

 

* In HDFS HA mode there is no need to start a secondarynamenode.

The output below shows the message produced when a secondarynamenode is started anyway.

 

root@node1:/data# hadoop secondarynamenode

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

15/05/05 18:45:11 INFO namenode.SecondaryNameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting SecondaryNameNode

STARTUP_MSG:   host = node1/192.168.10.101

STARTUP_MSG:   args = []

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath = ...

STARTUP_MSG:   java = 1.7.0_60

************************************************************/

15/05/05 18:45:11 INFO namenode.SecondaryNameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 18:45:14 FATAL namenode.SecondaryNameNode: Failed to start secondary namenode

java.io.IOException: Cannot use SecondaryNameNode in an HA cluster. The Standby Namenode will perform checkpointing.

        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:187)

        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)

15/05/05 18:45:14 INFO util.ExitUtil: Exiting with status 1

15/05/05 18:45:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down SecondaryNameNode at node1/192.168.10.101

************************************************************/

 

14. Start the JobHistoryServer (run on master)

      root@master:/root#  sbin/mr-jobhistory-daemon.sh start historyserver


15. Check the running processes with jps.

root@master:/root# jps

7232 QuorumPeerMain

8784 DFSZKFailoverController

13955 Jps

11300 Kafka

11862 HistoryServer

20153 JobHistoryServer

8919 ResourceManager

8281 NameNode

7705 HMaster

11598 Master

 

root@node1:/data# jps

7538 DFSZKFailoverController

6867 NameNode

7319 JournalNode

11800 ResourceManager

15642 Jps

6203 QuorumPeerMain

7726 NodeManager

7038 DataNode

10190 Master

9919 Kafka

13951 HRegionServer

 

root@node2:/data# jps

7353 HRegionServer

5275 DataNode

4764 QuorumPeerMain

7564 HMaster

5614 NodeManager

5951 Kafka

5487 JournalNode

7839 Jps

 

root@node3:/root# jps

4499 DataNode

8790 Jps

4730 NodeManager

8414 HRegionServer

 

root@node4:/data# jps

8516 HRegionServer

7044 NodeManager

8856 Jps

6814 DataNode
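
* Instead of logging in to every node, the same check can be scripted from the master over ssh (passwordless ssh is assumed; the jps path comes from the java.home shown in the logs above):

  for h in master node1 node2 node3 node4; do
      echo "== $h =="
      ssh $h /usr/local/jdk1.7.0_60/bin/jps
  done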

 

16. stop-all.sh output

root@master:/root# stop-all.sh

This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

Stopping namenodes on [master node1]

master: stopping namenode

node1: stopping namenode

node2: stopping datanode

node3: stopping datanode

node4: stopping datanode

node1: stopping datanode

Stopping journal nodes [node1 node2 node3]

node3: stopping journalnode

node1: stopping journalnode

node2: stopping journalnode

Stopping ZK Failover Controllers on NN hosts [master node1]

node1: stopping zkfc

master: stopping zkfc

stopping yarn daemons

stopping resourcemanager

node2: stopping nodemanager

node1: stopping nodemanager

node4: stopping nodemanager

node3: stopping nodemanager

no proxyserver to stop

 

17. start-all.sh output

root@master:/root# start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [master node1]

master: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-master.out

node1: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-node1.out

node3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node3.out

node4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node4.out

node2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node2.out

node1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node1.out

Starting journal nodes [node1 node2 node3]

node2: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node2.out

node3: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node3.out

node1: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node1.out

Starting ZK Failover Controllers on NN hosts [master node1]

master: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-zkfc-master.out

node1: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-zkfc-node1.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-master.out

node4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node4.out

node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node3.out

node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node2.out

node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node1.out


 

18. The active and standby states can be checked at the following URLs.

http://master:50070/

http://node1:50070/  <- if this returns a page-not-found error, start that namenode manually.

 

http://master:8088/

http://node1:8088/
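
* The same active/standby information is also exposed over the JMX servlet, so it can be checked without a browser. A sketch (the NameNodeStatus bean name is the one used by Hadoop 2.x; verify against your version):

  curl -s "http://master:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
  curl -s "http://node1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"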

 

19. Test mapreduce (running the application as the gooper account)

- hdfs dfs -mkdir /user  (run as the root account)

- hdfs dfs -mkdir /user/gooper

- hdfs dfs -mkdir /user/gooper/in

- hdfs dfs -put a.txt /user/gooper/in

- hadoop@master:~$ yarn jar $HOME/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount in out

15/05/05 22:53:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/05 22:53:41 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

15/05/05 22:53:44 INFO input.FileInputFormat: Total input paths to process : 1

15/05/05 22:53:45 INFO mapreduce.JobSubmitter: number of splits:1

15/05/05 22:53:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430823520141_0013

15/05/05 22:53:49 INFO impl.YarnClientImpl: Submitted application application_1430823520141_0013

15/05/05 22:53:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1430823520141_0013/

15/05/05 22:53:49 INFO mapreduce.Job: Running job: job_1430823520141_0013

15/05/05 22:55:09 INFO mapreduce.Job: Job job_1430823520141_0013 running in uber mode : false

15/05/05 22:55:09 INFO mapreduce.Job:  map 0% reduce 0%

15/05/05 22:55:33 INFO mapreduce.Job:  map 100% reduce 0%

15/05/05 22:55:58 INFO mapreduce.Job:  map 100% reduce 100%

15/05/05 22:55:59 INFO mapreduce.Job: Job job_1430823520141_0013 completed successfully

15/05/05 22:56:00 INFO mapreduce.Job: Counters: 49

        File System Counters

                FILE: Number of bytes read=75

                FILE: Number of bytes written=217099

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=148

                HDFS: Number of bytes written=53

                HDFS: Number of read operations=6

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=2

        Job Counters 

                Launched map tasks=1

                Launched reduce tasks=1

                Data-local map tasks=1

                Total time spent by all maps in occupied slots (ms)=40174

                Total time spent by all reduces in occupied slots (ms)=43504

                Total time spent by all map tasks (ms)=20087

                Total time spent by all reduce tasks (ms)=21752

                Total vcore-seconds taken by all map tasks=20087

                Total vcore-seconds taken by all reduce tasks=21752

                Total megabyte-seconds taken by all map tasks=20569088

                Total megabyte-seconds taken by all reduce tasks=22274048

        Map-Reduce Framework

                Map input records=3

                Map output records=4

                Map output bytes=61

                Map output materialized bytes=75

                Input split bytes=102

                Combine input records=4

                Combine output records=4

                Reduce input groups=4

                Reduce shuffle bytes=75

                Reduce input records=4

                Reduce output records=4

                Spilled Records=8

                Shuffled Maps =1

                Failed Shuffles=0

                Merged Map outputs=1

                GC time elapsed (ms)=1684

                CPU time spent (ms)=7030

                Physical memory (bytes) snapshot=220651520

                Virtual memory (bytes) snapshot=715882496

                Total committed heap usage (bytes)=134516736

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters 

                Bytes Read=46

        File Output Format Counters 

                Bytes Written=53
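
* The job result can then be checked, and the output directory removed before a re-run (MapReduce refuses to write into an existing output path):

- hdfs dfs -cat out/part-r-00000     (word counts produced by the job)
- hdfs dfs -rm -r out                (delete the output dir before running the job again)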
