Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Notes on developing and operating Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send questions to gooper@gooper.com.


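* If a problem cannot be resolved

  ==> Delete everything under the folder that stores the dfs information with rm -r /data/hadoop/dfs/* and then follow the manual startup procedure below.

 Caution: this destroys all existing data and metadata, so proceed carefully.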
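
Because the wipe above cannot be undone, it may be safer to move the directory aside first. The following is a minimal sketch, assuming the dfs path used in this post, enough free disk space, and that all daemons are already stopped. The function name `archive_dfs_dir` and the timestamped `.bak` suffix are illustrative and are not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename the dfs directory instead of deleting it,
# then recreate an empty one so the daemons can start from scratch.
# The new directory gets default ownership and permissions; adjust if needed.
archive_dfs_dir() {
  local dir="${1:-/data/hadoop/dfs}"
  local backup
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # Print the backup location on success so the caller can remove it later.
  mv "$dir" "$backup" && mkdir -p "$dir" && echo "$backup"
}
```

Once the cluster is confirmed healthy again, the backup directory printed by the function can be removed by hand.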


0. The data and log file paths were changed as follows (the captured output below still shows different paths, so read it with that in mind)

 A. data paths : /data/hadoop/dfs, /data/zookeeper/data ...

 B. log paths : /logs/hadoop/logs, /logs/zookeeper/logs ...


0-1. Quick start (when start-all.sh is used)

  A. Start zookeeper (run on each of the three servers master, node1, node2)

        bin/zkServer.sh start

  B. Start the JobHistoryServer (run on the hadoop master)

        sbin/mr-jobhistory-daemon.sh start historyserver

  C. Start hbase (run on the node where the hbase master is installed)

        bin/start-hbase.sh

        bin/hbase-daemon.sh start master (run on the secondary master node)

  D. hive (run on the server where it is installed)

     - Start the hive server (run on the master where hive is installed)

        :nohup hive --service hiveserver2 &

     - Start the hive metastore server (run on the master where hive is installed)

        :nohup hive --service metastore &

  E. Start hadoop (run on the master server only)

      - start hdfs : sbin/start-dfs.sh

      - start yarn : sbin/start-yarn.sh

      * if the standby resourcemanager does not come up : sbin/yarn-daemon.sh start resourcemanager

  F. Start oozie (run on the node where oozie is installed) : bin/oozied.sh start

  G. Start spark

     - start the master (run on both active and standby) : sbin/start-master.sh

     - start the workers (run on active) : sbin/start-slaves.sh

     - start the history server (run on active) : sbin/start-history-server.sh

  H. Start kafka (run on each broker server)

     - bin/kafka-server-start.sh config/server-1.properties &

     - bin/kafka-server-start.sh config/server-2.properties &

     - bin/kafka-server-start.sh config/server-3.properties &

     

 

--- The rest of this post starts each daemon individually (manual startup) instead of using start-all.sh ----------

1. This assumes that all daemons are down and that the HA-related configuration has been completed and is about to be applied.

 

2. Start zookeeper (run on each of the three servers master, node1, node2)

  bin/zkServer.sh start
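
After starting all three servers, `bin/zkServer.sh status` can be used to confirm that the ensemble has elected a leader. The snippet below is a small sketch that pulls the role out of that command's output; the helper name `zk_mode` is made up for this example, and it assumes the status output contains a `Mode:` line as ZooKeeper 3.4.x prints it.

```shell
# Extract the role (for example "leader" or "follower") from the output of
# `bin/zkServer.sh status`, which prints a line such as "Mode: follower".
# Prints nothing if no "Mode:" line is present (server not running).
zk_mode() {
  awk -F': ' '/^Mode:/ {print $2}'
}

# usage (run on each zookeeper server):
#   bin/zkServer.sh status 2>&1 | zk_mode
```

With three servers, one should report leader and the other two follower.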

 

3. Add the znode for HA to zookeeper (run on just one of the namenodes, first time only)

root@master:/root# bin/hdfs zkfc -formatZK

15/05/05 16:12:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/05 16:12:34 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at master/192.168.10.100:9000

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:host.name=master

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_60

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/local/jdk1.7.0_60/jre

15/05/05 16:12:35 INFO zookeeper.ZooKeeper: Client environment:java.class.path=...

15/05/05 16:12:35 INFO ha.ActiveStandbyElector: Session connected.

===============================================

The configured parent znode /hadoop-ha/mycluster already exists.

Are you sure you want to clear all failover information from

ZooKeeper?

WARNING: Before proceeding, ensure that all HDFS services and

failover controllers are stopped!

===============================================

Proceed formatting /hadoop-ha/mycluster? (Y or N) Y

15/05/05 16:12:56 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...

15/05/05 16:12:57 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.

15/05/05 16:12:57 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.

15/05/05 16:12:57 INFO zookeeper.ClientCnxn: EventThread shut down

15/05/05 16:12:57 INFO zookeeper.ZooKeeper: Session: 0x24d2320df3b0000 closed

 

4. Start a JournalNode on every server used for QJM (e.g. node1, node2, node3)

root@master:/root# sbin/hadoop-daemon.sh start journalnode

starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-master.out

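* If the journalnode is not started on each server, the namenode format cannot connect to the journalnodes and fails with the error below.

(The log below is from a run where a journalnode had not been started; note that it reports connection refused for node1 and node2.)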

--------------------------------------------------------------------------------------

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 exceptions thrown:

192.168.10.101:8485: Call From master/192.168.10.100 to node1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

192.168.10.102:8485: Call From master/192.168.10.100 to node2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

        at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)

        at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)

        at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)

15/05/05 16:18:04 FATAL namenode.NameNode: Failed to start namenode.

org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 exceptions thrown:

192.168.10.101:8485: Call From master/192.168.10.100 to node1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

192.168.10.102:8485: Call From master/192.168.10.100 to node2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

        at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)

        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232)

        at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:884)

        at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:171)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:937)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1379)

        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)

15/05/05 16:18:04 INFO util.ExitUtil: Exiting with status 1

15/05/05 16:18:04 INFO namenode.NameNode: SHUTDOWN_MSG: 

-------------------------------------------------------------------------------------------
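
A quick way to avoid the error above is to confirm that every journalnode accepts connections on port 8485 before formatting. This is only a sketch: the function names `port_open` and `check_journalnodes` are invented for this example, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools such as nc are needed.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report the state of each journalnode; exit status is 1 if any is down.
# 8485 is the journalnode RPC port that appears in the log above.
check_journalnodes() {
  local rc=0 host
  for host in "$@"; do
    if port_open "$host" 8485; then
      echo "$host: journalnode reachable"
    else
      echo "$host: journalnode NOT reachable"
      rc=1
    fi
  done
  return "$rc"
}

# usage: check_journalnodes node1 node2 node3
```

If any host is reported as not reachable, start its journalnode with sbin/hadoop-daemon.sh start journalnode before moving on to the format step.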

 

5. On the namenode that will be active, run "hdfs namenode -format" (first time only)

root@master:/root# bin/hdfs namenode -format

15/05/05 16:36:22 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = master/192.168.10.100

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath =...

STARTUP_MSG:   java = 1.7.0_60

************************************************************/

15/05/05 16:36:22 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 16:36:22 INFO namenode.NameNode: createNameNode [-format]

15/05/05 16:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Formatting using clusterid: CID-4e01a04d-26db-4422-bea6-ba768a96334f

15/05/05 16:36:27 INFO namenode.FSNamesystem: No KeyProvider found.

15/05/05 16:36:27 INFO namenode.FSNamesystem: fsLock is fair:true

15/05/05 16:36:27 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000

15/05/05 16:36:27 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true

15/05/05 16:36:27 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000

15/05/05 16:36:27 INFO blockmanagement.BlockManager: The block deletion will start around 2015 May 05 16:36:27

15/05/05 16:36:27 INFO util.GSet: Computing capacity for map BlocksMap

15/05/05 16:36:27 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:27 INFO util.GSet: 2.0% max memory 966.8 MB = 19.3 MB

15/05/05 16:36:27 INFO util.GSet: capacity      = 2^22 = 4194304 entries

15/05/05 16:36:28 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false

15/05/05 16:36:28 INFO blockmanagement.BlockManager: defaultReplication         = 3

15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxReplication             = 512

15/05/05 16:36:28 INFO blockmanagement.BlockManager: minReplication             = 1

15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2

15/05/05 16:36:28 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false

15/05/05 16:36:28 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000

15/05/05 16:36:28 INFO blockmanagement.BlockManager: encryptDataTransfer        = false

15/05/05 16:36:28 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000

15/05/05 16:36:28 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)

15/05/05 16:36:28 INFO namenode.FSNamesystem: supergroup          = supergroup

15/05/05 16:36:28 INFO namenode.FSNamesystem: isPermissionEnabled = false

15/05/05 16:36:28 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster

15/05/05 16:36:28 INFO namenode.FSNamesystem: HA Enabled: true

15/05/05 16:36:28 INFO namenode.FSNamesystem: Append Enabled: true

15/05/05 16:36:29 INFO util.GSet: Computing capacity for map INodeMap

15/05/05 16:36:29 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:29 INFO util.GSet: 1.0% max memory 966.8 MB = 9.7 MB

15/05/05 16:36:29 INFO util.GSet: capacity      = 2^21 = 2097152 entries

15/05/05 16:36:29 INFO namenode.NameNode: Caching file names occuring more than 10 times

15/05/05 16:36:29 INFO util.GSet: Computing capacity for map cachedBlocks

15/05/05 16:36:29 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:29 INFO util.GSet: 0.25% max memory 966.8 MB = 2.4 MB

15/05/05 16:36:29 INFO util.GSet: capacity      = 2^19 = 524288 entries

15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0

15/05/05 16:36:29 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000

15/05/05 16:36:29 INFO namenode.FSNamesystem: Retry cache on namenode is enabled

15/05/05 16:36:29 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis

15/05/05 16:36:29 INFO util.GSet: Computing capacity for map NameNodeRetryCache

15/05/05 16:36:29 INFO util.GSet: VM type       = 32-bit

15/05/05 16:36:29 INFO util.GSet: 0.029999999329447746% max memory 966.8 MB = 297.0 KB

15/05/05 16:36:29 INFO util.GSet: capacity      = 2^16 = 65536 entries

15/05/05 16:36:29 INFO namenode.NNConf: ACLs enabled? false

15/05/05 16:36:29 INFO namenode.NNConf: XAttrs enabled? true

15/05/05 16:36:29 INFO namenode.NNConf: Maximum size of an xattr: 16384

Re-format filesystem in Storage Directory /data/dfs/namenode ? (Y or N) Y

Re-format filesystem in QJM to [192.168.10.101:8485, 192.168.10.102:8485, 192.168.10.103:8485] ? (Y or N) Y

15/05/05 16:37:23 INFO namenode.FSImage: Allocated new BlockPoolId: BP-90521690-192.168.10.100-1430815043897

15/05/05 16:37:24 INFO common.Storage: Storage directory /data/dfs/namenode has been successfully formatted.

15/05/05 16:37:26 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

15/05/05 16:37:26 INFO util.ExitUtil: Exiting with status 0

15/05/05 16:37:26 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at master/192.168.10.100

************************************************************/

 

6. Start the namenode on the active node.

root@master:/root# sbin/hadoop-daemon.sh start namenode

starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out

(Reference: if the namenode reports an error, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2789&page=2&document_srl=3183)


6-1. On the node that will be the standby, run "hdfs namenode -bootstrapStandby" to copy the active namenode's metadata (first time only) (see step 7)


6-2. Start the namenode on the standby node.

root@slave:/root# sbin/hadoop-daemon.sh start namenode

starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-master.out

(Reference: if the namenode reports an error, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2789&page=2&document_srl=3183)

 

7. Set up the standby namenode (run on the node that will be the standby, e.g. node1; first time only) (not needed if this was already done in step 6-1)

root@node1:/data# bin/hdfs namenode -bootstrapStandby

15/05/05 16:43:51 INFO namenode.NameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = node1/192.168.10.101

STARTUP_MSG:   args = [-bootstrapStandby]

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath =...

STARTUP_MSG:   java = 1.7.0_60

************************************************************/

15/05/05 16:43:51 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 16:43:51 INFO namenode.NameNode: createNameNode [-bootstrapStandby]

15/05/05 16:43:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

=====================================================

About to bootstrap Standby ID nn2 from:

           Nameservice ID: mycluster

        Other Namenode ID: nn1

  Other NN's HTTP address: http://master:50070

  Other NN's IPC  address: master/192.168.10.100:9000

             Namespace ID: 892192946

            Block pool ID: BP-90521690-192.168.10.100-1430815043897

               Cluster ID: CID-4e01a04d-26db-4422-bea6-ba768a96334f

           Layout version: -60

=====================================================

15/05/05 16:43:59 INFO common.Storage: Storage directory /data/dfs/namenode has been successfully formatted.

15/05/05 16:44:02 INFO namenode.TransferFsImage: Opening connection to http://master:50070/imagetransfer?getimage=1&txid=0&storageInfo=-60:892192946:0:CID-4e01a04d-26db-4422-bea6-ba768a96334f

15/05/05 16:44:02 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds

15/05/05 16:44:03 INFO namenode.TransferFsImage: Transfer took 0.09s at 0.00 KB/s

15/05/05 16:44:03 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 351 bytes.

15/05/05 16:44:03 INFO util.ExitUtil: Exiting with status 0

15/05/05 16:44:03 INFO namenode.NameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at node1/192.168.10.101

************************************************************/

 

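* Running a hadoop command before the namenode and journalnodes are up results in a connection error (in the log below, bootstrapStandby could not reach the active namenode on master).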

-------------------------------------------------------------------------------------------

15/05/05 16:40:12 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 16:40:12 INFO namenode.NameNode: createNameNode [-bootstrapStandby]

15/05/05 16:40:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/05 16:40:18 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:19 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:20 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:21 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:22 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:23 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:24 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:25 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:26 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:27 INFO ipc.Client: Retrying connect to server: master/192.168.10.100:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

15/05/05 16:40:27 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at master/192.168.10.100:9000: Call From node1/192.168.10.101 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

15/05/05 16:40:27 INFO util.ExitUtil: Exiting with status 2

-------------------------------------------------------------------------------------------

 

8. Start zkfc on both the active and the standby namenode (run on master and node1)

-root@master:/root# sbin/hadoop-daemon.sh start zkfc

starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-master.out

-root@node1:/data# hadoop-daemon.sh start zkfc

starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-node1.out

 

* Checking with jps should now show DFSZKFailoverController.
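
The jps check can be scripted so that a missing daemon is obvious. The snippet below is a sketch only; `missing_daemons` is a hypothetical helper name, and it assumes jps prints one "PID ClassName" pair per line as in the listings later in this post.

```shell
# Read `jps` output on stdin and report which of the given daemon class
# names are absent. Exit status is 0 only when all of them are present.
missing_daemons() {
  local listing rc=0 name
  listing="$(cat)"
  for name in "$@"; do
    # Compare against the second column only, and match the whole name so
    # that "Master" does not match "HMaster".
    if ! printf '%s\n' "$listing" | awk '{print $2}' | grep -qx "$name"; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}

# usage on a namenode host:
#   jps | missing_daemons NameNode DFSZKFailoverController
```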

 

9. If the primary namenode is standby instead of active, switch it to active with the following command (only when failover is manual)

root@node1:/data# bin/hdfs haadmin -transitionToActive nn1

 

* When automatic failover is enabled, the command is refused with the following message

Automatic failover is enabled for NameNode at node1/192.168.10.101:9000

Refusing to manually manage HA state, since it may cause

a split-brain scenario or other incorrect state.

If you are very sure you know what you are doing, please 

specify the forcemanual flag.

 

9-1. Start the resourcemanager (run on both master and node1)

-root@master:/root# sbin/yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-master.out

-root@node1:/data# yarn-daemon.sh start resourcemanager

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-node1.out

 

10. Check the HDFS (NameNode) HA state (check both nn1 and nn2)

- root@master:/root# bin/hdfs haadmin -getServiceState nn1

active

- root@master:/root# bin/hdfs haadmin -getServiceState nn2

standby

 

10-1. Check the ResourceManager HA state (check both rm1 and rm2)

- root@master:/root# bin/yarn rmadmin -getServiceState rm1

standby

- root@master:/root# bin/yarn rmadmin -getServiceState rm2

active
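
The four state checks above can be wrapped in one function. This is a rough sketch that assumes `hdfs` and `yarn` are on the PATH and that the service ids are nn1/nn2 and rm1/rm2 as in this post; the function names `ha_states` and `exactly_one_active` are illustrative.

```shell
# Print the HA state of each NameNode and ResourceManager id.
# nn1/nn2 and rm1/rm2 are the ids used in this post; substitute your own.
ha_states() {
  local id
  for id in nn1 nn2; do
    echo "namenode $id: $(hdfs haadmin -getServiceState "$id" 2>/dev/null)"
  done
  for id in rm1 rm2; do
    echo "resourcemanager $id: $(yarn rmadmin -getServiceState "$id" 2>/dev/null)"
  done
}

# Succeed only if exactly one of the lines read on stdin ends in "active".
exactly_one_active() {
  [ "$(grep -c ': active$')" -eq 1 ]
}

# usage:
#   ha_states
#   ha_states | grep '^namenode' | exactly_one_active || echo "check namenode HA"
```

An empty state after the colon usually means the daemon for that id could not be reached.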

 

11. Start the datanodes (running hadoop-daemons.sh on master starts every server listed in the slaves file, e.g. node1~4)

root@master:/root# sbin/hadoop-daemons.sh start datanode

node3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node3.out

node4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node4.out

node2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node2.out

node1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node1.out

 

12. (no content)

13. Start the nodemanagers (running yarn-daemons.sh on master starts every server listed in the slaves file, e.g. node1~4)

root@master:/root# sbin/yarn-daemons.sh start nodemanager

node4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node4.out

node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node1.out

node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node3.out

node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node2.out

 

* In HDFS HA mode there is no need to start a secondarynamenode.

The output below shows what happens when a secondarynamenode is started anyway.

 

root@node1:/data# hadoop secondarynamenode

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

15/05/05 18:45:11 INFO namenode.SecondaryNameNode: STARTUP_MSG: 

/************************************************************

STARTUP_MSG: Starting SecondaryNameNode

STARTUP_MSG:   host = node1/192.168.10.101

STARTUP_MSG:   args = []

STARTUP_MSG:   version = 2.6.0

STARTUP_MSG:   classpath = ...

STARTUP_MSG:   java = 1.7.0_60

************************************************************/

15/05/05 18:45:11 INFO namenode.SecondaryNameNode: registered UNIX signal handlers for [TERM, HUP, INT]

15/05/05 18:45:14 FATAL namenode.SecondaryNameNode: Failed to start secondary namenode

java.io.IOException: Cannot use SecondaryNameNode in an HA cluster. The Standby Namenode will perform checkpointing.

        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:187)

        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)

15/05/05 18:45:14 INFO util.ExitUtil: Exiting with status 1

15/05/05 18:45:14 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG: 

/************************************************************

SHUTDOWN_MSG: Shutting down SecondaryNameNode at node1/192.168.10.101

************************************************************/

 

14. Start the JobHistoryServer (run on master)

      root@master:/root#  sbin/mr-jobhistory-daemon.sh start historyserver


15. Check the processes with jps

root@master:/root# jps

7232 QuorumPeerMain

8784 DFSZKFailoverController

13955 Jps

11300 Kafka

11862 HistoryServer

20153 JobHistoryServer

8919 ResourceManager

8281 NameNode

7705 HMaster

11598 Master

 

root@node1:/data# jps

7538 DFSZKFailoverController

6867 NameNode

7319 JournalNode

11800 ResourceManager

15642 Jps

6203 QuorumPeerMain

7726 NodeManager

7038 DataNode

10190 Master

9919 Kafka

13951 HRegionServer

 

root@node2:/data# jps

7353 HRegionServer

5275 DataNode

4764 QuorumPeerMain

7564 HMaster

5614 NodeManager

5951 Kafka

5487 JournalNode

7839 Jps

 

root@node3:/root# jps

4499 DataNode

8790 Jps

4730 NodeManager

8414 HRegionServer

 

root@node4:/data# jps

8516 HRegionServer

7044 NodeManager

8856 Jps

6814 DataNode

 

16. stop-all.sh output

root@master:/root# stop-all.sh

This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

Stopping namenodes on [master node1]

master: stopping namenode

node1: stopping namenode

node2: stopping datanode

node3: stopping datanode

node4: stopping datanode

node1: stopping datanode

Stopping journal nodes [node1 node2 node3]

node3: stopping journalnode

node1: stopping journalnode

node2: stopping journalnode

Stopping ZK Failover Controllers on NN hosts [master node1]

node1: stopping zkfc

master: stopping zkfc

stopping yarn daemons

stopping resourcemanager

node2: stopping nodemanager

node1: stopping nodemanager

node4: stopping nodemanager

node3: stopping nodemanager

no proxyserver to stop

 

17. start-all.sh output

root@master:/root# start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Starting namenodes on [master node1]

master: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-master.out

node1: starting namenode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-namenode-node1.out

node3: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node3.out

node4: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node4.out

node2: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node2.out

node1: starting datanode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-datanode-node1.out

Starting journal nodes [node1 node2 node3]

node2: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node2.out

node3: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node3.out

node1: starting journalnode, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-journalnode-node1.out

Starting ZK Failover Controllers on NN hosts [master node1]

master: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-zkfc-master.out

node1: starting zkfc, logging to /usr/local/hadoop-2.6.0/logs/hadoop-root-zkfc-node1.out

starting yarn daemons

starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-master.out

node4: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node4.out

node3: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node3.out

node2: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node2.out

node1: starting nodemanager, logging to /usr/local/hadoop-2.6.0/logs/yarn-root-nodemanager-node1.out


 

18. The active and standby states can be checked at the following URLs

http://master:50070/

http://node1:50070/  <- if this returns a page-not-found error, start the namenode on that node manually.

 

http://master:8088/

http://node1:8088/

 

19. Test mapreduce (when the application is run under the gooper account)

- hdfs dfs -mkdir /user  (run as the root account)

- hdfs dfs -mkdir /user/gooper

- hdfs dfs -mkdir /user/gooper/in

- hdfs dfs -put a.txt /user/gooper/in

-hadoop@master:~$ yarn jar $HOME/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount in out

15/05/05 22:53:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

15/05/05 22:53:41 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

15/05/05 22:53:44 INFO input.FileInputFormat: Total input paths to process : 1

15/05/05 22:53:45 INFO mapreduce.JobSubmitter: number of splits:1

15/05/05 22:53:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430823520141_0013

15/05/05 22:53:49 INFO impl.YarnClientImpl: Submitted application application_1430823520141_0013

15/05/05 22:53:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1430823520141_0013/

15/05/05 22:53:49 INFO mapreduce.Job: Running job: job_1430823520141_0013

15/05/05 22:55:09 INFO mapreduce.Job: Job job_1430823520141_0013 running in uber mode : false

15/05/05 22:55:09 INFO mapreduce.Job:  map 0% reduce 0%

15/05/05 22:55:33 INFO mapreduce.Job:  map 100% reduce 0%

15/05/05 22:55:58 INFO mapreduce.Job:  map 100% reduce 100%

15/05/05 22:55:59 INFO mapreduce.Job: Job job_1430823520141_0013 completed successfully

15/05/05 22:56:00 INFO mapreduce.Job: Counters: 49

        File System Counters

                FILE: Number of bytes read=75

                FILE: Number of bytes written=217099

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=148

                HDFS: Number of bytes written=53

                HDFS: Number of read operations=6

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=2

        Job Counters 

                Launched map tasks=1

                Launched reduce tasks=1

                Data-local map tasks=1

                Total time spent by all maps in occupied slots (ms)=40174

                Total time spent by all reduces in occupied slots (ms)=43504

                Total time spent by all map tasks (ms)=20087

                Total time spent by all reduce tasks (ms)=21752

                Total vcore-seconds taken by all map tasks=20087

                Total vcore-seconds taken by all reduce tasks=21752

                Total megabyte-seconds taken by all map tasks=20569088

                Total megabyte-seconds taken by all reduce tasks=22274048

        Map-Reduce Framework

                Map input records=3

                Map output records=4

                Map output bytes=61

                Map output materialized bytes=75

                Input split bytes=102

                Combine input records=4

                Combine output records=4

                Reduce input groups=4

                Reduce shuffle bytes=75

                Reduce input records=4

                Reduce output records=4

                Spilled Records=8

                Shuffled Maps =1

                Failed Shuffles=0

                Merged Map outputs=1

                GC time elapsed (ms)=1684

                CPU time spent (ms)=7030

                Physical memory (bytes) snapshot=220651520

                Virtual memory (bytes) snapshot=715882496

                Total committed heap usage (bytes)=134516736

        Shuffle Errors

                BAD_ID=0

                CONNECTION=0

                IO_ERROR=0

                WRONG_LENGTH=0

                WRONG_MAP=0

                WRONG_REDUCE=0

        File Input Format Counters 

                Bytes Read=46

        File Output Format Counters 

                Bytes Written=53
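
To sanity-check the job result, the same count can be computed locally and compared with the job output. The function below is a minimal sketch rather than part of the Hadoop example: `local_wordcount` is a made-up name, and it assumes that splitting on whitespace and sorting in byte order is close enough to what the wordcount example does.

```shell
# Count whitespace-separated words read on stdin and print "word<TAB>count"
# sorted by word, the same shape as the wordcount job's output file.
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{print $2 "\t" $1}'
}

# usage:
#   local_wordcount < a.txt
#   hdfs dfs -cat out/part-r-00000
```

The relative path `out` resolves under the HDFS home directory of the account that submitted the job, and with a single reduce task the result is normally written to part-r-00000.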

» Starting hadoop 2.6.0 (including the ecosystem) and testing with the wordcount application, 2015.05.05