
Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Notes on development and operations for Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send questions to gooper@gooper.com.


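Loading 123,534,991 rows of data into HBase through Hive..

After almost 5 hours, the error messages below appeared.

Just before that the disk had filled up, so I cleaned up some files and let the job keep running. Could that have caused the problem??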

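---->

While the disk was full, the session with ZooKeeper timed out and ZooKeeper deleted the corresponding znode.

HMaster then sent requests to ZooKeeper using its old session, but the znode no longer existed, so it went down.

HRegionServer went down as well, and this chain reaction caused the job to fail.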

After running start-hbase.sh, HMaster comes up briefly and then goes down again. This happens because HMaster cannot connect to ZooKeeper

and therefore cannot create its znode.

(After starting hbase zkcli, ls /hbase/table returns nothing. The table names should be listed there...)


--> In the end HBase appears to be broken. In that case, running the commands below recovers it.

(hbase hbck -fixMeta -fixAssignments or hbase hbck -repair)

-- Sometimes the command throws errors and does not complete properly. When that happens, kill the regionserver and start it again.

(hbase-daemon.sh stop regionserver, then hbase-daemon.sh start regionserver)
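
The repair-then-restart cycle described above can be sketched as a small shell helper. This is only a rough sketch, not a tested recovery tool: the function names and the HBCK_CMD, RS_STOP_CMD and RS_START_CMD variables are hypothetical hooks added here so the loop can be dry-run, and the hbck flags are the ones quoted above (they belong to the old hbck shipped with this HBase generation). It does not handle the case where hbck hangs and never returns.

```shell
# Hypothetical override hooks (not part of HBase); defaults are the commands from the notes.
HBCK_CMD=${HBCK_CMD:-"hbase hbck -fixMeta -fixAssignments"}
RS_STOP_CMD=${RS_STOP_CMD:-"hbase-daemon.sh stop regionserver"}
RS_START_CMD=${RS_START_CMD:-"hbase-daemon.sh start regionserver"}

# Reads hbck output on stdin; succeeds only if it contains "Status: OK".
hbck_ok() {
  grep -q 'Status: OK'
}

# Runs hbck up to N times (default 3). After each failed attempt the
# regionserver is stopped and started again before the next try.
repair_hbase() {
  max_tries=${1:-3}
  try=1
  while [ "$try" -le "$max_tries" ]; do
    out=$($HBCK_CMD 2>&1) || true
    if printf '%s\n' "$out" | hbck_ok; then
      echo "hbck reported Status: OK on attempt $try"
      return 0
    fi
    echo "attempt $try: no Status: OK from hbck, restarting regionserver"
    $RS_STOP_CMD
    $RS_START_CMD
    try=$((try + 1))
  done
  return 1
}
```

Calling `repair_hbase 5` would try up to five times. For a dry run, something like `HBCK_CMD="echo Status: OK"` can be set first so that no cluster command is executed.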

 

To summarize....

 

1. In /etc/hosts, comment out the 127.0.1.1 line and enter the real IP.

127.0.0.1       localhost
#127.0.1.1      bigdata-host
#127.0.0.1      bigdata-host
192.168.8.5     bigdata-host

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


2. Put bigdata-host in the hbase/conf/regionservers file.

3. Restart the regionserver.

  a. hbase-daemon.sh stop regionserver

  b. hbase-daemon.sh start regionserver

4. Run hbase hbck -fixMeta -fixAssignments.

(If it does not return to the prompt, or the output does not end with Status: OK, repeat step 4.)

(Both HMaster and HRegionServer should show up in jps.)
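
As a rough aid for step 1, the check below looks for active hosts entries that still map the hostname to a 127.x address. It is a sketch under assumptions: loopback_entries is a hypothetical helper name, and it only inspects the file's text; it does not query the resolver.

```shell
# Print uncommented hosts lines that map HOST to a 127.x.x.x address.
# Usage: loopback_entries HOST [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
loopback_entries() {
  host=$1
  hosts_file=${2:-/etc/hosts}
  awk -v h="$host" '
    /^[ \t]*#/ { next }                 # skip commented-out lines
    $1 ~ /^127\./ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break            # ignore a trailing comment
        if ($i == h) { print; break }   # hostname bound to loopback
      }
    }
  ' "$hosts_file"
}
```

With the /etc/hosts shown in step 1, `loopback_entries bigdata-host` should print nothing, because both loopback lines for bigdata-host are commented out. Any line it does print is a candidate for commenting out.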
 

-------------------------------------jps------------------------------------

hadoop@bigdata-host:~/hbase/logs$ jps
12180 Jps
565 DataNode
1849 HQuorumPeer
907 JobTracker
819 SecondaryNameNode
11834 RunJar
1276 TaskTracker
29547 Main
300 NameNode

------------------------------------------------------log from the shell (Hive job output)------------------------------------------------------------------------

.......

2014-04-29 13:03:40,699 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:41,795 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:42,826 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:43,852 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:44,867 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:45,876 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:46,949 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:48,033 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:03:49,046 Stage-0 map = 13%,  reduce = 0%, Cumulative CPU 1089.83 sec
2014-04-29 13:33:38,482 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:39,781 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:41,036 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:42,160 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:43,166 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:44,188 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:45,229 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:46,320 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:47,372 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:33:48,422 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
.........

2014-04-29 13:40:05,777 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 911.95 sec
2014-04-29 13:40:06,785 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:07,803 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:08,809 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:09,822 Stage-0 map = 11%,  reduce = 0%, Cumulative CPU 906.53 sec
2014-04-29 13:40:10,841 Stage-0 map = 100%,  reduce = 100%, Cumulative CPU 906.53 sec
MapReduce Total cumulative CPU time: 15 minutes 6 seconds 530 msec
Ended Job = job_201404290915_0002 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201404290915_0002
Examining task ID: task_201404290915_0002_m_000045 (and more) from job job_201404290915_0002
Examining task ID: task_201404290915_0002_m_000006 (and more) from job job_201404290915_0002

Task with the most failures(4):
-----
Task ID:
  task_201404290915_0002_m_000006

URL:
  http://localhost:50030/taskdetails.jsp?jobid=job_201404290915_0002&tipid=task_201404290915_0002_m_000006
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"year":"1992","month":"10","dayofmonth":"3","dayofweek":"6","deptime":"1859","crsdeptime":"1859","arrtime":"2027","crsarrtime":"2034","uniquecarrier":"AA","flightnum":"701","tailnum":"NA","actualelapsedtime":"148","crselapsedtime":"155","airtime":"NA","arrdelay":"-7","depdelay":"0","origin":"CMH","dest":"DFW","distance":"927","taxiin":"NA","taxiout":"NA","cancelled":"0","cancellationcode":"NA","diverted":"0","carrierdelay":"NA","weatherdelay":"NA","nasdelay":"NA","securitydelay":"NA","lateaircraftdelay":"NA"}
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"year":"1992","month":"10","dayofmonth":"3","dayofweek":"6","deptime":"1859","crsdeptime":"1859","arrtime":"2027","crsarrtime":"2034","uniquecarrier":"AA","flightnum":"701","tailnum":"NA","actualelapsedtime":"148","crselapsedtime":"155","airtime":"NA","arrdelay":"-7","depdelay":"0","origin":"CMH","dest":"DFW","distance":"927","taxiin":"NA","taxiout":"NA","cancelled":"0","cancellationcode":"NA","diverted":"0","carrierdelay":"NA","weatherdelay":"NA","nasdelay":"NA","securitydelay":"NA","lateaircraftdelay":"NA"}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for h_airline,,99999999999999 after 10 tries.
 at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:240)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:515)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:571)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
 at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
 at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
 ... 9 more
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for h_airline,,99999999999999 after 10 tries.
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:980)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:885)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:889)
 at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:846)
 at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:234)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:174)
 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:133)
 at org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat.getHiveRecordWriter(HiveHBaseTableOutputFormat.java:82)
 at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:250)
 at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237)
 ... 20 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 44   Cumulative CPU: 906.53 sec   HDFS Read: 1509838444 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 15 minutes 6 seconds 530 msec

 

-------------------------------------------hbase-hadoop-master-bigdata-host.log--------------------------------------------------------------

2014-04-29 12:56:57,247 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 10 catalog row(s) and gc'd 0 unreferenced parent region(s)
2014-04-29 13:01:57,216 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62931a92
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=2 average=2.0 mostloaded=2 leastloaded=2
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=4 average=4.0 mostloaded=4 leastloaded=4
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,222 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=1 regions=1 average=1.0 mostloaded=1 leastloaded=1
2014-04-29 13:01:57,284 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 10 catalog row(s) and gc'd 0 unreferenced parent region(s)
2014-04-29 13:33:36,780 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1791593ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,789 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1795938ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1787840ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,819 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1899402ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1899462ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1796872ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,781 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 1787532ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2014-04-29 13:33:36,919 DEBUG org.apache.hadoop.hbase.client.MetaScanner: Scanning .META. starting at row= for max=2147483647 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62931a92
2014-04-29 13:33:36,986 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 1827950ms for sessionid 0x145aad6efba0002, closing socket connection and attempting reconnect
.........

2014-04-29 13:33:39,220 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-04-29 13:33:40,024 ERROR org.apache.hadoop.hbase.master.HMaster: Region server ^@^@bigdata-host,60020,1398730589059 reported a fatal error:
ABORTING region server bigdata-host,60020,1398730589059: regionserver:60020-0x145aad6efba0001 regionserver:60020-0x145aad6efba0001 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

2014-04-29 13:33:40,406 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x145aad6efba0002 has expired, closing socket connection
2014-04-29 13:33:40,412 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2014-04-29 13:33:40,589 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2014-04-29 13:33:40,505 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x145aad6efba0000 has expired, closing socket connection
2014-04-29 13:33:40,642 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2014-04-29 13:33:40,658 INFO org.apache.hadoop.hbase.master.HMaster: Primary Master trying to recover from ZooKeeper session expiry.
2014-04-29 13:33:40,669 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Closing dead ZooKeeper connection, session was: 0x145aad6efba0000
2014-04-29 13:33:40,683 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-04-29 13:33:40,707 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=master:60000-0x145aad6efba0000
2014-04-29 13:33:41,119 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Recreated a ZooKeeper, session is: 0x0
2014-04-29 13:33:41,126 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-04-29 13:33:41,136 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-04-29 13:33:41,186 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x145aad6efba0013, negotiated timeout = 180000
2014-04-29 13:33:41,386 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/bigdata-host,60000,1398730591004 from backup master directory
2014-04-29 13:33:41,418 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/backup-masters/bigdata-host,60000,1398730591004 already deleted, and this is not a retry
2014-04-29 13:33:41,419 INFO org.apache.hadoop.hbase.master.ActiveMasterManager: Master=bigdata-host,60000,1398730591004
2014-04-29 13:33:41,428 INFO org.apache.hadoop.hbase.master.SplitLogManager: timeout = 300000
2014-04-29 13:33:41,433 INFO org.apache.hadoop.hbase.master.SplitLogManager: unassigned timeout = 180000
2014-04-29 13:33:41,433 INFO org.apache.hadoop.hbase.master.SplitLogManager: resubmit threshold = 3
2014-04-29 13:33:41,444 INFO org.apache.hadoop.hbase.master.SplitLogManager: found 0 orphan tasks and 0 rescan nodes
2014-04-29 13:33:42,258 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Starting catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3242e74f
2014-04-29 13:33:42,402 INFO org.apache.hadoop.hbase.master.HMaster: Server active/primary master; bigdata-host,60000,1398730591004, sessionid=0x145aad6efba0013, cluster-up flag was=true
2014-04-29 13:33:42,432 INFO org.apache.hadoop.hbase.master.snapshot.SnapshotManager: Snapshot feature is not enabled, missing log and hfile cleaners.
......................

2014-04-29 13:38:40,922 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: locateRegionInMeta parentTable=-ROOT-, metaLocation={region=-ROOT-,,0.70236052, hostname=bigdata-host, port=60020}, attempt=3 of 140 failed; retrying after sleep of 2008 because: Connection refused
2014-04-29 13:38:40,925 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@62931a92; serverName=bigdata-host,60020,1398730589059
2014-04-29 13:38:41,293 WARN org.apache.hadoop.hbase.master.SplitLogManager: Expected at least4 tasks in ZK, but actually there are 0
2014-04-29 13:38:41,294 WARN org.apache.hadoop.hbase.master.SplitLogManager: No more task remaining (ZK or task map), splitting should have completed. Remaining tasks in ZK 0, active tasks in map 4
2014-04-29 13:38:41,294 WARN org.apache.hadoop.hbase.master.SplitLogManager: Interrupted while waiting for log splits to be completed
2014-04-29 13:38:41,294 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://localhost:9000/hbase/.logs/bigdata-host,60020,1398730589059-splitting] installed = 4 but only 0 done
2014-04-29 13:38:41,319 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x145aad6efba0000 master:60000-0x145aad6efba0000 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:384)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:303)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2014-04-29 13:38:41,346 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2014-04-29 13:38:41,346 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2014-04-29 13:38:41,347 INFO org.apache.hadoop.hbase.master.HMaster$2: bigdata-host,60000,1398730591004-BalancerChore exiting
2014-04-29 13:38:41,347 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
2014-04-29 13:38:41,347 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
java.io.IOException: Giving up after tries=1
        at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:210)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:188)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:82)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:67)
        at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:126)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:137)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:93)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.InterruptedException: sleep interrupted
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:207)
        ... 8 more
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
................................

2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
2014-04-29 13:38:41,349 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
2014-04-29 13:38:41,352 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
2014-04-29 13:38:41,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
2014-04-29 13:38:41,354 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
2014-04-29 13:38:41,354 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
2014-04-29 13:38:41,355 INFO org.apache.hadoop.hbase.master.CatalogJanitor: bigdata-host,60000,1398730591004-CatalogJanitor exiting
2014-04-29 13:38:41,356 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-04-29 13:38:41,358 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2014-04-29 13:38:41,358 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2014-04-29 13:38:41,379 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-bigdata-host,60000,1398730591004.oldLogCleaner exiting
2014-04-29 13:38:41,379 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-bigdata-host,60000,1398730591004.archivedHFileCleaner exiting
2014-04-29 13:38:41,379 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2014-04-29 13:38:41,419 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2014-04-29 13:38:41,487 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@3242e74f
2014-04-29 13:38:41,495 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: bigdata-host,60000,1398730591004.timerUpdater exiting
2014-04-29 13:38:41,500 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2014-04-29 13:38:41,500 INFO org.apache.zookeeper.ZooKeeper: Session: 0x145aad6efba0013 closed
2014-04-29 13:38:41,500 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
2014-04-29 13:38:41,504 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
java.lang.RuntimeException: HMaster Aborted
        at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2129)
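
The Sleeper warnings in the master log above are the earliest sign of the stall, so it can help to pull them out of a large log. The sketch below assumes the "We slept Nms instead of Mms" wording seen in this log; long_pauses is a hypothetical helper name, and the default threshold of 180000 ms is taken from the sessionTimeout=180000 value printed in the same log.

```shell
# Print Sleeper warnings whose actual sleep exceeded THRESHOLD_MS.
# Usage: long_pauses LOG_FILE [THRESHOLD_MS]
long_pauses() {
  log_file=$1
  threshold_ms=${2:-180000}
  # Reduce each matching line to: date time slept_ms expected_ms
  sed -n 's/^\([0-9-]* [0-9:,]*\) .*We slept \([0-9]*\)ms instead of \([0-9]*\)ms.*/\1 \2 \3/p' "$log_file" |
    awk -v t="$threshold_ms" '$3 + 0 > t + 0 {
      print $1, $2, "slept_ms=" $3, "expected_ms=" $4
    }'
}
```

Run against hbase-hadoop-master-bigdata-host.log, every reported line marks a pause longer than the ZooKeeper session timeout, which is consistent with the session expiry that follows in the log.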

 

----------------------------------------------------------------If HMaster and HRegionServer are both alive and

the message below keeps repeating, run the recovery command.-----------------------------------------------

 

2014-04-30 14:06:58,098 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 0, slept for 139472 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
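
A quick way to tell whether the message above is repeating is to count it in the master log. This is a minimal sketch: needs_repair is a hypothetical helper name, and the default of 3 repeats is an arbitrary assumption rather than a value from HBase. It only looks at the log, so whether HMaster and HRegionServer are alive still has to be confirmed with jps.

```shell
# Succeeds when the "count to settle" message appears at least MIN_REPEATS times.
# Usage: needs_repair MASTER_LOG [MIN_REPEATS]
needs_repair() {
  log_file=$1
  min_repeats=${2:-3}   # assumed threshold, adjust as needed
  # grep -c exits non-zero when the count is 0, so the status is ignored here.
  count=$(grep -c 'Waiting for region servers count to settle' "$log_file") || true
  [ "$count" -ge "$min_repeats" ]
}
```

For example, `needs_repair hbase-hadoop-master-bigdata-host.log && hbase hbck -fixMeta -fixAssignments` would run the recovery command only when the message has piled up.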
