
Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Notes on developing and operating Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send inquiries to gooper@gooper.com.


1. Re-check the settings in hdfs-site.xml and yarn-site.xml.

 a. hdfs-site.xml

   <property>
     <name>dfs.hosts.exclude</name>
     <value>$HOME/hadoop/etc/hadoop/nodes.exclude</value>
   </property>
   <property>
     <name>dfs.hosts</name>
     <value>$HOME/hadoop/etc/hadoop/nodes.include</value>
   </property>


 b. yarn-site.xml

  <property>
   <name>yarn.resourcemanager.nodes.include-path</name>
   <value>$HOME/hadoop/etc/hadoop/nodes.include</value>
  </property>
  <property>
   <name>yarn.resourcemanager.nodes.exclude-path</name>
   <value>$HOME/hadoop/etc/hadoop/nodes.exclude</value>
  </property>
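
 Before refreshing the nodes, it can help to confirm that the node being decommissioned is actually listed in the exclude file. The following is a minimal sketch, assuming the file holds one hostname per line; the function name, host name, and path in the usage comment are hypothetical.

```shell
# Return 0 if the host appears as a whole line in the exclude file.
# Assumption: one hostname per line, no trailing whitespace.
is_excluded() {
    host="$1"
    exclude_file="$2"
    # -F: fixed string, -x: match the whole line, -q: quiet
    grep -Fxq -- "$host" "$exclude_file"
}

# Usage (hypothetical host name; path as configured in dfs.hosts.exclude):
# is_excluded datanode05.example.com "$HOME/hadoop/etc/hadoop/nodes.exclude" && echo "listed"
```

 Matching the whole line avoids a false positive when one hostname is a prefix of another.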


2. Run hdfs fsck / -storagepolicies or hdfs fsck / -blocks to check the state of the blocks.

 See the results at the bottom.
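
 Because the fsck output for a large filesystem is very long, only the overall status is usually of interest. As a rough sketch, the helper below (a hypothetical name) reads fsck output on stdin and prints the status word. It assumes the summary line has the form shown in the sample output, where progress dots may precede "Status:".

```shell
# Print the overall status word (e.g. HEALTHY or CORRUPT) from fsck output on stdin.
# fsck may print progress dots on the same line before "Status:".
fsck_status() {
    sed -n 's/^.*Status: *\([A-Z]*\).*$/\1/p' | head -n 1
}

# Usage:
# hdfs fsck / -blocks | fsck_status
```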


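3. If the result of step 2 is Status: CORRUPT, take appropriate action.

 hdfs fsck / -delete (deletes the corrupted files) or hdfs fsck / -move (moves the corrupted files to /lost+found)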
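
 Since -delete is destructive, it is worth reviewing which files are affected first. The sketch below pulls the paths of files with missing blocks out of saved fsck output, assuming the "MISSING n blocks of total size" line format seen in the sample output; the function and file names are hypothetical. hdfs fsck / -list-corruptfileblocks is another way to get a similar list.

```shell
# Print the paths of files that fsck reports as having MISSING blocks.
# Reads fsck output on stdin; "Under replicated" lines are ignored.
missing_files() {
    sed -n 's/^\(\/.*\): MISSING [0-9][0-9]* blocks of total size .*$/\1/p'
}

# Usage (fsck.out is a hypothetical file name):
# hdfs fsck / -blocks > fsck.out
# missing_files < fsck.out
```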


4. Run step 2 again and confirm that the result is Status: HEALTHY.

  See the results at the bottom.


5. If necessary, run the decommission process again.

hdfs dfsadmin -refreshNodes

yarn rmadmin -refreshNodes
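
 After refreshing the nodes, progress can be followed with hdfs dfsadmin -report. The following sketch lists the DataNodes that are still decommissioning. It assumes the report prints a "Hostname:" line followed by a "Decommission Status :" line for each node, which may differ between Hadoop versions; the function name is hypothetical.

```shell
# Print the hostnames of DataNodes whose status is "Decommission in progress".
# Reads the output of "hdfs dfsadmin -report" on stdin.
nodes_in_progress() {
    awk -F': ' '
        /^Hostname:/ { host = $2 }                        # remember the current node
        /^Decommission Status/ && /in progress/ { print host }
    '
}

# Usage:
# hdfs dfsadmin -report | nodes_in_progress
```

 An empty result means no node is reported as still decommissioning.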


* Decommissioning can take several days or even weeks. To speed it up, add the following properties to hdfs-site.xml and apply them.

   (Reference: https://community.hortonworks.com/questions/102621/node-decommissioning-progressing-too-slowly.html)

   <property>
     <name>dfs.namenode.replication.max-streams</name>
     <value>50</value>
   </property>
  
   <property>
     <name>dfs.namenode.replication.max-streams-hard-limit</name>
     <value>100</value>
   </property>
  
   <property>
     <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
     <value>200</value>
   </property>
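
   To check that the edited values are being read, the three keys can be printed with hdfs getconf. This is only a sketch with a hypothetical function name. Note that getconf reads the local configuration files, so it does not by itself confirm that the running NameNode has picked up the new values.

```shell
# Print the three replication tuning keys as key=value lines,
# using the values that the local Hadoop configuration resolves to.
print_replication_settings() {
    for key in \
        dfs.namenode.replication.max-streams \
        dfs.namenode.replication.max-streams-hard-limit \
        dfs.namenode.replication.work.multiplier.per.iteration
    do
        printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}

# Usage:
# print_replication_settings
```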



-----------Output of hdfs fsck / -storagepolicies (Status: CORRUPT)----------

.....

(omitted)

....

/user/hadoop/spark/local-1510393605261: MISSING 1 blocks of total size 134217728 B......
/user/hadoop/spark/local-1511150952538:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1078958243_5217532. Target Replicas is 3 but found 1 replica(s).

/user/hadoop/spark/local-1511150952538:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1078982811_5242100. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1511756383245: MISSING 1 blocks of total size 8357126 B.....
/user/hadoop/spark/local-1511848071791:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079042189_5301478. Target Replicas is 3 but found 2 replica(s).
..
/user/hadoop/spark/local-1511858124646: MISSING 1 blocks of total size 40291 B..
/user/hadoop/spark/local-1511858518707:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079043514_5302803. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1511861829455:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079044160_5303449. Target Replicas is 3 but found 1 replica(s).
..
/user/hadoop/spark/local-1511921506635:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079057453_5316742. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1511931435456: MISSING 1 blocks of total size 702011 B..
/user/hadoop/spark/local-1511932067927:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079058984_5318273. Target Replicas is 3 but found 2 replica(s).
.......
/user/hadoop/spark/local-1511939175974:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079060057_5319346. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1511942070784:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079060488_5319777. Target Replicas is 3 but found 2 replica(s).
.
.
/user/hadoop/spark/local-1511945803722:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079061302_5320591. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1511946633083:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079061444_5320733. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512003403329:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079074161_5333450. Target Replicas is 3 but found 1 replica(s).
.
/user/hadoop/spark/local-1512008787877:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079074799_5334088. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512018010728:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079076017_5335306. Target Replicas is 3 but found 2 replica(s).
............
/user/hadoop/spark/local-1512121416466:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079096405_5355695. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512361519396:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079147739_5407029. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512373036884:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079149109_5408399. Target Replicas is 3 but found 2 replica(s).
..
/user/hadoop/spark/local-1512373950155:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079149191_5408481. Target Replicas is 3 but found 1 replica(s).
..........................
/user/hadoop/spark/local-1512641606927:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079196301_5455600. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512694548543:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079208186_5467485. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512712721899:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079272068_5531367. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512978676213:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079331781_5591080. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1513040318768:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079345530_5604829. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1513154800735:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079372104_5631403. Target Replicas is 3 but found 1 replica(s).
..
/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079391263_5650563. Target Replicas is 3 but found 2 replica(s).

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079416745_5676128. Target Replicas is 3 but found 1 replica(s).

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079454497_5713964. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1513933165296:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079584889_5844405. Target Replicas is 3 but found 1 replica(s).
...
/user/pineone/gooper-test/icbms_2017-07-21_13-28-17.nq.gz:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1076598010_2857261. Target Replicas is 3 but found 2 replica(s).
.....
/user/pineone/in/tomcat-juli.jar:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
.......
/user/pineone/out3/part-r-00000:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073744812_3988. Target Replicas is 3 but found 2 replica(s).
.......
.....Status: CORRUPT
 Total size:    1136215827409 B (Total open files size: 939529123 B)
 Total dirs:    1415
 Total files:   1864205
 Total symlinks:                0 (Files currently being written: 12)
 Total blocks (validated):      1864295 (avg. block size 609461 B) (Total open file blocks (not validated): 18)
  ********************************
  UNDER MIN REPL'D BLOCKS:      11 (5.900354E-4 %)
  dfs.namenode.replication.min: 1
  CORRUPT FILES:        11
  MISSING BLOCKS:       11
  MISSING SIZE:         321480184 B
  ********************************
 Minimally replicated blocks:   1864284 (99.99941 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       545406 (29.255348 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.661861
 Corrupt blocks:                0
 Missing replicas:              630358 (11.270713 %)
 Number of data-nodes:          8
 Number of racks:               1
FSCK ended at Tue Jan 02 15:42:32 KST 2018 in 196868 milliseconds
FSCK ended at Tue Jan 02 15:42:32 KST 2018 in 196868 milliseconds
fsck encountered internal errors!


Fsck on path '/' FAILED



-----------Output of hdfs fsck / -storagepolicies (Status: HEALTHY)----------

.....

(omitted)

....

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079416745_5676128. Target Replicas is 3 but found 1 replica(s).

/user/hadoop/spark/local-1513239737761:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079454497_5713964. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1513933165296:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079584889_5844405. Target Replicas is 3 but found 1 replica(s).
...
/user/hadoop/spark/local-1514451281961:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079748353_6021470. Target Replicas is 3 but found 1 replica(s).
.
/user/pineone/gooper-test/icbms_2017-07-21_13-28-17.nq.gz:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1076598010_2857261. Target Replicas is 3 but found 2 replica(s).
.....
/user/pineone/in/tomcat-juli.jar:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
.......
/user/pineone/out3/part-r-00000:  Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073744812_3988. Target Replicas is 3 but found 2 replica(s).
........
....Status: HEALTHY
 Total size:    1136067627332 B (Total open files size: 1312 B)
 Total dirs:    1446
 Total files:   1864304
 Total symlinks:                0 (Files currently being written: 4)
 Total blocks (validated):      1864376 (avg. block size 609355 B) (Total open file blocks (not validated): 3)
 Minimally replicated blocks:   1864376 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       526644 (28.247736 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.682025
 Corrupt blocks:                0
 Missing replicas:              592825 (10.599168 %)
 Number of data-nodes:          8
 Number of racks:               1

Blocks satisfying the specified storage policy:
Storage Policy                  # of blocks       % of blocks
DISK:5(HOT)                   796652              42.7302%
DISK:3(HOT)                   595925              31.9638%
DISK:4(HOT)                   471514              25.2907%
DISK:6(HOT)                      274               0.0147%
DISK:1(HOT)                       10               0.0005%
DISK:2(HOT)                        1               0.0001%

All blocks satisfy specified storage policy.
FSCK ended at Tue Jan 02 17:08:00 KST 2018 in 184737 milliseconds


The filesystem under path '/' is HEALTHY

Post: [Decommission] What to do when decommissioning takes a long time (several days) and does not complete (2018.01.03)