Notes on development and operations for Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send questions to gooper@gooper.com.
1. Re-check the settings in hdfs-site.xml and yarn-site.xml.
a. hdfs-site.xml
<property>
<name>dfs.hosts.exclude</name>
<value>$HOME/hadoop/etc/hadoop/nodes.exclude</value>
</property>
<property>
<name>dfs.hosts</name>
<value>$HOME/hadoop/etc/hadoop/nodes.include</value>
</property>
b. yarn-site.xml
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>$HOME/hadoop/etc/hadoop/nodes.include</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>$HOME/hadoop/etc/hadoop/nodes.exclude</value>
</property>
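One easy mistake with the include/exclude files is listing a host only in nodes.exclude. As far as I understand the HDFS behavior, when dfs.hosts is in use a DataNode is decommissioned only if it appears in both files; a host that is only in the exclude file is simply refused. The sketch below reports such hosts. The function name is hypothetical and the paths in the usage comment are the ones from the configuration above.

```shell
# List hosts that are in the exclude file but missing from the include file.
# $1 = include file, $2 = exclude file (example paths; adjust to your cluster).
hosts_missing_from_include() {
  include_file=$1
  exclude_file=$2
  # -F: fixed strings, -x: match whole lines, -v: keep non-matching lines,
  # -f: read the patterns (the include hosts) from a file.
  # "|| true" keeps the exit status 0 when nothing is missing.
  grep -Fxv -f "$include_file" "$exclude_file" || true
}

# Example (not run here):
# hosts_missing_from_include "$HOME/hadoop/etc/hadoop/nodes.include" \
#                            "$HOME/hadoop/etc/hadoop/nodes.exclude"
```

An empty result means every host in nodes.exclude is also in nodes.include.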
2. Run hdfs fsck / -storagepolicies or hdfs fsck / -blocks to check the state of the blocks.
See the results at the bottom.
3. If the result of step 2 is Status: CORRUPT, take appropriate action.
hdfs fsck / -delete (deletes the corrupted files) or hdfs fsck / -move (moves them to /lost+found)
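Before deleting or moving anything, it can help to see exactly which files fsck reports as having MISSING blocks. This is a minimal sketch that assumes the line format shown in the sample output at the bottom of this post; the function name is hypothetical.

```shell
# Read fsck output on stdin and print each path reported with MISSING blocks.
# Assumes lines of the form "<path>: MISSING <n> blocks of total size <bytes> B".
missing_block_files() {
  sed -n 's/^\(.*\): MISSING [0-9][0-9]* blocks of total size .*/\1/p'
}

# Example (not run here):
# hdfs fsck / | missing_block_files
```

Review the list first; -delete is not reversible.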
4. Run step 2 again and confirm that the result is Status: HEALTHY.
See the results at the bottom.
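The status check in steps 2 and 4 can be scripted. The sketch below pulls the word after "Status:" out of fsck output, assuming the summary line looks like the samples at the bottom of this post. Both function names are hypothetical.

```shell
# Read fsck output on stdin and print the last reported status word
# (for example HEALTHY or CORRUPT).
fsck_status() {
  sed -n 's/.*Status: \([A-Z][A-Z]*\).*/\1/p' | tail -n 1
}

# Succeeds (exit status 0) only when the fsck output on stdin says HEALTHY.
fsck_is_healthy() {
  [ "$(fsck_status)" = "HEALTHY" ]
}

# Example (not run here):
# hdfs fsck / | fsck_is_healthy && echo "OK to continue"
```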
5. If necessary, run the decommission process again.
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
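After refreshing the node lists, progress can be followed with hdfs dfsadmin -report. The sketch below prints the DataNodes still decommissioning. It assumes the report contains "Hostname: <host>" and "Decommission Status : Decommission in progress" lines, which may differ between versions. The function name is hypothetical.

```shell
# Read "hdfs dfsadmin -report" output on stdin and print the hostname of
# every DataNode whose decommission is still in progress.
decommissioning_hosts() {
  awk -F': ' '
    /^Hostname:/ { host = $2 }    # remember the most recent hostname
    /^Decommission Status/ && $2 == "Decommission in progress" { print host }
  '
}

# Example (not run here):
# hdfs dfsadmin -report | decommissioning_hosts
```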
* Decommissioning can take days or even weeks. To speed it up, add the following properties to hdfs-site.xml and apply them.
(Reference: https://community.hortonworks.com/questions/102621/node-decommissioning-progressing-too-slowly.html)
<property>
<name>dfs.namenode.replication.max-streams</name>
<value>50</value>
</property>
<property>
<name>dfs.namenode.replication.max-streams-hard-limit</name>
<value>100</value>
</property>
<property>
<name>dfs.namenode.replication.work.multiplier.per.iteration</name>
<value>200</value>
</property>
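To check which values the configuration files resolve to for these three properties, hdfs getconf -confKey can be used. Note that this reads the local configuration, which is not necessarily what a running NameNode has loaded; how the change is applied depends on your distribution and version. Higher values also put more replication load on the DataNodes and the network. The function name is hypothetical.

```shell
# Print key=value for the three replication tuning properties,
# as resolved from the local Hadoop configuration files.
show_replication_settings() {
  for key in \
    dfs.namenode.replication.max-streams \
    dfs.namenode.replication.max-streams-hard-limit \
    dfs.namenode.replication.work.multiplier.per.iteration
  do
    printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
  done
}

# Example (not run here):
# show_replication_settings
```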
-----------hdfs fsck / -storagepolicies output (Status: CORRUPT)----------
.....
(omitted)
....
/user/hadoop/spark/local-1510393605261: MISSING 1 blocks of total size 134217728 B......
/user/hadoop/spark/local-1511150952538: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1078958243_5217532. Target Replicas is 3 but found 1 replica(s).
/user/hadoop/spark/local-1511150952538: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1078982811_5242100. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1511756383245: MISSING 1 blocks of total size 8357126 B.....
/user/hadoop/spark/local-1511848071791: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079042189_5301478. Target Replicas is 3 but found 2 replica(s).
..
/user/hadoop/spark/local-1511858124646: MISSING 1 blocks of total size 40291 B..
/user/hadoop/spark/local-1511858518707: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079043514_5302803. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1511861829455: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079044160_5303449. Target Replicas is 3 but found 1 replica(s).
..
/user/hadoop/spark/local-1511921506635: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079057453_5316742. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1511931435456: MISSING 1 blocks of total size 702011 B..
/user/hadoop/spark/local-1511932067927: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079058984_5318273. Target Replicas is 3 but found 2 replica(s).
.......
/user/hadoop/spark/local-1511939175974: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079060057_5319346. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1511942070784: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079060488_5319777. Target Replicas is 3 but found 2 replica(s).
.
.
/user/hadoop/spark/local-1511945803722: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079061302_5320591. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1511946633083: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079061444_5320733. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512003403329: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079074161_5333450. Target Replicas is 3 but found 1 replica(s).
.
/user/hadoop/spark/local-1512008787877: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079074799_5334088. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512018010728: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079076017_5335306. Target Replicas is 3 but found 2 replica(s).
............
/user/hadoop/spark/local-1512121416466: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079096405_5355695. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512361519396: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079147739_5407029. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512373036884: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079149109_5408399. Target Replicas is 3 but found 2 replica(s).
..
/user/hadoop/spark/local-1512373950155: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079149191_5408481. Target Replicas is 3 but found 1 replica(s).
..........................
/user/hadoop/spark/local-1512641606927: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079196301_5455600. Target Replicas is 3 but found 2 replica(s).
.
/user/hadoop/spark/local-1512694548543: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079208186_5467485. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512712721899: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079272068_5531367. Target Replicas is 3 but found 2 replica(s).
.....
/user/hadoop/spark/local-1512978676213: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079331781_5591080. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1513040318768: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079345530_5604829. Target Replicas is 3 but found 2 replica(s).
...
/user/hadoop/spark/local-1513154800735: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079372104_5631403. Target Replicas is 3 but found 1 replica(s).
..
/user/hadoop/spark/local-1513239737761: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079391263_5650563. Target Replicas is 3 but found 2 replica(s).
/user/hadoop/spark/local-1513239737761: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079416745_5676128. Target Replicas is 3 but found 1 replica(s).
/user/hadoop/spark/local-1513239737761: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079454497_5713964. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1513933165296: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079584889_5844405. Target Replicas is 3 but found 1 replica(s).
...
/user/pineone/gooper-test/icbms_2017-07-21_13-28-17.nq.gz: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1076598010_2857261. Target Replicas is 3 but found 2 replica(s).
.....
/user/pineone/in/tomcat-juli.jar: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
.......
/user/pineone/out3/part-r-00000: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073744812_3988. Target Replicas is 3 but found 2 replica(s).
.......
.....Status: CORRUPT
Total size: 1136215827409 B (Total open files size: 939529123 B)
Total dirs: 1415
Total files: 1864205
Total symlinks: 0 (Files currently being written: 12)
Total blocks (validated): 1864295 (avg. block size 609461 B) (Total open file blocks (not validated): 18)
********************************
UNDER MIN REPL'D BLOCKS: 11 (5.900354E-4 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 11
MISSING BLOCKS: 11
MISSING SIZE: 321480184 B
********************************
Minimally replicated blocks: 1864284 (99.99941 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 545406 (29.255348 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.661861
Corrupt blocks: 0
Missing replicas: 630358 (11.270713 %)
Number of data-nodes: 8
Number of racks: 1
FSCK ended at Tue Jan 02 15:42:32 KST 2018 in 196868 milliseconds
FSCK ended at Tue Jan 02 15:42:32 KST 2018 in 196868 milliseconds
fsck encountered internal errors!
Fsck on path '/' FAILED
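In the run above the status is CORRUPT because of the 11 missing blocks, even though "Corrupt blocks" is 0. To track individual counters from the summary between runs, a small helper like the following may be useful. It is a sketch that assumes the "<name>: <number> ..." summary layout shown above; the function name is hypothetical.

```shell
# Read fsck output on stdin and print the first number that follows the
# summary label given as $1 (for example "MISSING BLOCKS").
fsck_metric() {
  awk -v key="$1" -F':[ \t]*' '
    { name = $1; sub(/^[ \t]+/, "", name) }          # strip leading blanks
    name == key { split($2, parts, " "); print parts[1] }
  '
}

# Example (not run here):
# hdfs fsck / > fsck.out
# fsck_metric "MISSING BLOCKS" < fsck.out
# fsck_metric "Under-replicated blocks" < fsck.out
```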
-----------hdfs fsck / -storagepolicies output (Status: HEALTHY)----------
.....
(omitted)
....
/user/hadoop/spark/local-1513239737761: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079416745_5676128. Target Replicas is 3 but found 1 replica(s).
/user/hadoop/spark/local-1513239737761: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079454497_5713964. Target Replicas is 3 but found 1 replica(s).
....
/user/hadoop/spark/local-1513933165296: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079584889_5844405. Target Replicas is 3 but found 1 replica(s).
...
/user/hadoop/spark/local-1514451281961: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1079748353_6021470. Target Replicas is 3 but found 1 replica(s).
.
/user/pineone/gooper-test/icbms_2017-07-21_13-28-17.nq.gz: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1076598010_2857261. Target Replicas is 3 but found 2 replica(s).
.....
/user/pineone/in/tomcat-juli.jar: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073741825_1001. Target Replicas is 3 but found 2 replica(s).
.......
/user/pineone/out3/part-r-00000: Under replicated BP-605282214-166.104.112.43-1498555165989:blk_1073744812_3988. Target Replicas is 3 but found 2 replica(s).
........
....Status: HEALTHY
Total size: 1136067627332 B (Total open files size: 1312 B)
Total dirs: 1446
Total files: 1864304
Total symlinks: 0 (Files currently being written: 4)
Total blocks (validated): 1864376 (avg. block size 609355 B) (Total open file blocks (not validated): 3)
Minimally replicated blocks: 1864376 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 526644 (28.247736 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.682025
Corrupt blocks: 0
Missing replicas: 592825 (10.599168 %)
Number of data-nodes: 8
Number of racks: 1
Blocks satisfying the specified storage policy:
Storage Policy # of blocks % of blocks
DISK:5(HOT) 796652 42.7302%
DISK:3(HOT) 595925 31.9638%
DISK:4(HOT) 471514 25.2907%
DISK:6(HOT) 274 0.0147%
DISK:1(HOT) 10 0.0005%
DISK:2(HOT) 1 0.0001%
All blocks satisfy specified storage policy.
FSCK ended at Tue Jan 02 17:08:00 KST 2018 in 184737 milliseconds
The filesystem under path '/' is HEALTHY
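Even in the HEALTHY run, 526644 blocks (28.247736 %) are still under-replicated, so it can be useful to see how many blocks are down to a single replica. The sketch below counts "Under replicated" lines by the number of replicas found, assuming the line format shown above. The function name is hypothetical.

```shell
# Read fsck output on stdin and print "<replicas found> <number of blocks>"
# for every "Under replicated" line, sorted by replica count.
under_replicated_by_count() {
  sed -n 's/.*Under replicated .* but found \([0-9][0-9]*\) replica(s)\..*/\1/p' |
    sort -n | uniq -c | awk '{ print $2 " " $1 }'
}

# Example (not run here):
# hdfs fsck / | under_replicated_by_count
```

Blocks with only 1 replica are the ones most at risk if another DataNode goes down.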