Notes on developing and operating Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send questions to gooper@gooper.com.
1. Prepare the data (vi json.dat)
{"country":"US","page":227,"data":{"ad":{"impressions":{"s":10,"o":10}}}}
{"country":"US2","page":228,"data":{"ad":{"impressions":{"s":11,"o":13}}}}
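Because each line of json.dat becomes one row, a single unbalanced brace can spoil a record, and Hive's JSON functions generally return NULL for such input rather than raising an error. The following is a minimal sketch of a local pre-check, assuming Python 3 is available on the edge node; the helper name invalid_lines and the sample records are hypothetical and used only for illustration.

```python
import json


def invalid_lines(lines):
    """Return (line_number, error_message) for every non-blank line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


# Hypothetical sample: the second record carries one stray closing brace.
sample = [
    '{"country":"US","page":227,"data":{"ad":{"impressions":{"s":10,"o":10}}}}',
    '{"country":"US2","page":228,"data":{"ad":{"impressions":{"s":11,"o":13}}}}}',
]

for number, message in invalid_lines(sample):
    print(f"line {number}: {message}")
```

To check the real file, pass an open file handle instead of the sample list, for example invalid_lines(open("json.dat")).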
2. Upload the file to HDFS
hdfs dfs -put ./json.dat /tmp/json.dat
3. Create the table
create table hive_parsing_json_table(json string);
4. Load the data into the table
load data inpath '/tmp/json.dat' into table hive_parsing_json_table;
5. Run the query
select v1.Country, v1.Page, v4.impressions_s, v4.impressions_o
from hive_parsing_json_table hpjp
lateral view json_tuple(hpjp.json, 'country', 'page', 'data') v1 as Country, Page, data
lateral view json_tuple(v1.data, 'ad') v2 as Ad
lateral view json_tuple(v2.Ad, 'impressions') v3 as Impressions
lateral view json_tuple(v3.Impressions, 's', 'o') v4 as impressions_s, impressions_o;
6. Result
v1.country v1.page v4.impressions_s v4.impressions_o
US 227 10 10
US2 228 11 13
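The query peels the JSON one nesting level at a time, because json_tuple only reads top-level keys of the string it is given. The Python sketch below imitates that chain so the expected rows can be reasoned about without a cluster. It assumes each record nests impressions under data.ad, as the query expects. The json_tuple function here is a rough stand-in written for illustration, not Hive's implementation, and parse_row is a hypothetical helper name.

```python
import json


def json_tuple(text, *keys):
    """Rough stand-in for Hive's json_tuple: reads top-level keys only and
    returns each value as a string (nested objects as JSON text), or None."""
    try:
        obj = json.loads(text) if text is not None else None
    except json.JSONDecodeError:
        obj = None
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)
    values = []
    for key in keys:
        value = obj.get(key)
        if value is None:
            values.append(None)
        elif isinstance(value, (dict, list)):
            # Nested structures are passed on as JSON text for the next level.
            values.append(json.dumps(value, separators=(",", ":")))
        else:
            values.append(str(value))
    return tuple(values)


def parse_row(line):
    """Mirror the four chained lateral views: v1 -> v2 -> v3 -> v4."""
    country, page, data = json_tuple(line, "country", "page", "data")  # v1
    (ad,) = json_tuple(data, "ad")                                     # v2
    (impressions,) = json_tuple(ad, "impressions")                     # v3
    s, o = json_tuple(impressions, "s", "o")                           # v4
    return country, page, s, o


rows = [
    '{"country":"US","page":227,"data":{"ad":{"impressions":{"s":10,"o":10}}}}',
    '{"country":"US2","page":228,"data":{"ad":{"impressions":{"s":11,"o":13}}}}',
]

for line in rows:
    print(*parse_row(line))
```

If a level is missing (for example, a record with no ad key under data), every value extracted below that level comes back as None in this sketch, which is a useful thing to look for when the Hive result contains unexpected NULLs.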