Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.
1. hive-json-serde-0.2.jar다운로드 한다.
-> https://code.google.com/p/hive-json-serde/downloads/list
2. hive의 lib 디렉토리로 옮긴다.
cp hive-json-serde-0.2.jar.jar /home/hadoop/hive/lib/hive-json-serde-0.2.jar
3. data준비(Test.json)
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
4. hive shell에서 jar파일등록
hive> add jar /home/hadoop/hive/lib/hive-json-serde-0.2.jar;
Added /home/hadoop/hive/lib/hive-json-serde-0.2.jar to class path
Added resource: /home/hadoop/hive/lib/hive-json-serde-0.2.jar
5. json serde를 사용하는 table생성
hive> create table my_table(field1 string, field2 int, field3 string, field4 double)
> row format serde 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
6. data load
hive> load data local inpath '/home/hadoop/hadoop/working/Test.json' into table my_table;
Copying data from file:/home/hadoop/hadoop/working/Test.json
Copying file: file:/home/hadoop/hadoop/working/Test.json
Loading data to table default.my_table
Table default.my_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 287, raw_data_size: 0]
7. my_table조회
hive> select * from my_table;
OK
field1 field2 field3 field4
data1 100 more data1 123.001
data2 200 more data2 123.002
data3 300 more data3 123.003
data4 400 more data4 123.004
Time taken: 0.188 seconds, Fetched: 4 row(s)