Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.

hive json 값 다루기

총관리자 2014.04.17 10:26 조회 수 : 1334

1. json형식의 data파일 생성

hadoop@bigdata-host:~/hadoop/working$ vi simple.json
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

2. data를 담을 table 생성

create table json_table (json string);

3. data파일을 table에 입력

hive> load data local inpath '/home/hadoop/hadoop/working/simple.json' into table json_table;
Copying data from file:/home/hadoop/hadoop/working/simple.json
Copying file: file:/home/hadoop/hadoop/working/simple.json
Loading data to table default.json_table
Table default.json_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 77, raw_data_size: 0]
OK

4. table내용 확인

hive> select * from json_table;
OK
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

5. json을 컬럼 형태로 query하기(get_json_object이용)

select get_json_object(json_table.json, '$.Foo') as foo,

get_json_object(json_table.json, '$.Bar') as bar,

get_json_object(json_table.json, '$.Quux.QuuxId') as qid,

get_json_object(json_table.json, '$.Quux.QuuxName') as qname

from json_table;

-------------------------->

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0003
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:30:42,028 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:30:48,109 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:49,128 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:50,146 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:51,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:52,175 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:53,206 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_201404170922_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 295 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
ABC 20090101100000 1234 Sam
Time taken: 19.411 seconds, Fetched: 1 row(s)

6. json을 컬럼 형태로 query하기(json_tuple이용)

select v.foo, v.bar, v.quux, v.qid

from json_table jt

lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v

as foo, bar, quux, qid;

------------>

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0004
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:36:41,978 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:36:49,125 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:50,156 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:51,170 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:52,193 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:53,219 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_201404170922_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.53 sec HDFS Read: 295 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
Time taken: 26.494 seconds, Fetched: 1 row(s)

이 게시물을

이 글의 추천인 목록 목록

번호	제목	날짜	조회 수
23	lateral view 예제	2014.09.18	780
22	banana pi에 hive 0.13.1+mysql(metastore)설치	2014.09.09	2495
21	FAILED: IllegalStateException Variable substitution depth too large: 40 오류발생시 조치사항	2014.08.19	1599
20	2개 data를 join하고 마지막으로 code정보를 join하여 결과를 얻는 mr 프로그램	2014.06.30	468
19	hive job실행시 meta정보를 원격의 mysql에 저장하는 경우 설정방법	2014.05.28	1175
18	hive query에서 mapreduce돌리지 않고 select하는 방법	2014.05.23	900
17	hiverserver2기동시 connection refused가 발생하는 경우 조치방법	2014.05.22	1568
16	hive에서 insert overwrite directory.. 로 하면 default column구분자는 'SOH'혹은 't'가 됨	2014.05.20	1133
15	dual table만들기	2014.05.16	1120
14	insert hbase by hive ... error occured after 5 hours..HMaster가 뜨지 않는 장애에 대한 복구 방법	2014.04.29	7255
13	index생성, 삭제, 활용	2014.04.25	1769
12	unique한 값 생성	2014.04.25	1134
11	sequence한 번호 생성방법	2014.04.25	1261
10	json serde사용법	2014.04.17	1196
»	json 값 다루기	2014.04.17	1334
8	hive에서 생성된 external table에서 hbase의 table에 값 insert하기	2014.04.11	1826
7	external partition table생성및 data확인	2014.04.03	1321
6	의사분산모드에서 presto설치하기	2014.03.31	3126
5	Hive Query Examples from test code (2 of 2)	2014.03.26	11467
4	Hive Query Examples from test code (1 of 2)	2014.03.26	1180

쓰기 태그

첫 페이지 2 3 4 5 6 7 8 끝 페이지

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.

hive json 값 다루기

댓글 0

Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.

hive json 값 다루기

댓글 0

LOGIN