Cloudera CDH/CDP 및 Hadoop EcoSystem, Semantic IoT등의 개발/운영 기술을 정리합니다. gooper@gooper.com로 문의 주세요.
1. json형식의 data파일 생성
hadoop@bigdata-host:~/hadoop/working$ vi simple.json
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}
2. data를 담을 table 생성
create table json_table (json string);
3. data파일을 table에 입력
hive> load data local inpath '/home/hadoop/hadoop/working/simple.json' into table json_table;
Copying data from file:/home/hadoop/hadoop/working/simple.json
Copying file: file:/home/hadoop/hadoop/working/simple.json
Loading data to table default.json_table
Table default.json_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 77, raw_data_size: 0]
OK
4. table내용 확인
hive> select * from json_table;
OK
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}
5. json을 컬럼 형태로 query하기(get_json_object이용)
select get_json_object(json_table.json, '$.Foo') as foo,
get_json_object(json_table.json, '$.Bar') as bar,
get_json_object(json_table.json, '$.Quux.QuuxId') as qid,
get_json_object(json_table.json, '$.Quux.QuuxName') as qname
from json_table;
-------------------------->
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0003
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:30:42,028 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:30:48,109 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:49,128 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:50,146 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:51,162 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:52,175 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:53,206 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_201404170922_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 295 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
ABC 20090101100000 1234 Sam
Time taken: 19.411 seconds, Fetched: 1 row(s)
6. json을 컬럼 형태로 query하기(json_tuple이용)
select v.foo, v.bar, v.quux, v.qid
from json_table jt
lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v
as foo, bar, quux, qid;
------------>
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0004
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201404170922_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:36:41,978 Stage-1 map = 0%, reduce = 0%
2014-04-17 10:36:49,125 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:50,156 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:51,170 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:52,193 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:53,219 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_201404170922_0004
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.53 sec HDFS Read: 295 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
Time taken: 26.494 seconds, Fetched: 1 row(s)
댓글 0
번호 | 제목 | 날짜 | 조회 수 |
---|---|---|---|
» | json 값 다루기 | 2014.04.17 | 1425 |
41 | 통계자료 구할수 있는 곳 | 2014.04.16 | 2044 |
40 | column family삭제시 Column family 'delete' does not exist오류 발생하는 경우 | 2014.04.14 | 1058 |
39 | hive에서 생성된 external table에서 hbase의 table에 값 insert하기 | 2014.04.11 | 1865 |
38 | Oozie 설치, 환경설정 및 테스트 | 2014.04.08 | 1737 |
37 | 다수의 로그 에이전트로 부터 로그를 받아 각각의 파일로 저장하는 방법(interceptor및 multiplexing) | 2014.04.04 | 4202 |
36 | external partition table생성및 data확인 | 2014.04.03 | 1590 |
35 | 동일서버에서 LA와 LC동시에 기동하여 테스트 | 2014.04.01 | 1105 |
34 | 의사분산모드에서 presto설치하기 | 2014.03.31 | 3306 |
33 | Hive Query Examples from test code (2 of 2) | 2014.03.26 | 11535 |
32 | Hive Query Examples from test code (1 of 2) | 2014.03.26 | 1389 |
31 | hadoop설치시 오류 | 2013.12.18 | 2731 |
30 | centsOS vsftpd설치하기 | 2013.12.17 | 1935 |
29 | ubuntu에 hadoop 2.0.5설치하기 | 2013.12.16 | 2011 |
28 | centos 5.X에 hadoop 2.0.5 alpha 설치 | 2013.12.16 | 1739 |
27 | hbase에 필요한 jar들 | 2013.04.01 | 2247 |
26 | Hive java connection 설정 | 2013.04.01 | 2310 |
25 | Hbase Shell 명령 정리 | 2013.04.01 | 3466 |
24 | HBASE Client API : 기본 기능 정리 | 2013.04.01 | 3781 |
23 | 하둡 분산 파일 시스템을 기반으로 색인하고 검색하기 | 2013.03.15 | 5763 |