
Cloudera, BigData, Semantic IoT, Hadoop, NoSQL

Notes on developing and operating Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related technologies. Send inquiries to gooper@gooper.com.


Working with JSON values in Hive

Administrator 2014.04.17 10:26 Views: 4217

1. Create a data file in JSON format

hadoop@bigdata-host:~/hadoop/working$ vi simple.json
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

2. Create a table to hold the data

create table json_table (json string);

 

3. Load the data file into the table

hive> load data local inpath '/home/hadoop/hadoop/working/simple.json' into table json_table;
Copying data from file:/home/hadoop/hadoop/working/simple.json
Copying file: file:/home/hadoop/hadoop/working/simple.json
Loading data to table default.json_table
Table default.json_table stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 77, raw_data_size: 0]
OK

4. Check the table contents

hive> select * from json_table;
OK
{"Foo":"ABC","Bar":"20090101100000","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}

 

5. Query the JSON as columns (using get_json_object)

select get_json_object(json_table.json, '$.Foo') as foo,
       get_json_object(json_table.json, '$.Bar') as bar,
       get_json_object(json_table.json, '$.Quux.QuuxId') as qid,
       get_json_object(json_table.json, '$.Quux.QuuxName') as qname
from json_table;

-------------------------->

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0003
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201404170922_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:30:42,028 Stage-1 map = 0%,  reduce = 0%
2014-04-17 10:30:48,109 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:49,128 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:50,146 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:51,162 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:52,175 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.87 sec
2014-04-17 10:30:53,206 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_201404170922_0003
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.87 sec   HDFS Read: 295 HDFS Write: 28 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
ABC 20090101100000 1234 Sam
Time taken: 19.411 seconds, Fetched: 1 row(s)
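
get_json_object returns every extracted value as a string, so numeric or date-like fields usually need a conversion before they are compared or aggregated. The following is a minimal sketch against the same json_table, assuming Bar is always in yyyyMMddHHmmss form. The aliases qid_int and bar_ts are illustrative, and the query was not run on the cluster shown above.

```sql
-- Assumption: Bar always holds a 14-digit yyyyMMddHHmmss value.
-- qid_int and bar_ts are illustrative alias names.
select cast(get_json_object(jt.json, '$.Quux.QuuxId') as int) as qid_int,
       from_unixtime(
         unix_timestamp(get_json_object(jt.json, '$.Bar'), 'yyyyMMddHHmmss')
       ) as bar_ts
from json_table jt;
```

If a path is missing or a value cannot be parsed, the corresponding column is expected to be NULL rather than raise an error, so filter on NULL where that matters.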

6. Query the JSON as columns (using json_tuple)

select v.foo, v.bar, v.quux, v.qid
from json_table jt
  lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux', 'Quux.QuuxId') v
    as foo, bar, quux, qid;

------------>

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201404170922_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201404170922_0004
Kill Command = /home/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job  -kill job_201404170922_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-04-17 10:36:41,978 Stage-1 map = 0%,  reduce = 0%
2014-04-17 10:36:49,125 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:50,156 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:51,170 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:52,193 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
2014-04-17 10:36:53,219 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_201404170922_0004
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.53 sec   HDFS Read: 295 HDFS Write: 55 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 530 msec
OK
foo bar quux qid
ABC 20090101100000 {"QuuxId":1234,"QuuxName":"Sam"} NULL
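Time taken: 26.494 seconds, Fetched: 1 row(s)

Note: qid is NULL because json_tuple looks up only top-level keys; it does not interpret 'Quux.QuuxId' as a path into the nested object. For nested values, use get_json_object with a path such as '$.Quux.QuuxId', as in step 5.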
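
json_tuple can still reach the nested fields if a second lateral view is applied to the quux column returned by the first one. The following is a sketch against the same table. The aliases v1, v2, and qname are illustrative, and the query was not run on the cluster shown above.

```sql
-- First json_tuple: top-level keys only; quux comes back as a JSON string.
-- Second json_tuple: parses that string to get the nested keys.
select v1.foo, v1.bar, v2.qid, v2.qname
from json_table jt
  lateral view json_tuple(jt.json, 'Foo', 'Bar', 'Quux') v1
    as foo, bar, quux
  lateral view json_tuple(v1.quux, 'QuuxId', 'QuuxName') v2
    as qid, qname;
```

Compared with repeated get_json_object calls, json_tuple parses each JSON string once per lateral view, which may matter when many keys are read from large rows.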

 
