Notes on development and operations for Cloudera CDH/CDP, the Hadoop ecosystem, and Semantic IoT. Questions: gooper@gooper.com.
Hive query for adding jar or script files
set jar=${system:build.ivy.lib.dir}/default/derby-${system:derby.version}.jar;
add file ${hiveconf:jar};    -- add
list file;                   -- list
delete file ${hiveconf:jar}; -- delete
Hive example for creating table using RegexSerDe
CREATE TABLE serde_regex(
  host STRING, identity STRING, user STRING, time STRING,
  request STRING, status STRING, size STRING,
  referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
Sample Apache log line (../data/files/apache.access.log):
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Second sample Apache log line (../data/files/apache.access.2.log):
127.0.0.1 - - [26/May/2009:00:00:00 +0000] "GET /someurl/?track=Blabla(Main) HTTP/1.1" 200 5864 - "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.65 Safari/525.19"
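The SerDe regex can be sanity-checked outside Hive. A minimal Python sketch, run against the first sample log line (the pattern below is the same Java regex with the Hive string-literal escaping removed):

```python
import re

# RegexSerDe pattern from the table definition, as a raw Python string.
pattern = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*")'
    r' (-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
m = pattern.fullmatch(line)

# The nine groups map to host, identity, user, time, request, status,
# size, referer, agent; the last two are absent in this log format.
print(m.group(1), m.group(4), m.group(6))
```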
Hive query using TRANSFORM with a custom script
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
CREATE TABLE dest1(key INT, value STRING);
ADD FILE src/test/scripts/testgrep;

FROM (
  FROM src
  SELECT TRANSFORM(src.key, src.value)
         USING 'testgrep' AS (tkey, tvalue)
  CLUSTER BY tkey
) tmap
INSERT OVERWRITE TABLE dest1
SELECT tmap.tkey, tmap.tvalue;

SELECT dest1.* FROM dest1;
src/test/scripts/testgrep:
#!/bin/bash
egrep '10.*'
exit 0
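What TRANSFORM does with this script can be mimicked in plain Python (a sketch with made-up sample rows, not Hive's actual streaming code): each row is serialized as a tab-separated line on the script's stdin, and the script's stdout becomes the (tkey, tvalue) output rows. `egrep '10.*'` simply keeps any line containing the substring "10":

```python
import re

# Hypothetical sample rows standing in for the src table.
rows = [("100", "val_100"), ("86", "val_86"), ("103", "val_103")]

# Hive serializes each row as tab-separated columns, one row per line...
stream = "".join(f"{key}\t{value}\n" for key, value in rows)

# ...and the script filters that stream; egrep '10.*' keeps lines
# containing "10".
kept = [line for line in stream.splitlines() if re.search(r"10.*", line)]
print(kept)
```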
Hive TABLESAMPLE example
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
-- TABLESAMPLE
CREATE TABLE bucketized_src (key INT, value STRING)
CLUSTERED BY (key) SORTED BY (key) INTO 1 BUCKETS;
INSERT OVERWRITE TABLE bucketized_src SELECT key, value FROM src WHERE key=66;
SELECT key FROM bucketized_src TABLESAMPLE(BUCKET 1 out of 1);
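Conceptually, TABLESAMPLE(BUCKET x OUT OF y) keeps the rows whose clustering-column hash lands in bucket x. A rough Python sketch (for INT keys Hive's hash is the value itself, and bucket numbers are 1-based):

```python
def in_sample(key: int, x: int, y: int) -> bool:
    # Hive hashes the CLUSTERED BY column and keeps rows where
    # hash % y == x - 1; for INT keys the hash is the value itself.
    return key % y == x - 1

# Hypothetical keys; BUCKET 1 OUT OF 1 keeps everything, as in the
# query above.
keys = [66, 66, 98, 484]
sample = [k for k in keys if in_sample(k, 1, 1)]
print(sample)
```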
Hive query for creating table using specific delimiters
create table impressions (imp string, msg string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
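These delimiters mean each row is one newline-terminated line with tab-separated fields. A quick Python check of how such a line (a made-up example row) splits into the (imp, msg) columns:

```python
# A hypothetical row as it would appear in the underlying text file.
raw_row = "imp_001\tsome message text\n"

# Hive splits on the field delimiter after stripping the line terminator.
imp, msg = raw_row.rstrip("\n").split("\t")
print(imp, msg)
```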
Setting the default partition name
create table default_partition_name (key int, value string) partitioned by (ds string);
set hive.exec.default.partition.name='some_other_default_partition_name';
alter table default_partition_name add partition(ds='__HIVE_DEFAULT_PARTITION__');
show partitions default_partition_name;
String constant handling
SELECT 'face''book', 'face' 'book', 'face' 'book', "face""book", "face" "book", "face" "book", 'face' 'bo' 'ok', 'face'"book", "face"'book', 'facebook' FROM src LIMIT 1;
Result:
facebook facebook facebook facebook facebook facebook facebook facebook facebook facebook
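Hive joins adjacent string literals at parse time and accepts single and double quotes interchangeably. Python happens to have the same adjacent-literal rule, so the core behavior is easy to check:

```python
# Adjacent string literals are concatenated at parse time, as in the
# Hive query above ('face' 'book' and "face" "book" both yield facebook).
s = 'face' 'book'
t = "face" "book"
print(s, t)
```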
Hive table lock examples
CREATE TABLE tstsrc (col1 STRING) STORED AS TEXTFILE;
SHOW LOCKS;
SHOW LOCKS tstsrc;
SHOW LOCKS tstsrc extended;
LOCK TABLE tstsrc shared;
UNLOCK TABLE tstsrc;
Hive partition lock examples
LOCK TABLE tstsrcpart PARTITION(ds='2008-04-08', hr='11') EXCLUSIVE;
SHOW LOCKS tstsrcpart PARTITION(ds='2008-04-08', hr='11') extended;
UNLOCK TABLE tstsrcpart PARTITION(ds='2008-04-08', hr='11');
Hive virtual column
Since 0.8.0, Hive supports two virtual columns: INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE.
- INPUT__FILE__NAME is the input file name of the mapper task.
- BLOCK__OFFSET__INSIDE__FILE is the current global file position; for block-compressed files, it is the file offset of the first byte of the current block.
select INPUT__FILE__NAME, key, BLOCK__OFFSET__INSIDE__FILE from src;
Writing results to a local directory
FROM src
INSERT OVERWRITE DIRECTORY '../build/contrib/hive/ql/test/data/warehouse/dest4.out'
SELECT src.value WHERE src.key >= 300;
dfs -cat ../build/contrib/hive/ql/test/data/warehouse/dest4.out/*;
Hive example for comparison of timestamp values
select cast('2011-05-06 07:08:09' as timestamp) > cast('2011-05-06 07:08:09' as timestamp) from src limit 1;
Hive type casting
SELECT IF(false, 1, cast(2 as smallint)) + 3 FROM src LIMIT 1;
Show table properties in Hive
show tblproperties tmpfoo;
show tblproperties tmpfoo("bar");
Display functions in Hive CLI
SHOW FUNCTIONS;
SHOW FUNCTIONS '^c.*';
SHOW FUNCTIONS '.*e$';
SHOW FUNCTIONS 'log.*';
SHOW FUNCTIONS '.*date.*';
SHOW FUNCTIONS '***';
Show columns in Hive
CREATE TABLE shcol_test(KEY STRING, VALUE STRING)
PARTITIONED BY(ds STRING) STORED AS TEXTFILE;
SHOW COLUMNS from shcol_test;
Reset Hive settings
set hive.skewjoin.key;                          -- show current values
set hive.skewjoin.mapjoin.min.split;
set hive.skewjoin.key=300000;                   -- change them
set hive.skewjoin.mapjoin.min.split=256000000;
set hive.skewjoin.key;                          -- confirm the change
set hive.skewjoin.mapjoin.min.split;
reset;                                          -- restore defaults
set hive.skewjoin.key;
set hive.skewjoin.mapjoin.min.split;
Print column header in Hive CLI
set hive.cli.print.header=true;
Progress heartbeat interval
set hive.heartbeat.interval=5;
Change the DDL output format to JSON
set hive.ddl.output.format=json;
desc extended table_name;
set hive.ddl.output.format=text; -- default