Notes on Cloudera CDH/CDP, the Hadoop ecosystem, Semantic IoT, and related development/operations topics. Contact gooper@gooper.com with questions.
0. This guide assumes installation is performed as root while the hadoop account does the actual work.
1. Download hive and upload it to the install location
http://mirror.apache-kr.org/hive/
* Download apache-hive-2.0.1-bin.tar.gz and upload it to the server with a tool such as WinSCP.
2. Extract the archive
- tar xvfz apache-hive-2.0.1-bin.tar.gz
3. Create a symlink
- ln -s apache-hive-2.0.1-bin hive
4. Add the following to /etc/profile (on every server where hive is installed)
export HIVE_HOME=$HOME/hive
export PATH=$PATH:$HIVE_HOME/bin
* Apply with: source /etc/profile
* If hadoop is installed but the HADOOP_HOME environment variable is not set, you must set
export HADOOP_HOME=<hadoop-install-dir>
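As a sketch, the step 4 additions to /etc/profile look like this (the $HOME/hadoop fallback for HADOOP_HOME is an assumption that matches the value used later in step 12; adjust to your layout):

```shell
# Sketch of the /etc/profile additions from step 4.
export HIVE_HOME="$HOME/hive"
export PATH="$PATH:$HIVE_HOME/bin"
# Only needed when hadoop's own profile did not already export it;
# $HOME/hadoop is an assumed fallback matching step 12.
export HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
```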
5. Create the conf files
Copy each conf/*.template file to create the corresponding configuration file.
cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-default.xml.template hive-site.xml (note the file name changes from hive-default.xml.template to hive-site.xml)
cp beeline-log4j2.properties.template beeline-log4j2.properties
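The copies above all follow one pattern (drop the .template suffix, with hive-default.xml.template renamed to hive-site.xml), so they can be scripted; a sketch with a hypothetical `make_conf_from_templates` helper:

```shell
# Sketch: derive each config file from its .template in a conf dir.
# hive-default.xml.template is the only one whose target name changes
# (to hive-site.xml); all others just drop the .template suffix.
make_conf_from_templates() {
  dir="$1"
  for t in "$dir"/*.template; do
    [ -e "$t" ] || continue
    base=$(basename "$t" .template)
    case "$base" in
      hive-default.xml) target="$dir/hive-site.xml" ;;
      *)                target="$dir/$base" ;;
    esac
    # do not clobber a config file that already exists
    [ -f "$target" ] || cp "$t" "$target"
  done
}
```

Run it against $HIVE_HOME/conf and existing files are left untouched, so it is safe to re-run.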
6. Create the HDFS directories hive will use and grant permissions
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chown hadoop:hadoop /user/hive
hadoop fs -chown hadoop:hadoop /user/hive/warehouse
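The directory setup above can be wrapped in a small helper that just emits the command list for review before running it on the cluster; a sketch (`hdfs_setup_cmds` is a hypothetical helper, and `-mkdir -p` is an addition over the literal commands above so reruns are harmless):

```shell
# Sketch: emit the HDFS setup commands from step 6 for a given owner
# account (the hadoop account assumed in step 0), so the list can be
# reviewed or piped to sh on the cluster.
hdfs_setup_cmds() {
  owner="$1"
  printf '%s\n' \
    "hadoop fs -mkdir -p /tmp" \
    "hadoop fs -mkdir -p /user/hive/warehouse" \
    "hadoop fs -chmod g+w /tmp" \
    "hadoop fs -chmod g+w /user/hive/warehouse" \
    "hadoop fs -chown $owner:$owner /user/hive" \
    "hadoop fs -chown $owner:$owner /user/hive/warehouse"
}
```

Usage (on a node with hadoop on the PATH): `hdfs_setup_cmds hadoop | sh`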
7. Install mysql or mariadb (if needed; only once)
root@master:/usr/local# apt-get install mysql-server
(If a 404 error occurs, see https://www.gooper.com/ss/index.php?mid=bigdata&category=2772&document_srl=2995, fix the issue, then install.)
8. Create the database (run as a mysql account with privileges such as database creation)
mysql> create database hive_metastore_db;
Query OK, 1 row affected (0.00 sec)
9. Create the user and allow remote connections
a. mysql> grant all privileges on *.* to 'hive'@'localhost' identified by 'password' with grant option;
Query OK, 0 rows affected (0.00 sec)
b. mysql> grant all privileges on *.* to 'hive'@'%' identified by 'password' with grant option;
c. grant all privileges on hive_metastore_db.* to 'hive'@'%' identified by 'password';
* Verify
mysql> select host from mysql.user where user='hive';
+-----------+
| host |
+-----------+
| % |
| localhost |
+-----------+
2 rows in set (0.00 sec)
10. Run the table-creation script remotely (use the scripts under $HOME/hive/scripts/metastore/upgrade)
$HOME/hive/scripts/metastore/upgrade/mysql>
mysql --host=so-db1 --port=3306 -u root -p패스워드 hive_metastore_db < hive-schema-2.0.0.mysql.sql
Alternatively, the following can be used (runnable only after step 13 has been configured):
* ./bin/schematool -initSchema -dbType mysql -verbose initializes the metastore schema.
* ./bin/schematool -info -dbType mysql shows the configured information.
11. Download the mysql connector and copy it to $HIVE_HOME/lib
(Download from http://www.mysql.com/downloads/connector/j/ and extract it; for mariadb, copy mariadb-java-client-1.5.9.jar to $HIVE_HOME/lib.)
12. Set the hadoop location in $HIVE_HOME/conf/hive-env.sh.
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=$HOME/hadoop
13. Add or change the following in hive-site.xml under $HIVE_HOME/conf.
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive_metastore_db?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<!-- for mysql (e.g. mysql-connector-java-5.1.39-bin.jar) -->
<!-- value>com.mysql.jdbc.Driver</value -->
<!-- for mariadb (e.g. mariadb-java-client-1.5.4.jar) -->
<value>org.mariadb.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.port</name>
<value>33306</value>
<description>Hive metastore listener port</description>
</property>
</configuration>
* Change the following hive-site.xml values as shown below (using the default values causes errors):
<name>hive.exec.scratchdir</name>
<!-- value>/tmp/hive</value -->
<value>/tmp/hive-${user.name}</value>
<name>hive.exec.local.scratchdir</name>
<!-- value>${system:java.io.tmpdir}/${system:user.name}</value -->
<value>/tmp/${user.name}</value>
<name>hive.downloaded.resources.dir</name>
<!-- value>${system:java.io.tmpdir}/${hive.session.id}_resources</value -->
<value>/tmp/${user.name}_resources</value>
<name>hive.scratch.dir.permission</name>
<!-- value>700</value -->
<value>733</value>
<name>hive.querylog.location</name>
<!-- value>${system:java.io.tmpdir}/${system:user.name}</value -->
<value>/tmp/${user.name}</value>
13-1. Settings to enable the Table Lock Manager (without it, data can be corrupted when concurrent queries run against the same table)
<property>
<name>hive.support.concurrency</name>
<value>true</value>
<description>
Whether Hive supports concurrency control or not.
A ZooKeeper instance must be up and running when using zookeeper Hive lock manager
</description>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>gsda1:2181,gsda2:2181,gsda3:2181</value>
<description>
List of ZooKeeper servers to talk to. This is needed for:
1. Read/write locks - when hive.lock.manager is set to
org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
2. When HiveServer2 supports service discovery via Zookeeper.
3. For delegation token storage if zookeeper store is used, if
hive.cluster.delegation.token.store.zookeeper.connectString is not set
4. LLAP daemon registry service
</description>
</property>
14. Add the following at the end of hadoop-env.sh so hadoop can pick up hive's libs and conf.
a. Add $HIVE_HOME/conf:$HIVE_HOME/lib to export HADOOP_CLASSPATH
b. Append the following at the end of the file so hadoop can find hive's jars:
for f in ${HIVE_HOME}/lib/*.jar; do
HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f;
done
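For reference, the classpath additions from items a and b can be computed in one place; a sketch with a hypothetical `hive_classpath` helper that mirrors the loop above:

```shell
# Sketch: build the HADOOP_CLASSPATH additions from step 14 for a
# given hive install dir: conf, lib, then every jar under lib.
hive_classpath() {
  hive_home="$1"
  cp_add="$hive_home/conf:$hive_home/lib"
  for f in "$hive_home"/lib/*.jar; do
    [ -e "$f" ] && cp_add="$cp_add:$f"
  done
  printf '%s\n' "$cp_add"
}
```

In hadoop-env.sh this would be used as: export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:$(hive_classpath "$HIVE_HOME")"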
15. Start the hive metastore
: start the metastore server (run on the master where hive is installed)
nohup bin/hive --service metastore &
* On errors, see: https://www.gooper.com/ss/index.php?mid=bigdata&category=2777&document_srl=3393
* Verify: jps -m
118506 RunJar /opt/bin/hadoop/hive/lib/hive-service-2.0.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore
16. Start hiveserver2 (brings the server up so clients can connect over JDBC)
: start the hive server (run on the master where hive is installed)
nohup bin/hiveserver2 &
* Check 1: jps -m
102252 RunJar /opt/bin/hadoop/hive/lib/hive-service-2.0.1.jar org.apache.hive.service.server.HiveServer2
* Check 2: if files such as derby.log were created where hiveserver2 was started, it failed to connect to the remote metastore.
* Check 3: http://sda1:10002/hiveserver2.jsp
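The `jps -m` checks in steps 15 and 16 can be scripted; a minimal sketch with a hypothetical `has_daemon` helper that scans the listing for the expected main class:

```shell
# Sketch: check whether a Hive daemon appears in `jps -m` style output.
# Usage: jps -m | has_daemon HiveMetaStore   (or: has_daemon HiveServer2)
has_daemon() {
  grep -q "$1"
}
```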
17. Run beeline (a client program that connects to hiveserver2 over JDBC)
To connect to hiveserver2, run "bin/beeline -u jdbc:hive2://sda1:10000".
(jdbc:hive2://sda1:10000 is the JDBC URL.)
===>
-bash-4.1# beeline -u jdbc:hive2://sda1:10000
which: no hbase in (/opt/jdk1.8.0_66/bin:/usr/local/rvm/gems/ruby-2.2.3/bin:/usr/local/rvm/gems/ruby-2.2.3@global/bin:/usr/local/rvm/rubies/ruby-2.2.3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/rvm/bin:/svc/apps/sda/bin/hadoop/hadoop/bin:/svc/apps/sda/bin/hadoop/elasticsearch/bin:/opt/apache-maven-3.3.9/bin:/svc/apps/sda/bin/hadoop/hive/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://sda1:10000
Connected to: Apache Hive (version 2.0.1)
Driver: Hive JDBC (version 2.0.1)
16/06/03 20:57:22 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.0.1 by Apache Hive
=====>
0: jdbc:hive2://sda1:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (4.431 seconds)
=====>
0: jdbc:hive2://sda1:10000> create table test (a varchar(100));
=====>
0: jdbc:hive2://sda1:10000> show tables;
OK
INFO : Compiling command(queryId=root_20160603205749_4df86e6a-c99b-4c82-bc95-bf240a4aaa70): show tables
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=root_20160603205749_4df86e6a-c99b-4c82-bc95-bf240a4aaa70); Time taken: 0.018 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20160603205749_4df86e6a-c99b-4c82-bc95-bf240a4aaa70): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=root_20160603205749_4df86e6a-c99b-4c82-bc95-bf240a4aaa70); Time taken: 0.025 seconds
INFO : OK
+-----------+--+
| tab_name |
+-----------+--+
| test |
+-----------+--+
1 row selected (0.314 seconds)
0: jdbc:hive2://sda1:10000> exit;
* To leave beeline: !quit
* Using beeline against hiveserver2 is described at: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
* The following error may occur; in that case, run "cp $HOME/apache-hive-2.0.1-bin/jdbc/hive-jdbc-2.0.1-standalone.jar $HOME/apache-hive-2.0.1-bin/lib" to copy the jar file into place.
------------------------------- error output ---------------------------
-bash-4.1# ./beeline
which: no hbase in (/opt/jdk1.8.0_66/bin:/usr/local/rvm/gems/ruby-2.2.3/bin:/usr/local/rvm/gems/ruby-2.2.3@global/bin:/usr/local/rvm/rubies/ruby-2.2.3/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/rvm/bin:/svc/apps/sda/bin/hadoop/hadoop/bin:/svc/apps/sda/bin/hadoop/elasticsearch/bin:/opt/apache-maven-3.3.9/bin:/svc/apps/sda/bin/hadoop/hive/bin)
ls: cannot access /svc/apps/sda/bin/hadoop/hive/lib/hive-jdbc-*-standalone.jar: No such file or directory
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/apache-hive-2.0.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/spark-1.3.1-bin-hadoop2.6/lib/spark-assembly-1.3.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/pineone/svc/apps/sda/bin/hadoop/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.0.1 by Apache Hive
------------------------------------------------------------------------------
* Log file location (default when running as root): /tmp/root/hive.log
* Reference: http://blrunner.com/100
18. Using hive without beeline
$ bin/hive
* When hive is first run without the schema set up, the error "MetaException(message:Hive metastore database is not initialized" occurs; see https://www.gooper.com/ss/index.php?mid=bigdata&category=2777&document_srl=3222 to resolve it.